Hi everyone,

Thanks for your valuable feedback!

Our discussion has been going on for a while. As this is a sub-FLIP of
FLIP-423, which is nearing consensus, I would like to start a vote in 72
hours.

Please let me know if you have any concerns, thanks!

On Mon, Mar 11, 2024 at 11:48 AM Hangxiang Yu <master...@gmail.com> wrote:

> Hi, Jeyhun.
>
> Thanks for the reply.
>
> Is this argument true for all workloads? Or does this argument also hold
> for workloads with many small files, which is quite a common case [1] ?
>
> Yes, I think so. The overhead should still be negligible, particularly in
> comparison with remote I/O, and the other benefits of this proposal should
> outweigh this cost.
>
> Additionally, Flink already incurs JNI overhead today when it calls RocksDB
> methods. Those calls can be even more frequent than the actual file system
> interface calls, because not all state requests need to access the file
> system.
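>
> To make that concrete, here is a minimal, purely illustrative Java sketch
> (class and method names are invented; this is not ForSt or RocksDB code):
> only the requests that miss the in-memory structures ever reach the file
> system interface.
>
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
>
> // Illustrative only: a state read served from in-memory structures when
> // possible, falling back to the file system only on a miss.
> public class ReadPathSketch {
>
>     // Stand-ins for the memtable and block cache.
>     private final Map<String, byte[]> memtable = new ConcurrentHashMap<>();
>     private final Map<String, byte[]> blockCache = new ConcurrentHashMap<>();
>
>     public byte[] get(String key) {
>         byte[] value = memtable.get(key);    // recently written entries
>         if (value == null) {
>             value = blockCache.get(key);     // recently read blocks
>         }
>         if (value == null) {
>             // Only this path touches the file system interface, so FS calls
>             // are typically less frequent than state requests.
>             value = readBlockFromFileSystem(key);
>             blockCache.put(key, value);
>         }
>         return value;
>     }
>
>     // Placeholder for the actual (possibly remote) file read.
>     private byte[] readBlockFromFileSystem(String key) {
>         return new byte[0];
>     }
> }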
>
> BTW, the small-files issue can also hurt the performance of the DB on the
> local file system at runtime, so we usually address it first in production
> environments.
>
> the engine spawns a huge number of scan range requests to the
> file system to retrieve different parts of a file.
>
> Indeed, frequent requests to the remote file system can significantly
> affect performance. To address this, other FLIPs have introduced various
> strategies:
>
> 1. A local disk cache to minimize remote requests, as described in FLIP-423
> and detailed in FLIP-429, which you mentioned. With effective cache
> utilization, performance on cache hits will not be inferior to the local
> strategy.
>
> 2. Grouping remote access to decrease the number of remote I/O requests,
> as proposed in "FLIP-426: Grouping Remote State Access" (a rough sketch of
> the idea follows after this list).
>
> 3. Parallel I/O to maximize network bandwidth usage, outlined in
> "FLIP-425: Asynchronous Execution Model."
>
> The PoC implements a simple file cache and asynchronous execution, which
> together improve performance significantly. You can also refer to the PoC
> results in FLIP-423.
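>
> In case it helps the discussion, a minimal sketch of how those two PoC
> ideas fit together (names invented; this is not the actual PoC code): a
> read-through local file cache combined with asynchronous I/O. It assumes
> the cache directory already exists.
>
> import java.io.IOException;
> import java.io.UncheckedIOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.Executor;
>
> // Illustrative only: serve reads from a local cache directory and fall
> // back to the remote store asynchronously on a miss.
> public class CachingReadSketch {
>
>     private final Path cacheDir;       // local disk cache, assumed to exist
>     private final Executor ioExecutor; // pool used for parallel I/O
>
>     public CachingReadSketch(Path cacheDir, Executor ioExecutor) {
>         this.cacheDir = cacheDir;
>         this.ioExecutor = ioExecutor;
>     }
>
>     public CompletableFuture<byte[]> read(String fileName) {
>         return CompletableFuture.supplyAsync(() -> {
>             try {
>                 Path cached = cacheDir.resolve(fileName);
>                 if (Files.exists(cached)) {
>                     // Cache hit: comparable to the local strategy.
>                     return Files.readAllBytes(cached);
>                 }
>                 // Cache miss: fetch from the remote store and keep a copy.
>                 byte[] data = fetchFromRemote(fileName);
>                 Files.write(cached, data);
>                 return data;
>             } catch (IOException e) {
>                 throw new UncheckedIOException(e);
>             }
>         }, ioExecutor);
>     }
>
>     // Placeholder for the remote (e.g. DFS / object store) read.
>     private byte[] fetchFromRemote(String fileName) {
>         return new byte[0];
>     }
> }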
>
> On Mon, Mar 11, 2024 at 3:11 AM Jeyhun Karimov <je.kari...@gmail.com>
> wrote:
>
>> Hi Hangxiang,
>>
>> Thanks for the proposal. +1 for it.
>> I have a few comments.
>>
>> > Proposal 2 has additional JNI overhead, but the overhead is relatively
>> > negligible when weighed against the latency of remote I/O.
>>
>> - Is this argument true for all workloads? Or does this argument also hold
>> for workloads with many small files, which is quite a common case [1] ?
>>
>> - Also, in many workloads the engine does not need the whole file, either
>> because the query forces it, or because the file type supports efficient
>> filtering (e.g. ORC, Parquet, Arrow files), or simply because one file is
>> "divided" among multiple workers. In these cases, the engine spawns a huge
>> number of scan range requests to the file system to retrieve different
>> parts of a file. How would the proposed solution work with these workloads?
>>
>> - A similar question also applies to caching (I know caching is the
>> subject of FLIP-429; I am asking here because of the related section in
>> this FLIP).
>>
>> Regards,
>> Jeyhun
>>
>> [1] https://blog.min.io/challenge-big-data-small-files/
>>
>>
>>
>> On Thu, Mar 7, 2024 at 10:09 AM Hangxiang Yu <master...@gmail.com> wrote:
>>
>> > Hi devs,
>> >
>> >
>> > I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated
>> > State Storage and Management[1], which is a joint work of Yuan Mei,
>> > Zakelly Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:
>> >
>> > - FLIP-427: Disaggregated State Store
>> >
>> > This FLIP introduces the initial version of the ForSt disaggregated
>> > state store.
>> >
>> > Please make sure you have read FLIP-423 [1] for the whole story; we'll
>> > discuss the details of FLIP-427 [2] in this thread. For discussion of the
>> > overall architecture or topics related to multiple sub-FLIPs, please post
>> > in the previous thread [3].
>> >
>> > Looking forward to hearing from you!
>> >
>> > [1] https://cwiki.apache.org/confluence/x/R4p3EQ
>> >
>> > [2] https://cwiki.apache.org/confluence/x/T4p3EQ
>> >
>> > [3] https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0
>> >
>> >
>> > Best,
>> >
>> > Hangxiang.
>> >
>>
>
>
> --
> Best,
> Hangxiang.
>


-- 
Best,
Hangxiang.
