Hi, Jeyhun.

Thanks for the reply.

> Is this argument true for all workloads? Or does this argument also hold
> for workloads with many small files, which is quite a common case [1] ?

Yes, I think so. Even for workloads with many small files, the JNI
overhead should remain negligible compared with the latency of remote
I/O, and the other benefits of this proposal should outweigh this cost.

Additionally, Flink already incurs JNI overhead today whenever it calls
RocksDB methods. Those calls could be even more frequent than file
system interface calls, since not every state request needs to access
the file system.
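
To make this concrete, here is a minimal sketch against the vanilla
RocksDB Java API (illustration only, not ForSt code): every put()/get()
below crosses the JNI boundary once, but only block cache misses
actually reach the file system.

import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;

public class JniOverheadSketch {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        try (Options options = new Options().setCreateIfMissing(true);
             RocksDB db = RocksDB.open(options, "/tmp/jni-demo")) {
            // One JNI crossing per call below.
            db.put("key".getBytes(), "value".getBytes());
            // This get() is likely served from the memtable or block
            // cache, so it pays the JNI cost without any file system I/O.
            byte[] value = db.get("key".getBytes());
        }
    }
}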

BTW, the small-files issue can also hurt the runtime performance of the
db on a local file system, so we usually resolve it first in production
environments.
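
For example, with vanilla RocksDB a common first step is tuning
compaction so it emits fewer, larger SST files. The values below are
purely illustrative, not a recommendation:

import org.rocksdb.Options;

public class FewerSmallFilesSketch {
    public static void main(String[] args) {
        // Larger memtables and target SST sizes make compaction emit
        // fewer, bigger files. Numbers are made up for illustration.
        try (Options options = new Options()
                .setWriteBufferSize(64L * 1024 * 1024)       // 64 MB memtable
                .setTargetFileSizeBase(64L * 1024 * 1024)    // 64 MB SST target
                .setMaxBytesForLevelBase(512L * 1024 * 1024)) {
            // pass `options` to RocksDB.open(...) as usual
        }
    }
}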

> the engine spawns a huge amount of scan range requests to the
> file system to retrieve different parts of a file.

Indeed, frequent requests to the remote file system can significantly
affect performance. To address this, other FLIPs have introduced various
strategies:

1. A local disk cache to minimize remote requests, as described in
FLIP-423 and to be detailed in FLIP-429, as you mentioned. With
effective cache utilization, performance on cache hits will be no worse
than the local strategy (see the sketch after this list).

2. Grouping remote access to decrease the number of remote I/O requests, as
proposed in "FLIP-426: Grouping Remote State Access."

3. Parallel I/O to maximize network bandwidth usage, outlined in "FLIP-425:
Asynchronous Execution Model."
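
To illustrate item 1, a read-through cache could look like the sketch
below (the class and method names are hypothetical, not the FLIP-429
design): a cache miss triggers one remote read, and every later access
is pure local I/O.

import org.apache.flink.core.fs.FSDataInputStream;
import org.apache.flink.core.fs.FileSystem;
import org.apache.flink.core.fs.Path;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of a read-through local disk cache.
public class LocalDiskCacheSketch {
    private final java.nio.file.Path cacheDir;

    public LocalDiskCacheSketch(java.nio.file.Path cacheDir) {
        this.cacheDir = cacheDir;
    }

    // Returns a local copy of the remote file, downloading it only on
    // a cache miss.
    public java.nio.file.Path getOrDownload(Path remote) throws IOException {
        java.nio.file.Path local = cacheDir.resolve(remote.getName());
        if (!Files.exists(local)) {
            FileSystem fs = remote.getFileSystem();
            try (FSDataInputStream in = fs.open(remote)) {
                Files.copy(in, local, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return local;
    }
}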

The PoC implements a simple file cache and asynchronous execution,
which improve the performance a lot. You could also refer to the PoC
results in FLIP-423.
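
As a rough illustration of the parallel I/O idea in FLIP-425 (readRange
below is a hypothetical placeholder for a remote range read, not a real
API), several scan ranges of one file can be fetched concurrently to
better utilize network bandwidth:

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ParallelRangeReadSketch {
    // Hypothetical placeholder for a remote range read.
    static byte[] readRange(String file, long offset, int len) {
        return new byte[len];
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        // Fetch two ranges of the same file concurrently instead of
        // issuing two sequential remote round trips.
        List<CompletableFuture<byte[]>> futures = List.of(
                CompletableFuture.supplyAsync(() -> readRange("sst-1", 0, 4096), pool),
                CompletableFuture.supplyAsync(() -> readRange("sst-1", 4096, 4096), pool));
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
    }
}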

On Mon, Mar 11, 2024 at 3:11 AM Jeyhun Karimov <je.kari...@gmail.com> wrote:

> Hi Hangxiang,
>
> Thanks for the proposal. +1 for it.
> I have a few comments.
>
> > Proposal 2 has additional JNI overhead, but the overhead is relatively
> > negligible when weighed against the latency of remote I/O.
>
> - Is this argument true for all workloads? Or does this argument also hold
> for workloads with many small files, which is quite a common case [1] ?
>
> - Also, in many workloads the engine does not need the whole file,
> either because the query forces it, the file type supports efficient
> filtering (e.g. ORC, Parquet, Arrow files), or simply because one file
> is "divided" among multiple workers.
> simply one file is "divided" among multiple workers.
> In these cases, the engine spawns a huge amount of scan range requests
> to the file system to retrieve different parts of a file.
> How the proposed solution would work with these workloads?
>
> - A similar question related to the above also applies to caching (I
> know caching is the subject of FLIP-429; asking here because of the
> related section in this FLIP).
>
> Regards,
> Jeyhun
>
> [1] https://blog.min.io/challenge-big-data-small-files/
>
>
>
> On Thu, Mar 7, 2024 at 10:09 AM Hangxiang Yu <master...@gmail.com> wrote:
>
> > Hi devs,
> >
> >
> > I'd like to start a discussion on a sub-FLIP of FLIP-423: Disaggregated
> > State Storage and Management[1], which is a joint work of Yuan Mei,
> > Zakelly Lan, Jinzhong Li, Hangxiang Yu, Yanfei Lei and Feng Wang:
> >
> > - FLIP-427: Disaggregated State Store
> >
> > This FLIP introduces the initial version of the ForSt disaggregated state
> > store.
> >
> > Please make sure you have read FLIP-423[1] to know the whole story,
> > and we'll discuss the details of FLIP-427[2] under this mail. For the
> > discussion of the overall architecture or topics related to multiple
> > sub-FLIPs, please post in the previous mail[3].
> >
> > Looking forward to hearing from you!
> >
> > [1] https://cwiki.apache.org/confluence/x/R4p3EQ
> >
> > [2] https://cwiki.apache.org/confluence/x/T4p3EQ
> >
> > [3] https://lists.apache.org/thread/ct8smn6g9y0b8730z7rp9zfpnwmj8vf0
> >
> >
> > Best,
> >
> > Hangxiang.
> >
>


-- 
Best,
Hangxiang.
