Thanks, Aitozi, for initiating this discussion. I have some questions:

(1) Why is this cache needed in the analytical scenario? When scanning a snapshot, why would a data file be read multiple times?
(2) Between CachedSeekableInputStream and BlockCache, which implementation do you prefer?
(3) In BlockCache, why introduce a BlockQueue?

Best,
wangwj

On Tue, Jul 16, 2024 at 3:07 PM Aitozi <[email protected]> wrote:
>
> Hi, Fang Yong
>
> Thanks for your valuable comments. Here are some of my thoughts on your
> questions.
>
> (1) The distributed cache and local file cache actually work in different
> locations, and their functions are orthogonal.
> Therefore, I believe that these two can be used together. So this proposal
> mainly focuses on the local cache.
> (2) In our design, the scheduler utilizes the consistent hash strategy to
> assign DataSplits to computing nodes,
> enabling cache colocation scheduling.
>
> Repost the doc on wiki page:
>
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-24+Introduce+data+cache+in+paimon+reader
>
> Thanks,
> Aitozi.
>
> On Tue, Jul 16, 2024 at 14:37, Yong Fang <[email protected]> wrote:
>
> > Thanks Aitozi for initiating this discussion. For the data cache, I have
> > some questions:
> >
> > 1. In the design document, the focus is mainly on block cache. In a
> > complete cache system, it is usually divided into distributed cache, local
> > file cache, block cache, and key-value cache. Compared with block cache,
> > would it be more effective to introduce a distributed cache such as
> > Alluxio?
> >
> > 2. For the computing engine: What interfaces should Paimon's cache provide
> > so that the computing engine can be aware of which computing nodes cache
> > which data, and facilitate the deployment of computing tasks to the
> > appropriate computing nodes at the scheduling layer?
> >
> > Best,
> > FangYong
> >
> > On Tue, Jul 16, 2024 at 10:45 AM Aitozi <[email protected]> wrote:
> >
> > > Hi devs:
> > >     I want to initiate a discussion on the ability to support data cache in
> > > the Paimon reader, aiming to accelerate the performance of scan operators
> > > in analytical scenarios. The detailed design document is as follows [1].
> > > Looking forward to your feedback.
> > >
> > > [1]:
> > > https://docs.google.com/document/d/1-zzDpxcubukMR-21n66OPv2ViKEFeEJ_Mivc-wW4gLM/edit?usp=sharing
> > >
> > > Thanks
> > > Aitozi.
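
For readers following the thread, the consistent-hash assignment of DataSplits to computing nodes mentioned in Aitozi's reply could look roughly like the sketch below. This is only an illustration under stated assumptions, not the scheduler described in the design document: the class `ConsistentHashRing`, its methods, the string split keys, the virtual-node count, and the FNV-1a hash are all hypothetical choices made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: assigns split keys to nodes with a consistent-hash ring. */
class ConsistentHashRing {
    // Ring position -> node name. Each node owns several positions (virtual nodes).
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int virtualNodes;

    ConsistentHashRing(List<String> nodes, int virtualNodes) {
        this.virtualNodes = virtualNodes;
        for (String node : nodes) {
            addNode(node);
        }
    }

    void addNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            ring.put(hash(node + "#" + i), node);
        }
    }

    void removeNode(String node) {
        for (int i = 0; i < virtualNodes; i++) {
            // Two-argument remove: only drops the position if it still belongs to this node.
            ring.remove(hash(node + "#" + i), node);
        }
    }

    /** Returns the node owning the first ring position at or after the key's hash. */
    String nodeFor(String splitKey) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("no nodes registered");
        }
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(splitKey));
        // Wrap around to the start of the ring when the hash is past the last position.
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    // 64-bit FNV-1a, chosen only to keep the sketch dependency-free.
    private static long hash(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }
}
```

With such a ring, the same split key (for example a data file path) is always routed to the same node, so that node's local cache stays warm for it, and removing one node only reassigns the keys that node owned.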
