Hi Yu,

After discussing with @JingSongLee, we aim to enhance the sort lookup store in Paimon to support the new key-value file format. I have updated the PIP [1] and initiated a new discussion thread. Thank you.

[1] https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table

Best,
FangYong

On Thu, Jul 18, 2024 at 9:05 PM Yu Li <[email protected]> wrote:

> I'm curious about which parts of the current design are the same as
> HBase's HFile and which are different, and would suggest adding some
> description in the document [1].
>
> nit: I notice there is a link to the HBase book [2] at the bottom but
> didn't find any reference to it in the PIP doc.
>
> Best Regards,
> Yu
>
> [1]
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table
> [2] https://hbase.apache.org/book.html#_hfile_format_2
>
> On Wed, 17 Jul 2024 at 10:42, Jingsong Li <[email protected]> wrote:
>
> > Thanks Yong for driving this PIP!
> >
> > This PIP looks very nice!
> >
> > I have two considerations:
> >
> > 1. I am currently working on introducing a SortLookupStoreFactory. My
> > idea is to first practice using local lookup to clarify what file
> > format we need. Only when we feel that the format is mature (the
> > performance is fully OK) can we determine the specific structure
> > of the format.
> >
> > 2. This format may not be called HFile. If it is different from HFile,
> > we can give it another name.
> >
> > What do you think?
> >
> > Best,
> > Jingsong
> >
> > On Wed, Jul 17, 2024 at 9:48 AM Yong Fang <[email protected]> wrote:
> >
> > > Hi devs,
> > >
> > > LiMing and I would like to initiate a discussion on PIP-25: Introduce
> > > HFile format for Paimon primary key table [1]. Currently, when Paimon
> > > needs to create lookup tables for lookup joins in streaming jobs, it
> > > reads data from ORC/Parquet/Avro format files in HDFS/S3, converts
> > > records to key-value format data, and writes them to disk. This
> > > process consumes a substantial amount of time.
> > >
> > > We aim to introduce the HFile format into Paimon in order to reduce
> > > the cost of creating lookup tables. Users can take advantage of this
> > > file format for Paimon primary key tables when using Paimon as a
> > > lookup table. In this case, Paimon will create lookup tables directly
> > > from the HFile files without rebuilding key-value files.
> > >
> > > Looking forward to your feedback, thanks.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+HFile+format+for+paimon+primary+key+table
> > >
> > > Best,
> > > Fang Yong
