Hi Yu,

After discussing with @JingSongLee, we aim to enhance the sort lookup store
in Paimon to support the new key-value file format. I have updated the PIP
[1] and initiated a new discussion thread. Thank you.

[1]
https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table

Best,
FangYong

On Thu, Jul 18, 2024 at 9:05 PM Yu Li <[email protected]> wrote:

> I'm curious which parts of the current design are the same as HBase's
> HFile and which are different, and would suggest adding some description
> to the document [1].
>
> nit: I noticed a link to the HBase book [2] at the bottom, but didn't
> find any reference to it in the PIP doc.
>
> Best Regards,
> Yu
>
> [1]
> https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+a+key-value+file+format+for+paimon+primary+key+table
> [2] https://hbase.apache.org/book.html#_hfile_format_2
>
>
> On Wed, 17 Jul 2024 at 10:42, Jingsong Li <[email protected]> wrote:
>
> > Thanks Yong for driving this PIP!
> >
> > This PIP looks very nice!
> >
> > I have two considerations:
> >
> > 1. I am currently working on introducing a SortLookupStoreFactory. My
> > idea is to first gain experience with local lookup to clarify what file
> > format we need. Only when we feel that the format is mature (the
> > performance is fully OK) can we determine the specific structure of
> > the format.
> >
> > 2. This format may not be called HFile. If it is different from HFile,
> > we can give it another name.
> >
> > What do you think?
> >
> > Best,
> > Jingsong
> >
> > On Wed, Jul 17, 2024 at 9:48 AM Yong Fang <[email protected]> wrote:
> > >
> > > Hi devs,
> > >
> > > LiMing and I would like to initiate a discussion on PIP-25: Introduce
> > > HFile format for Paimon primary key table [1]. Currently, when Paimon
> > > needs to create lookup tables for lookup joins in streaming processes,
> > > it reads data from ORC/Parquet/Avro format files in HDFS/S3, converts
> > > records to key-value format data, and writes them to disk. This process
> > > consumes a substantial amount of time.
> > >
> > > We aim to introduce the HFile format into Paimon in order to reduce the
> > > cost of creating lookup tables. Users can take advantage of this file
> > > format for Paimon primary key tables when using Paimon as a lookup
> > > table. In this case, Paimon will create lookup tables based on HFile
> > > files without rebuilding key-value files.
> > >
> > > Looking forward to your feedback, thanks.
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/PAIMON/PIP-25%3A+Introduce+HFile+format+for+paimon+primary+key+table
> > >
> > > Best,
> > > Fang Yong
> >
>
