Hi Antoine,

Thanks for the review. I'll add this data shortly.

On Mon, Dec 8, 2025 at 4:18 PM Antoine Pitrou <[email protected]> wrote:
>
> Hello Arnav,
>
> Was any additional compression applied? I could not find any
> information in the document.
>
> Ideally, for numerical columns I think the following configurations
> should be compared:
>
> - PLAIN
> - PLAIN + ZSTD
> - BYTE_STREAM_SPLIT + ZSTD
> - DELTA + RLE
> - DELTA + ZSTD
>
> For strings you might want to compare the following:
>
> - PLAIN
> - PLAIN + ZSTD
> - DELTA_BYTE_ARRAY
> - DELTA_BYTE_ARRAY + ZSTD
> - DICT
> - DICT + FSST
> - DICT + ZSTD
>
> Regards
>
> Antoine.
>
>
> On Mon, 8 Dec 2025 15:14:20 +0530
> Arnav Balyan <[email protected]>
> wrote:
> > Hi team, thanks to the very valuable reviews and feedback from Juliean,
> > Micah, Andrew and others, the FSST proposal is in the PoC stage, and will
> > be worked upon in the coming weeks.
> >
> > I just wanted to start a discussion on Composite encodings for Parquet
> > and get the community's thoughts, feedback and suggestions on nested
> > encodings.
> >
> > Nested/Composite/Hierarchical encodings are supported in Vortex,
> > Fastlanes etc, and partly supported in Parquet (with Dict + RLE). This
> > proposal discusses formalizing the same and paving the way for future
> > encodings like Dict + FSST, Delta + RLE and others.
> >
> > Several benchmarks were run on some well recognized nested encodings, and
> > show significant compression gains (order of 10x improvements) which are
> > further detailed in the doc.
> >
> > Would love to get your thoughts and feedback!
> >
> > https://docs.google.com/document/d/1Yi5JwpKEsRFw7D8-iETguRDPtjlyiKITCguYUrrzEVY
> >
> > Regards,
> > - Arnav
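
For anyone following the thread who is new to the term, here is a minimal Python sketch of what a composite (nested) encoding such as Delta + RLE could look like: the output of one encoding stage is fed into the next. This is only an illustration under my own assumptions. The function names are hypothetical, and the stages are simplified textbook versions, not Parquet's DELTA_BINARY_PACKED or its RLE/bit-packing hybrid, and not the scheme from the proposal doc.

```python
from itertools import groupby


def delta_encode(values):
    """First value, followed by the differences between neighbours."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Inverse of delta_encode: running sum starting at the first value."""
    out = []
    total = 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        out.append(total)
    return out


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, sum(1 for _ in run)) for v, run in groupby(values)]


def rle_decode(pairs):
    """Inverse of rle_encode: expand each (value, run_length) pair."""
    out = []
    for v, n in pairs:
        out.extend([v] * n)
    return out


def delta_rle_encode(values):
    """Composite encoding: delta first, then RLE over the deltas."""
    return rle_encode(delta_encode(values))


def delta_rle_decode(pairs):
    """Undo the stages in reverse order: RLE first, then delta."""
    return delta_decode(rle_decode(pairs))


# Hypothetical sample column: 20 evenly spaced integers.
column = list(range(1000, 1100, 5))
encoded = delta_rle_encode(column)
print(encoded)  # [(1000, 1), (5, 19)]
```

On evenly spaced data the delta stage turns the column into one long run, which the RLE stage then collapses to two pairs. Real columns are rarely this regular, so this says nothing about the ratios in the benchmarks.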
