Hello Arnav,

Was any additional compression applied? I could not find any information
about this in the document.

Ideally, for numerical columns I think the following configurations
should be compared:
- PLAIN
- PLAIN + ZSTD
- BYTE_STREAM_SPLIT + ZSTD
- DELTA + RLE
- DELTA + ZSTD

For strings you might want to compare the following:
- PLAIN
- PLAIN + ZSTD
- DELTA_BYTE_ARRAY
- DELTA_BYTE_ARRAY + ZSTD
- DICT
- DICT + FSST
- DICT + ZSTD

Regards

Antoine.


On Mon, 8 Dec 2025 15:14:20 +0530
Arnav Balyan <[email protected]> wrote:
> Hi team, thanks to the very valuable reviews and feedback from Juliean,
> Micah, Andrew and others, the FSST proposal is in the PoC stage, and will
> be worked upon in the coming weeks.
>
> I just wanted to start a discussion on Composite encodings for Parquet and
> get the community's thoughts, feedback and suggestions on nested encodings.
>
> Nested/Composite/Hierarchical encodings are supported in Vortex, Fastlanes
> etc, and partly supported in Parquet (with Dict + RLE). This
> proposal discusses formalizing the same and paving way for future encodings
> like Dict + FSST, Delta + RLE and others.
>
> Several benchmarks were run on some well recognized nested encodings, and
> show significant compression gains (order of 10x improvements) which are
> further detailed in the doc.
>
> Would love to get your thoughts and feedback!
> https://docs.google.com/document/d/1Yi5JwpKEsRFw7D8-iETguRDPtjlyiKITCguYUrrzEVY
>
> Regards,
> - Arnav
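
As a rough illustration of the kind of numeric-column comparison suggested
above, here is a minimal, stdlib-only Python sketch. It is not taken from the
proposal or from any Parquet implementation, and it rests on several
assumptions: zlib stands in for ZSTD (which is not in the standard library of
most Python versions), the delta step is a plain difference of consecutive
values rather than Parquet's bit-packed DELTA_BINARY_PACKED, DELTA + RLE is
not covered, the input is synthetic, and all function names are hypothetical.
Absolute sizes will therefore differ from a real Parquet writer.

```python
import random
import struct
import zlib

WIDTH = 8  # bytes per value (64-bit integers)


def plain(values):
    """PLAIN-like layout: little-endian signed 64-bit integers, back to back."""
    return struct.pack(f"<{len(values)}q", *values)


def byte_stream_split(values):
    """BYTE_STREAM_SPLIT-like layout: all byte 0s, then all byte 1s, and so on."""
    raw = plain(values)
    return b"".join(raw[i::WIDTH] for i in range(WIDTH))


def delta(values):
    """Simplified delta: first value, then successive differences.

    Assumption: no bit-packing, so this is NOT Parquet's DELTA_BINARY_PACKED.
    """
    deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
    return plain(deltas)


def compare(values):
    """Return the encoded size in bytes for each configuration."""
    raw = plain(values)
    return {
        "PLAIN": len(raw),
        "PLAIN + zlib": len(zlib.compress(raw)),
        "BYTE_STREAM_SPLIT + zlib": len(zlib.compress(byte_stream_split(values))),
        "DELTA + zlib": len(zlib.compress(delta(values))),
    }


# Synthetic, slowly increasing timestamps (illustrative data only).
random.seed(0)
timestamps = []
t = 1_700_000_000
for _ in range(10_000):
    t += random.randint(1, 5)
    timestamps.append(t)

sizes = compare(timestamps)
for name, size in sizes.items():
    print(f"{name:26s} {size:8d} bytes")
```

Reporting the uncompressed PLAIN size next to every encoding + compressor pair
is what makes it possible to tell how much of a gain comes from the encoding
and how much from the general-purpose compressor.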
