Hello Arnav,

Was any additional compression (e.g. ZSTD) applied on top of the encodings?
I could not find this information in the document.

Ideally, for numerical columns I think the following configurations
should be compared:

- PLAIN
- PLAIN + ZSTD
- BYTE_STREAM_SPLIT + ZSTD
- DELTA + RLE
- DELTA + ZSTD
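To illustrate why the byte layout matters before a general-purpose codec is applied, here is a minimal Python sketch that compares compressed PLAIN and BYTE_STREAM_SPLIT layouts for float64 values. It assumes stdlib `zlib` as a stand-in for ZSTD, which Python's standard library does not ship in older versions. The helper names and the synthetic data are hypothetical. A real comparison should use ZSTD through an actual Parquet writer.

```python
import math
import struct
import zlib

def plain_encode(values):
    # Toy PLAIN: little-endian float64 values written back to back.
    return struct.pack(f"<{len(values)}d", *values)

def byte_stream_split_encode(values):
    # Toy BYTE_STREAM_SPLIT: stream k holds byte k of every value,
    # and the 8 streams are concatenated in order.
    raw = plain_encode(values)
    width = 8
    return b"".join(raw[k::width] for k in range(width))

def byte_stream_split_decode(data, count):
    # Inverse: interleave the streams back into contiguous values.
    width = 8
    streams = [data[k * count:(k + 1) * count] for k in range(width)]
    raw = bytes(streams[k][i] for i in range(count) for k in range(width))
    return list(struct.unpack(f"<{count}d", raw))

def compressed_size(data):
    # zlib stands in for ZSTD here (assumption: stdlib only).
    return len(zlib.compress(data, 9))

# Smooth synthetic series (made-up data, for illustration only).
values = [20.0 + 5.0 * math.sin(i / 50.0) for i in range(10_000)]

plain = plain_encode(values)
bss = byte_stream_split_encode(values)
print("PLAIN        :", len(plain))
print("PLAIN + zlib :", compressed_size(plain))
print("BSS + zlib   :", compressed_size(bss))
```

On smooth floating-point data, the split layout tends to group the slowly varying high-order bytes (sign, exponent, leading mantissa bits) into their own streams, and a codec can exploit that. The benchmark should measure whether this holds for the proposal's datasets.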

For strings you might want to compare the following:

- PLAIN
- PLAIN + ZSTD
- DELTA_BYTE_ARRAY
- DELTA_BYTE_ARRAY + ZSTD
- DICT
- DICT + FSST
- DICT + ZSTD
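For readers less familiar with the string encodings listed above, here is a toy Python sketch of DICT (distinct values plus per-row indices) and DELTA_BYTE_ARRAY (front coding: shared-prefix length plus suffix). All function names are hypothetical. In the actual Parquet encoding, the prefix lengths are stored with DELTA_BINARY_PACKED and the suffixes with DELTA_LENGTH_BYTE_ARRAY, which is already a small composite encoding. This sketch keeps them as plain Python lists.

```python
def dict_encode(values):
    # Toy DICT: distinct values in first-seen order plus an index per row.
    dictionary, indices, positions = [], [], {}
    for v in values:
        if v not in positions:
            positions[v] = len(dictionary)
            dictionary.append(v)
        indices.append(positions[v])
    return dictionary, indices

def delta_byte_array_encode(values):
    # Toy DELTA_BYTE_ARRAY: for each value store the length of the prefix
    # shared with the previous value, plus the remaining suffix.
    prefix_lengths, suffixes = [], []
    previous = b""
    for v in values:
        n = 0
        limit = min(len(previous), len(v))
        while n < limit and previous[n] == v[n]:
            n += 1
        prefix_lengths.append(n)
        suffixes.append(v[n:])
        previous = v
    return prefix_lengths, suffixes

def delta_byte_array_decode(prefix_lengths, suffixes):
    # Rebuild each value from the previous value's prefix and its suffix.
    out, previous = [], b""
    for n, s in zip(prefix_lengths, suffixes):
        value = previous[:n] + s
        out.append(value)
        previous = value
    return out

urls = [b"https://example.com/a", b"https://example.com/b",
        b"https://example.com/a", b"https://example.org/"]
dictionary, indices = dict_encode(urls)
prefixes, suffixes = delta_byte_array_encode(urls)
print(indices)   # [0, 1, 0, 2]
print(prefixes)  # [0, 20, 20, 16]
```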

Regards

Antoine.


On Mon, 8 Dec 2025 15:14:20 +0530
Arnav Balyan
wrote:
> Hi team, thanks to the very valuable reviews and feedback from Julien,
> Micah, Andrew and others, the FSST proposal is in the PoC stage and will
> be worked on in the coming weeks.
> 
> I just wanted to start a discussion on composite encodings for Parquet and
> get the community's thoughts, feedback and suggestions on nested encodings.
> 
> Nested/composite/hierarchical encodings are supported in Vortex, FastLanes,
> etc., and partially supported in Parquet (with Dict + RLE). This
> proposal discusses formalizing them in Parquet and paving the way for future
> encodings such as Dict + FSST, Delta + RLE and others.
> 
> Several benchmarks were run on some well-recognized nested encodings and
> show significant compression gains (on the order of 10x), which are
> detailed further in the doc.
> 
> Would love to get your thoughts and feedback!
> https://docs.google.com/document/d/1Yi5JwpKEsRFw7D8-iETguRDPtjlyiKITCguYUrrzEVY
> 
> Regards,
>  - Arnav
> 
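To make the idea of a composite encoding from the quoted proposal concrete, here is a toy Python sketch of Delta + RLE. In this sketch, the output of the delta stage becomes the input of the run-length stage. It is illustrative only. Parquet's real RLE is the RLE/bit-packing hybrid, and DELTA_BINARY_PACKED bit-packs deltas in blocks, so neither stage below matches the on-disk format. The function names are hypothetical.

```python
from itertools import accumulate

def delta_encode(values):
    # Stage 1: keep the first value, then store successive differences.
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    # Running sum restores the original values.
    return list(accumulate(deltas))

def rle_encode(values):
    # Stage 2: collapse runs of equal values into (value, run_length) pairs.
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

def delta_rle_encode(values):
    # Composite: DELTA output feeds RLE.
    return rle_encode(delta_encode(values))

def delta_rle_decode(runs):
    # Decode in reverse order: RLE first, then DELTA.
    return delta_decode(rle_decode(runs))

timestamps = [1000, 1010, 1020, 1030, 1040, 1050, 1065, 1080, 1095]
encoded = delta_rle_encode(timestamps)
print(encoded)  # [(1000, 1), (10, 5), (15, 3)]
```

Regularly spaced values become long runs of identical deltas, which RLE then collapses. Neither stage alone achieves this. Dict + RLE follows the same pattern, with a dictionary stage producing the indices that RLE compresses.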