Hi Antoine,
Thanks for the review; I'll add this data shortly.
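For context on what a composite encoding means in this thread, here is a minimal sketch of a DELTA + RLE cascade. Values are delta-encoded first, and the resulting deltas are then run-length encoded, so a steadily increasing column collapses to a couple of runs. This is a toy Python illustration only. The function names are hypothetical, and it does not reproduce Parquet's DELTA_BINARY_PACKED layout (blocks, min-delta, bit packing) or the RLE/bit-packing hybrid.

```python
from itertools import accumulate, groupby

# Toy illustration only: NOT Parquet's DELTA_BINARY_PACKED or RLE hybrid formats.

def delta_encode(values):
    """Store the first value as-is, then each value's difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

def delta_rle_encode(values):
    """Composite encoding: the output of the delta stage feeds the RLE stage."""
    return rle_encode(delta_encode(values))

def delta_rle_decode(runs):
    """Undo RLE (expand runs), then undo delta (running sum)."""
    deltas = [v for v, n in runs for _ in range(n)]
    return list(accumulate(deltas))

timestamps = [1000, 1010, 1020, 1030, 1040, 1050]
encoded = delta_rle_encode(timestamps)
print(encoded)  # [(1000, 1), (10, 5)]
assert delta_rle_decode(encoded) == timestamps
```

Neither stage alone does much here: plain RLE finds no repeats in the raw values, and delta alone still stores six numbers. The gain comes from chaining them, which is the motivation for formalizing nested encodings.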

On Mon, Dec 8, 2025 at 4:18 PM Antoine Pitrou <[email protected]> wrote:

>
> Hello Arnav,
>
> Was any additional compression applied? I could not find any
> information in the document.
>
> Ideally, for numerical columns I think the following configurations
> should be compared:
>
> - PLAIN
> - PLAIN + ZSTD
> - BYTE_STREAM_SPLIT + ZSTD
> - DELTA + RLE
> - DELTA + ZSTD
>
> For strings you might want to compare the following:
>
> - PLAIN
> - PLAIN + ZSTD
> - DELTA_BYTE_ARRAY
> - DELTA_BYTE_ARRAY + ZSTD
> - DICT
> - DICT + FSST
> - DICT + ZSTD
>
> Regards
>
> Antoine.
>
>
> On Mon, 8 Dec 2025 15:14:20 +0530
> Arnav Balyan <[email protected]>
> wrote:
> > Hi team, thanks to the very valuable reviews and feedback from Juliean,
> > Micah, Andrew and others, the FSST proposal is in the PoC stage and will
> > be developed further in the coming weeks.
> >
> > I just wanted to start a discussion on composite encodings for Parquet and
> > get the community's thoughts, feedback and suggestions on nested encodings.
> >
> > Nested/Composite/Hierarchical encodings are supported in Vortex, FastLanes,
> > etc., and partly supported in Parquet (with Dict + RLE). This proposal
> > discusses formalizing them and paving the way for future encodings such
> > as Dict + FSST, Delta + RLE and others.
> >
> > Several benchmarks were run on some well-recognized nested encodings, and
> > they show significant compression gains (on the order of 10x), which are
> > detailed further in the doc.
> >
> > Would love to get your thoughts and feedback!
> >
> > https://docs.google.com/document/d/1Yi5JwpKEsRFw7D8-iETguRDPtjlyiKITCguYUrrzEVY
> >
> > Regards,
> >  - Arnav
> >
>
