Thanks Micah!
Agreed, thanks for the review!
Since this is a large proposal, we should be able to land FSST before
landing composite encoding. Delta + RLE would be a good initial milestone
(without a dependency on FSST) and newer encodings can be added in the
future.
The proposed design for composite encodings makes it simple to add newer
encodings once the right plumbing is in place. With stage-level
encoding/decoding, adding a new encoding is a matter of adding a few lines
of code in the validator and wiring it up to the actual encoding
implementation. Newer encodings will still provide logic for the
non-composite version, plus optional code to support use as a dependency
in the composite encoder/decoder.
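As a rough illustration of the stage-level idea (a minimal sketch only: the
names STAGES, ALLOWED_CHAINS, validate, composite_encode and composite_decode
are hypothetical and not taken from the proposal or any Parquet
implementation, and the codecs operate on plain Python lists rather than
Parquet's actual page layouts):

```python
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple

# Toy stage codecs. These only illustrate chaining; they are NOT the
# Parquet DELTA_BINARY_PACKED or RLE wire formats.

def delta_encode(values: List[int]) -> List[int]:
    # Keep the first value, then store differences from the predecessor.
    return [v - p for v, p in zip(values, [0] + values[:-1])]

def delta_decode(deltas: List[int]) -> List[int]:
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc)
    return out

def rle_encode(values: List[int]) -> List[Tuple[int, int]]:
    # Collapse consecutive repeats into (value, run_length) pairs.
    runs: List[Tuple[int, int]] = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_decode(runs: List[Tuple[int, int]]) -> List[int]:
    return [v for v, n in runs for _ in range(n)]

# Hypothetical registry: each stage supplies an (encode, decode) pair.
STAGES: Dict[str, Tuple[Callable[[Any], Any], Callable[[Any], Any]]] = {
    "DELTA": (delta_encode, delta_decode),
    "RLE": (rle_encode, rle_decode),
}

# Hypothetical validator table: which multi-stage chains are permitted.
# Adding a new composite would mean adding one entry here.
ALLOWED_CHAINS: Set[Tuple[str, ...]] = {("DELTA", "RLE")}

def validate(chain: Sequence[str]) -> None:
    for name in chain:
        if name not in STAGES:
            raise ValueError(f"unknown stage: {name}")
    if len(chain) > 1 and tuple(chain) not in ALLOWED_CHAINS:
        raise ValueError(f"unsupported chain: {' + '.join(chain)}")

def composite_encode(chain: Sequence[str], values: Any) -> Any:
    validate(chain)
    data = values
    for name in chain:            # apply stages left to right
        data = STAGES[name][0](data)
    return data

def composite_decode(chain: Sequence[str], data: Any) -> Any:
    validate(chain)
    for name in reversed(chain):  # undo stages right to left
        data = STAGES[name][1](data)
    return data
```

Under these assumptions, each stage still works standalone (a one-element
chain), and a composite such as Delta + RLE only needs a validator entry
plus the existing stage codecs.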
Would love to discuss more in sync.

Regards,
Arnav

On Tue, Dec 9, 2025 at 11:14 PM Micah Kornfield <[email protected]>
wrote:

> I think cascaded encodings would be a good idea in the long run. I worry a
> little that there are dependencies on in-flight encoding proposals, and it
> would be nice to focus on landing those before moving to something more
> complex.
>
> On Mon, Dec 8, 2025 at 11:31 PM Arnav Balyan <[email protected]>
> wrote:
>
> > Hi Antoine,
> > Thanks for the review, I'll add this data shortly.
> >
> > On Mon, Dec 8, 2025 at 4:18 PM Antoine Pitrou <[email protected]> wrote:
> >
> > >
> > > Hello Arnav,
> > >
> > > Was any additional compression applied? I could not find any
> > > information in the document.
> > >
> > > Ideally, for numerical columns I think the following configurations
> > > should be compared:
> > >
> > > - PLAIN
> > > - PLAIN + ZSTD
> > > - BYTE_STREAM_SPLIT + ZSTD
> > > - DELTA + RLE
> > > - DELTA + ZSTD
> > >
> > > For strings you might want to compare the following:
> > >
> > > - PLAIN
> > > - PLAIN + ZSTD
> > > - DELTA_BYTE_ARRAY
> > > - DELTA_BYTE_ARRAY + ZSTD
> > > - DICT
> > > - DICT + FSST
> > > - DICT + ZSTD
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> > >
> > > On Mon, 8 Dec 2025 15:14:20 +0530
> > > Arnav Balyan <[email protected]>
> > > wrote:
> > > > Hi team, thanks to the very valuable reviews and feedback from Julien,
> > > > Micah, Andrew and others, the FSST proposal is in the PoC stage and
> > > > will be worked on in the coming weeks.
> > > >
> > > > I just wanted to start a discussion on composite encodings for Parquet
> > > > and get the community's thoughts, feedback and suggestions on nested
> > > > encodings.
> > > >
> > > > Nested/Composite/Hierarchical encodings are supported in Vortex,
> > > > FastLanes, etc., and partly supported in Parquet (with Dict + RLE). This
> > > > proposal discusses formalizing the same and paving the way for future
> > > > encodings like Dict + FSST, Delta + RLE and others.
> > > >
> > > > Several benchmarks were run on some well-recognized nested encodings
> > > > and show significant compression gains (on the order of 10x), which
> > > > are detailed further in the doc.
> > > >
> > > > Would love to get your thoughts and feedback!
> > > >
> > > > https://docs.google.com/document/d/1Yi5JwpKEsRFw7D8-iETguRDPtjlyiKITCguYUrrzEVY
> > > >
> > > > Regards,
> > > >  - Arnav
> > > >
> > >
> > >
> > >
> > >
> >
>
