>
> +1 on looking in openzl more deeply *before* we add new encodings.


I think the project's compatibility guarantees are currently not sufficient
for use in Parquet, but some of its ideas might be worth looking at and
adopting in the meantime. The project currently states [1]:

"However, we intend to maintain some stability guarantees in the face of
that evolution. In particular, payloads compressed with any release-tagged
version of the library will remain decompressible by new releases of the
library for at least the next several years. And new releases of the
library will be able to generate frames compatible with at least the
previous release."


[1]: https://github.com/facebook/openzl

On Tue, Oct 14, 2025 at 6:58 AM Alkis Evlogimenos
<[email protected]> wrote:

> +1 on looking in openzl more deeply *before* we add new encodings.
>
> What's very attractive about openzl is that the decoder is fixed and
> advancements in encoding are backwards/forwards compatible. This means fewer
> changes to the format itself. The ideal end state would be to add openzl to
> parquet and encode everything as PLAIN.
>
> One thing to investigate is whether we can get openzl-compressed data at
> some point in the graph and then perform compressed execution on it. This
> would be perfect for dictionary-encoded streams.
>
> On Tue, Oct 7, 2025 at 4:34 PM Krisztián Szűcs <[email protected]>
> wrote:
>
> > Hi,
> >
> > There seems to be a new (if I’m not mistaken it was published yesterday)
> > codec/compression framework called OpenZL [1][2][3]. I haven't looked at it
> > thoroughly yet, but it somewhat reminds me of BtrBlocks.
> > Even if we don't consider more advanced features of a framework like this,
> > we could offload the various codec implementations to another project.
> >
> > Krisztian
> >
> > [1]: https://openzl.org/
> > [2]: https://github.com/facebook/openzl/tree/dev/src/openzl/codecs
> > [3]: https://engineering.fb.com/2025/10/06/developer-tools/openzl-open-source-format-aware-compression-framework/
> >
> > > On 2025. Oct 1., at 20:11, Andrew Lamb <[email protected]> wrote:
> > >
> > > I would like to start a discussion to help organize and rally anyone
> > > interested in adding new encodings to Parquet.
> > >
> > > I am pretty sure there are many people interested in adding new encodings,
> > > but there are only a few mentions on the mailing list, such as pcodec [1]
> > > and FSST/ALP/FastLanes [2]. Prateek mentioned on the sync call today
> > > that he is working on evaluating some potential encodings and hopes to have
> > > some information to share soon, and Julien mentioned he had spoken to
> > > someone else who might be doing something similar.
> > >
> > > Now that Julien has defined a process to extend the spec [3], I think the
> > > steps are much clearer.
> > >
> > > So, I would like to invite anyone interested in adding new encodings to
> > > respond and let us know whether you are willing to help evaluate new
> > > encodings and prototype integrations into Parquet implementations.
> > >
> > > Andrew
> > >
> > >
> > > [1]: https://lists.apache.org/thread/bdmfcj4g6y1ccd3mfgrp7d43d73s6zf6
> > > [2]: https://lists.apache.org/thread/s3o9jk0hr942pv6ono4ymnvvj6pfdsdw
> > > [3]: https://github.com/apache/parquet-format/blob/master/proposals/README.md
> >
> >
>
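Regarding the idea of compressed execution on dictionary-encoded streams
mentioned above, here is a minimal Python sketch of what that means in
general: the predicate is evaluated once per distinct dictionary value, and
rows are then selected by their integer codes without decoding each row. This
is only an illustration under stated assumptions, not OpenZL's API or any
Parquet implementation's code; `filter_dictionary_encoded` and the sample data
are hypothetical.

```python
from typing import Callable, List, Sequence


def filter_dictionary_encoded(
    dictionary: Sequence[str],
    codes: Sequence[int],
    predicate: Callable[[str], bool],
) -> List[int]:
    """Hypothetical helper: return indices of rows whose value satisfies `predicate`.

    Each row stores only an integer code into `dictionary`, so the predicate
    runs once per distinct value instead of once per row.
    """
    # Evaluate the (possibly expensive) predicate on the small dictionary only.
    matches = [predicate(value) for value in dictionary]
    # Select rows by code lookup; no per-row string is materialized.
    return [row for row, code in enumerate(codes) if matches[code]]


if __name__ == "__main__":
    dictionary = ["apple", "banana", "cherry"]
    codes = [0, 2, 1, 0, 2, 2]
    print(filter_dictionary_encoded(dictionary, codes, lambda s: s.startswith("c")))
    # → [1, 4, 5]
```

The cost of the predicate scales with the number of distinct values rather
than the number of rows, which is why dictionary-encoded data is a natural fit
for this kind of execution.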
