My reading of RFC 8878 is that multi-frame support is part of the standard.

It would be nice for the Parquet compression spec to confirm this, as it
already calls out GZIP.

(The story is that we've been using multi-frame ZSTD and it's all fine.
However, a recent AI reviewbot started flagging it as non-standard, citing
the Parquet docs and making very convincing arguments that other compliant
implementations might not support it :) )
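
For anyone who wants to check whether their own pages are multi-frame, here
is a rough sketch that walks the frame and block headers without
decompressing anything. It follows my reading of the frame layout in
RFC 8878 and has not been validated against the reference library. The names
count_zstd_frames and raw_frame are made up for illustration, and raw_frame
only builds tiny hand-made test frames (one raw block, payload under 256
bytes).

```python
import struct

ZSTD_MAGIC = 0xFD2FB528       # Zstandard frame magic (stored little-endian)
SKIPPABLE_MAGIC = 0x184D2A50  # skippable frames use 0x184D2A50..0x184D2A5F


def _need(buf: bytes, pos: int, n: int) -> None:
    """Raise if fewer than n bytes remain at pos."""
    if pos + n > len(buf):
        raise ValueError("truncated ZSTD data")


def count_zstd_frames(buf: bytes) -> int:
    """Count Zstandard data frames in buf (skippable frames are not counted)."""
    pos = frames = 0
    while pos < len(buf):
        _need(buf, pos, 4)
        (magic,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if (magic & 0xFFFFFFF0) == SKIPPABLE_MAGIC:
            # Skippable frame: 4-byte size, then that many bytes of user data.
            _need(buf, pos, 4)
            (size,) = struct.unpack_from("<I", buf, pos)
            _need(buf, pos, 4 + size)
            pos += 4 + size
            continue
        if magic != ZSTD_MAGIC:
            raise ValueError(f"unexpected magic at offset {pos - 4}")
        _need(buf, pos, 1)
        desc = buf[pos]  # Frame_Header_Descriptor
        pos += 1
        single_segment = (desc >> 5) & 1
        if not single_segment:
            pos += 1                                  # Window_Descriptor
        pos += (0, 1, 2, 4)[desc & 3]                 # Dictionary_ID
        pos += (single_segment, 2, 4, 8)[desc >> 6]   # Frame_Content_Size
        while True:
            _need(buf, pos, 3)
            header = int.from_bytes(buf[pos:pos + 3], "little")
            pos += 3
            block_type = (header >> 1) & 3
            if block_type == 3:
                raise ValueError("reserved block type")
            # An RLE block stores a single byte; raw and compressed blocks
            # store Block_Size bytes.
            pos += 1 if block_type == 1 else header >> 3
            if header & 1:                            # Last_Block flag
                break
        if desc & 0x04:
            pos += 4                                  # Content_Checksum
        _need(buf, pos, 0)
        frames += 1
    return frames


def raw_frame(payload: bytes) -> bytes:
    """Hypothetical test helper: one single-segment frame with one raw block."""
    assert len(payload) < 256
    block_header = ((len(payload) << 3) | 1).to_bytes(3, "little")
    return (struct.pack("<I", ZSTD_MAGIC) + bytes([0x20, len(payload)])
            + block_header + payload)


page = raw_frame(b"hello") + raw_frame(b"world")
print(count_zstd_frames(page))  # 2 for this hand-built input
```

A result above 1 on a real page body would mean the page relies on the
reader handling concatenated frames.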

On Fri, 6 Mar 2026 at 09:18, Antoine Pitrou <[email protected]> wrote:

>
> There may be multiple ZSTD implementations out there (including perhaps
> closed source), especially as it's now an IETF standard:
> https://datatracker.ietf.org/doc/html/rfc8878
>
> A bit of research would be necessary to find out whether other ZSTD
> implementations similarly decompress multi-frame bodies transparently.
>
> Regards
>
> Antoine.
>
>
> Le 06/03/2026 à 01:49, Andrew Pilloud via dev a écrit :
> > Hi, another long time lurker here.
> >
> > +1 to this. I've been toying with ways to do partial page decoding:
> > things like reading only def levels in v1 pages, doing selective reads
> > in plain pages, or partially decompressing large pages due to memory
> > pressure. Writing files with multi-frame ZSTD would make that more
> > efficient, but there are definitely concerns around reader compatibility.
> >
> > Andrew
> >
> > On Thu, Mar 5, 2026 at 1:34 AM Will Edwards via dev <[email protected]> wrote:
> >
> >> howdy folks, nice to e-meet you all :D. I am a long time lurker.  Love
> >> Parquet :D
> >>
> >> Currently, the compression specs for parquet address multi-frame GZIP:
> >>
> >> "Readers should support reading pages containing multiple GZIP members,
> >> however, as this has historically not been supported by all
> >> implementations, it is recommended that writers refrain from creating
> >> such pages by default for better interoperability."
> >>
> >> https://github.com/apache/parquet-format/blob/master/Compression.md
> >>
> >> However, there is no corresponding mention of a ZSTD page containing
> >> concatenated ZSTD frames.
> >>
> >> I am not aware of any Parquet readers that do not support this.  The
> >> go-to decompress function in the mainstream ZSTD library, which everyone
> >> surely uses, transparently supports multi-frame data.
> >>
> >> However, the reference library includes a function for decoding only a
> >> single frame, and readers I am not aware of might use it.
> >>
> >> Can we add a note to the compression spec to explicitly bless
> >> multi-frame ZSTD too, to avoid any future confusion?
> >>
> >> Yours hopefully,
> >>
> >> Will
> >>
> >
>
>
>
