My reading of RFC 8878 is that multiple-frame support is the standard. It would be nice for the Parquet compression spec to confirm this, as it already calls out GZIP.
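
To make the frame layout concrete, here is a rough sketch of how a reader could find the frame boundaries in a page body, following the frame, block-header, and skippable-frame layouts in RFC 8878. It is not taken from any Parquet or ZSTD implementation. The names `split_frames` and `raw_frame` are made up for illustration, and `raw_frame` only builds a tiny uncompressed frame so the walker has something to chew on.

```python
import struct

ZSTD_MAGIC = 0xFD2FB528            # Zstandard frame magic number (RFC 8878)
SKIPPABLE_MAGIC_BASE = 0x184D2A50  # skippable frames use 0x184D2A50..0x184D2A5F


def split_frames(buf: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offsets of every frame in buf (hypothetical helper)."""
    frames = []
    pos = 0
    while pos < len(buf):
        start = pos
        if pos + 4 > len(buf):
            raise ValueError("truncated magic number")
        (magic,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        if magic & 0xFFFFFFF0 == SKIPPABLE_MAGIC_BASE:
            # Skippable frame: 4-byte little-endian size, then user data.
            if pos + 4 > len(buf):
                raise ValueError("truncated skippable frame")
            (size,) = struct.unpack_from("<I", buf, pos)
            pos += 4 + size
        elif magic == ZSTD_MAGIC:
            if pos >= len(buf):
                raise ValueError("truncated frame header")
            desc = buf[pos]  # Frame_Header_Descriptor
            pos += 1
            fcs_flag = desc >> 6
            single_segment = (desc >> 5) & 1
            has_checksum = (desc >> 2) & 1
            dict_flag = desc & 3
            if not single_segment:
                pos += 1  # Window_Descriptor
            pos += (0, 1, 2, 4)[dict_flag]              # Dictionary_ID
            pos += (single_segment, 2, 4, 8)[fcs_flag]  # Frame_Content_Size
            while True:
                if pos + 3 > len(buf):
                    raise ValueError("truncated block header")
                header = int.from_bytes(buf[pos:pos + 3], "little")
                pos += 3
                last = header & 1
                block_type = (header >> 1) & 3
                block_size = header >> 3
                if block_type == 3:
                    raise ValueError("reserved block type")
                # An RLE block stores one byte; Raw and Compressed store block_size bytes.
                pos += 1 if block_type == 1 else block_size
                if last:
                    break
            if has_checksum:
                pos += 4  # Content_Checksum
        else:
            raise ValueError(f"unknown magic number at offset {start}")
        if pos > len(buf):
            raise ValueError("truncated frame")
        frames.append((start, pos))
    return frames


def raw_frame(payload: bytes) -> bytes:
    """Build a minimal single-segment frame with one Raw block (payload < 256 bytes)."""
    # Descriptor 0x20: Single_Segment_Flag set, so a 1-byte Frame_Content_Size follows.
    header = struct.pack("<IBB", ZSTD_MAGIC, 0x20, len(payload))
    # Block header: Last_Block = 1, Block_Type = 0 (Raw), Block_Size = len(payload).
    block = (1 | (len(payload) << 3)).to_bytes(3, "little")
    return header + block + payload
```

For example, `split_frames(raw_frame(b"hello") + raw_frame(b"parquet!"))` reports two frames, at offsets (0, 14) and (14, 31). A reader that stops after the first frame would silently drop everything from offset 14 onwards, which is exactly the interoperability worry.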

(The story is that we've been using multi-frame ZSTD and it's all fine. However, a recent AI reviewbot started flagging it as non-standard, citing the Parquet docs and making very convincing arguments that other compliant implementations might not support it :) )

On Fri, 6 Mar 2026 at 09:18, Antoine Pitrou <[email protected]> wrote:
>
> There may be multiple ZSTD implementations out there (including perhaps
> closed source), especially as it's now an IETF standard:
> https://datatracker.ietf.org/doc/html/rfc8878
>
> A bit of research would be necessary to find out whether other ZSTD
> implementations similarly decompress multi-frame bodies transparently.
>
> Regards
>
> Antoine.
>
>
> On 06/03/2026 at 01:49, Andrew Pilloud via dev wrote:
> > Hi, another long time lurker here.
> >
> > +1 to this. I've been toying with ways to do partial page decoding. Things
> > like reading only def levels in v1 pages, doing selective reads in plain
> > pages, or partially decompressing large pages due to memory pressure.
> > Writing files with multi-frame ZSTD would make that more efficient, but
> > there are definitely concerns around reader compatibility.
> >
> > Andrew
> >
> > On Thu, Mar 5, 2026 at 1:34 AM Will Edwards via dev <[email protected]>
> > wrote:
> >
> >> howdy folks, nice to e-meet you all :D. I am a long time lurker. Love
> >> Parquet :D
> >>
> >> Currently, the compression specs for parquet address multi-frame GZIP:
> >>
> >> "Readers should support reading pages containing multiple GZIP members,
> >> however, as this has historically not been supported by all
> >> implementations, it is recommended that writers refrain from creating such
> >> pages by default for better interoperability."
> >>
> >> https://github.com/apache/parquet-format/blob/master/Compression.md
> >>
> >> However, there is no corresponding mention of a ZSTD page containing
> >> concatenated ZSTD frames.
> >>
> >> I am not aware of any Parquet readers that do not support this. The go-to
> >> decompress function in the mainstream ZSTD library, which everyone surely
> >> uses, transparently supports multi-frame data.
> >>
> >> However, the reference library includes a function for decoding only a
> >> single frame, and readers I am not aware of might use it.
> >>
> >> Can we add a note to the compression spec to explicitly bless multi-frame
> >> ZSTD too, to avoid any future confusion?
> >>
> >> Yours hopefully,
> >>
> >> Will
> >>
