Re: [DISCUSS] flatbuf footer

Alkis Evlogimenos via dev Thu, 11 Dec 2025 23:59:03 -0800

PR for linking proposal here:
https://github.com/apache/parquet-format/pull/543
PR for parquet footer flatbuf definition:
https://github.com/apache/parquet-format/pull/544


On Tue, Dec 9, 2025 at 1:26 AM Julien Le Dem <[email protected]> wrote:

> Hello Alkis,
> Do you think you could add your footer proposal to the proposals page?
> https://github.com/apache/parquet-format/tree/master/proposals#active-proposals
> That way it gets more visibility.
> Cheers
> Julien
>
> On Tue, Oct 21, 2025 at 11:49 AM Steve Loughran <[email protected]> wrote:
>
> > On Mon, 20 Oct 2025 at 18:24, Ed Seidl <[email protected]> wrote:
> >
> > > IIUC a flatbuffer aware decoder would read the last 36 bytes or so of
> > > the file and look for a known UUID along with size information. With
> > > this it could then read only the flatbuffer bytes. I think this would
> > > work as well as current systems that prefetch some number of bytes in
> > > an attempt to get the whole footer in a single get.
> > >
> > > Old readers, however, will have to fetch both footers, but won't have
> > > any additional decoding work because the new footer is a binary field
> > > that can be easily skipped.
> > >
> >
> > It really depends on what the readers do with footer prefetching. For
> > the Java clients:
> >
> >
> >    1. s3a classic stream: the backwards seek() switches it to random IO
> >    mode; the next read() from the base of the thrift footer pulls in
> >    fs.s3a.readahead.range of data. No penalty.
> >    2. google gs://: there's a footer cache option which will need to be
> >    set to a larger value.
> >    3. azure abfs://: there's a footer cache option which will need to be
> >    set to a larger value.
> >    4. s3a + amazon analytics stream: this stream is *parquet aware* and
> >    actually parses the footer to know what to predictively prefetch. The
> >    AWS developers do know of this work; moving to support the new footer
> >    would be the ideal strategy here.
> >    5. Iceberg classic input: no idea.
> >    6. iceberg + amazon analytics: same as S3A, though without some of the
> >    tuning we've been doing for vector reads.
> >
> > I wouldn't worry too much about the impact of that footer size increase.
> > Some extra footer prefetch options should compensate, and once apps move
> > to a parquet v3 reader they've got a faster parse time. Of course,
> > ironically, read time may then dominate even more there, so it'll be
> > important to do that read as efficiently as possible (use a readFully()
> > into a buffer, not lots of single byte read() calls).
> >
>
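
To make the tail-probe idea from the quoted thread concrete, here is a minimal sketch in Python. It is not the layout defined in the PRs above: the marker value, the 4-byte little-endian size field placed directly before the marker, and the names FLATBUF_MARKER, PROBE_BYTES and read_flatbuf_footer are all assumptions made for illustration. Only the trailing "PAR1" magic used in the sample data is the real Parquet one. The sketch fetches the last 36 bytes in one read, looks for the marker, and then fetches only the flatbuffer bytes in one further read.

```python
import io
import struct

# Hypothetical 16-byte marker; the real UUID is defined by the proposal, not here.
FLATBUF_MARKER = bytes(range(16))
# "The last 36 bytes or so", as described in the thread.
PROBE_BYTES = 36


def read_flatbuf_footer(f, probe=PROBE_BYTES):
    """Return the flatbuffer footer bytes, or None if no marker is found.

    Assumed (hypothetical) tail layout:
        [flatbuf payload][u32 LE payload size][16-byte marker][...up to EOF]
    `f` is a seekable, buffered binary file object.
    """
    size = f.seek(0, io.SEEK_END)
    start = max(0, size - probe)

    # One bulk read of the tail, rather than many single-byte reads.
    f.seek(start)
    tail = f.read(size - start)

    idx = tail.rfind(FLATBUF_MARKER)
    if idx < 4:
        # Marker missing, or its size field falls outside the probed window.
        return None

    (payload_len,) = struct.unpack_from("<I", tail, idx - 4)
    payload_end = start + idx - 4
    payload_start = payload_end - payload_len
    if payload_start < 0:
        return None  # Size field is inconsistent with the file length.

    # Second bulk read: only the flatbuffer bytes.
    f.seek(payload_start)
    return f.read(payload_len)
```

A reader that already prefetches a larger tail could slice the payload out of that buffer instead of issuing the second read; the sketch keeps the two reads separate so the byte ranges are easy to follow.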
