Just wanted to follow up. I did a first-pass review of the
flatbuf definitions.

Cheers,
Micah

On Thu, Dec 11, 2025 at 11:58 PM Alkis Evlogimenos via dev <
[email protected]> wrote:

> PR for linking proposal here:
> https://github.com/apache/parquet-format/pull/543
> PR for parquet footer flatbuf definition:
> https://github.com/apache/parquet-format/pull/544
>
> On Tue, Dec 9, 2025 at 1:26 AM Julien Le Dem <[email protected]> wrote:
>
> > Hello Alkis,
> > Do you think you could add your footer proposal to the proposals page?
> >
> > https://github.com/apache/parquet-format/tree/master/proposals#active-proposals
> > That way it gets more visibility.
> > Cheers
> > Julien
> >
> > On Tue, Oct 21, 2025 at 11:49 AM Steve Loughran <[email protected]> wrote:
> >
> > > On Mon, 20 Oct 2025 at 18:24, Ed Seidl <[email protected]> wrote:
> > >
> > > > IIUC a flatbuffer-aware decoder would read the last 36 bytes or so of the
> > > > file and look for a known UUID along with size information. With this it
> > > > could then read only the flatbuffer bytes. I think this would work as well
> > > > as current systems that prefetch some number of bytes in an attempt to get
> > > > the whole footer in a single get.
> > > >
> > > > Old readers, however, will have to fetch both footers, but won't have any
> > > > additional decoding work because the new footer is a binary field that can
> > > > be easily skipped.
> > > >
> > >
> > > It really depends on what the readers do with footer prefetching. For the
> > > Java clients:
> > >
> > >    1. s3a classic stream: the backwards seek() switches it to random IO
> > >    mode; the next read() from base of thrift will pull in
> > >    fs.s3a.readahead.range of data. No penalty.
> > >    2. google gs://: there's a footer cache option which will need to be
> > >    set to a larger value.
> > >    3. azure abfs://: there's a footer cache option which will need to be
> > >    set to a larger value.
> > >    4. s3a + amazon analytics stream: this stream is *parquet aware* and
> > >    actually parses the footer to know what to predictively prefetch. The
> > >    AWS developers do know of this work; moving to support the new footer
> > >    would be the ideal strategy here.
> > >    5. Iceberg classic input: no idea.
> > >    6. iceberg + amazon analytics: same as S3A, though without some of the
> > >    tuning we've been doing for vector reads.
> > >
> > > I wouldn't worry too much about the impact of that footer size increase.
> > > Some extra footer prefetch options should compensate, and once apps move
> > > to a parquet v3 reader they've got a faster parse time. Of course,
> > > ironically, read time may then dominate even more there, so it'll be
> > > important to do that read as efficiently as possible (use a readFully()
> > > into a buffer, not lots of single-byte read() calls).
> > >
> >
>
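
For reference, the read pattern mentioned at the end of the quoted thread (one readFully() into a buffer rather than many single-byte read() calls) could look roughly like the sketch below. It is only an illustration against a local file using java.io.RandomAccessFile: the class and method names are hypothetical, and the tail size is left as a parameter because the "36 bytes or so" figure above is an estimate, not something taken from the spec.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper, not part of any Parquet reader.
final class TailRead {
    // Reads up to the last `n` bytes of the file with a single readFully()
    // call instead of a loop of single-byte read() calls.
    static byte[] readTail(RandomAccessFile file, int n) throws IOException {
        long length = file.length();
        // Clamp so that files shorter than `n` are returned whole.
        int toRead = (int) Math.min((long) n, length);
        byte[] buf = new byte[toRead];
        file.seek(length - toRead);
        // readFully() blocks until the buffer is filled or throws EOFException.
        file.readFully(buf);
        return buf;
    }
}
```

A caller would pass in whatever tail size the final footer layout requires and then parse the returned bytes in memory.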
