Re: [DISCUSS] flatbuf footer

Alkis Evlogimenos via dev Sun, 08 Feb 2026 22:17:51 -0800

Thank you Micah. Will follow up on the PR.

On Sun, Feb 8, 2026 at 8:31 PM Micah Kornfield <[email protected]>
wrote:


> Just wanted to follow up. I did a first-pass review of the
> flatbuf definitions.
>
> Cheers,
> Micah
>
> On Thu, Dec 11, 2025 at 11:58 PM Alkis Evlogimenos via dev <
> [email protected]> wrote:
>
>> PR for linking proposal here:
>> https://github.com/apache/parquet-format/pull/543
>> PR for parquet footer flatbuf definition:
>> https://github.com/apache/parquet-format/pull/544
>>
>> On Tue, Dec 9, 2025 at 1:26 AM Julien Le Dem <[email protected]> wrote:
>>
>> > Hello Alkis,
>> > Do you think you could add your footer proposal to the proposals page?
>> > https://github.com/apache/parquet-format/tree/master/proposals#active-proposals
>> > That way it gets more visibility.
>> > Cheers
>> > Julien
>> >
>> > On Tue, Oct 21, 2025 at 11:49 AM Steve Loughran <[email protected]>
>> > wrote:
>> >
>> > > On Mon, 20 Oct 2025 at 18:24, Ed Seidl <[email protected]> wrote:
>> > >
>> > > > IIUC a flatbuffer aware decoder would read the last 36 bytes or so of
>> > > > the file and look for a known UUID along with size information. With
>> > > > this it could then read only the flatbuffer bytes. I think this would
>> > > > work as well as current systems that prefetch some number of bytes in
>> > > > an attempt to get the whole footer in a single get.
>> > > >
>> > > > Old readers, however, will have to fetch both footers, but won't have
>> > > > any additional decoding work because the new footer is a binary field
>> > > > that can be easily skipped.
>> > > >
>> > >
>> > > It really depends on what the readers do with footer prefetching. For
>> > > the Java clients:
>> > >
>> > >    1. s3a classic stream: the backwards seek() switches it to random IO
>> > >    mode; the next read() from the base of the Thrift footer will pull
>> > >    in fs.s3a.readahead.range of data. No penalty.
>> > >    2. Google gs://: there's a footer cache option which will need to be
>> > >    set to a larger value.
>> > >    3. Azure abfs://: there's a footer cache option which will need to be
>> > >    set to a larger value.
>> > >    4. s3a + Amazon analytics stream: this stream is *parquet aware* and
>> > >    actually parses the footer to know what to predictively prefetch.
>> > >    The AWS developers do know of this work; moving to support the new
>> > >    footer would be the ideal strategy here.
>> > >    5. Iceberg classic input: no idea.
>> > >    6. Iceberg + Amazon analytics: same as s3a, though without some of
>> > >    the tuning we've been doing for vector reads.
>> > >
>> > > I wouldn't worry too much about the impact of that footer size
>> > > increase. Some extra footer prefetch options should compensate, and
>> > > once apps move to a parquet v3 reader they've got a faster parse time.
>> > > Of course, ironically, read time may then dominate even more there, so
>> > > it'll be important to do that read as efficiently as possible (use a
>> > > readFully() into a buffer, not lots of single-byte read() calls).
>> > >
>> >
>>
>
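
Editorial sketch of the tail probe Ed Seidl describes in the quoted
message: take the last few bytes of the file, check for a known UUID, and
use the size stored next to it. Everything specific here is an assumption
made for illustration, not the layout in the proposal: the marker value,
the 20-byte trailer (16-byte marker followed by a 4-byte little-endian
length), and the names FooterProbe and flatbufLength. The real layout is
whatever the flatbuf definition PR specifies.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;
import java.util.OptionalInt;

class FooterProbe {
    // Hypothetical 16-byte marker; the real UUID is defined by the proposal.
    static final byte[] MARKER = {
        0x01, 0x23, 0x45, 0x67, (byte) 0x89, (byte) 0xAB, (byte) 0xCD, (byte) 0xEF,
        0x10, 0x32, 0x54, 0x76, (byte) 0x98, (byte) 0xBA, (byte) 0xDC, (byte) 0xFE
    };

    // Assumed trailer: 16-byte marker followed by a 4-byte little-endian length.
    static final int TRAILER_LEN = MARKER.length + 4;

    /**
     * Inspects the final bytes of a prefetched tail. Returns the flatbuffer
     * length if the marker is present, or empty if the tail is too short or
     * the marker does not match (for example, a file without the new footer).
     */
    static OptionalInt flatbufLength(byte[] tail) {
        if (tail.length < TRAILER_LEN) {
            return OptionalInt.empty();
        }
        int markerStart = tail.length - TRAILER_LEN;
        boolean markerFound = Arrays.equals(
            tail, markerStart, markerStart + MARKER.length,
            MARKER, 0, MARKER.length);
        if (!markerFound) {
            return OptionalInt.empty();
        }
        int length = ByteBuffer.wrap(tail, markerStart + MARKER.length, 4)
            .order(ByteOrder.LITTLE_ENDIAN)
            .getInt();
        // A negative value cannot be a valid size; treat it as "not found".
        return length >= 0 ? OptionalInt.of(length) : OptionalInt.empty();
    }
}
```

A reader that already prefetches a tail of the file could call
flatbufLength on that buffer and, when a length comes back, slice out only
the flatbuffer bytes instead of decoding the whole Thrift footer.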

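Editorial sketch of Steve Loughran's closing point in the quoted message,
that the footer should be read with one readFully() into a buffer rather
than many single-byte read() calls. It uses only java.io; CountingStream is
a hypothetical stand-in for a remote object-store stream that counts how
many read calls reach it, and the class and method names are illustrative.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class FooterRead {
    /** Counts the read calls that reach the wrapped stream. */
    static class CountingStream extends FilterInputStream {
        int calls = 0;

        CountingStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** The pattern to avoid: one read() call per byte. */
    static byte[] readByteAtATime(InputStream in, int n) {
        try {
            byte[] buf = new byte[n];
            for (int i = 0; i < n; i++) {
                int b = in.read();
                if (b < 0) {
                    throw new EOFException("stream ended after " + i + " bytes");
                }
                buf[i] = (byte) b;
            }
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The preferred pattern: a single readFully() into a buffer. */
    static byte[] readBulk(InputStream in, int n) {
        try {
            byte[] buf = new byte[n];
            new DataInputStream(in).readFully(buf);
            return buf;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Wrapping the same in-memory data in two CountingStream instances and
comparing their calls fields after each method shows the difference in the
number of requests that would reach the underlying store.
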