PR for linking the proposal here: https://github.com/apache/parquet-format/pull/543
PR for the Parquet footer flatbuffer definition: https://github.com/apache/parquet-format/pull/544
On Tue, Dec 9, 2025 at 1:26 AM Julien Le Dem <[email protected]> wrote:

> Hello Alkis,
> Do you think you could add your footer proposal to the proposals page?
>
> https://github.com/apache/parquet-format/tree/master/proposals#active-proposals
>
> That way it gets more visibility.
> Cheers
> Julien
>
> On Tue, Oct 21, 2025 at 11:49 AM Steve Loughran <[email protected]> wrote:
>
> > On Mon, 20 Oct 2025 at 18:24, Ed Seidl <[email protected]> wrote:
> >
> > > IIUC, a flatbuffer-aware decoder would read the last 36 bytes or so of the
> > > file and look for a known UUID along with size information. With this, it
> > > could then read only the flatbuffer bytes. I think this would work as well
> > > as current systems that prefetch some number of bytes in an attempt to get
> > > the whole footer in a single get.
> > >
> > > Old readers, however, will have to fetch both footers, but won't have any
> > > additional decoding work because the new footer is a binary field that can
> > > be easily skipped.
> > >
> >
> > It really depends on what the readers do with footer prefetching. For the
> > Java clients:
> >
> >    1. s3a classic stream: the backwards seek() switches it to random IO
> >       mode, and the next read() from the base of the Thrift footer will pull
> >       in fs.s3a.readahead.range of data. No penalty.
> >    2. Google gs://: there's a footer cache option, which will need to be
> >       set to a larger value.
> >    3. Azure abfs://: there's a footer cache option, which will need to be
> >       set to a larger value.
> >    4. s3a + Amazon Analytics stream: this stream is *Parquet-aware* and
> >       actually parses the footer to know what to prefetch predictively. The
> >       AWS developers do know of this work; moving to support the new footer
> >       would be the ideal strategy here.
> >    5. Iceberg classic input: no idea.
> >    6. Iceberg + Amazon Analytics: same as S3A, though without some of the
> >       tuning we've been doing for vector reads.
> >
> > I wouldn't worry too much about the impact of that footer size increase.
> > Some extra footer prefetch options should compensate, and once apps move to
> > a Parquet v3 reader they'll get a faster parse time. Of course, ironically,
> > read time may then dominate even more, so it'll be important to do that
> > read as efficiently as possible (use a readFully() into a buffer, not lots
> > of single-byte read() calls).
> >
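To make the reader flow Ed describes concrete, together with Steve's readFully() advice, here is a minimal Python sketch. It is an illustration under assumptions, not the layout defined in PR 544. The byte layout, the marker UUID value, and the names `FLATBUF_MARKER`, `read_fully`, and `read_flatbuf_footer` are all hypothetical. The sketch assumes the flatbuffer bytes end immediately before a 16-byte UUID marker, which is followed by a 4-byte little-endian length, somewhere inside the final 36 bytes before the usual footer length and `PAR1` magic. `read_fully` loops on short reads, mirroring the semantics of Java's `DataInput.readFully()`.

```python
import io
import struct
import uuid

# HYPOTHETICAL marker value: the real one (if any) comes from the footer proposal.
FLATBUF_MARKER = uuid.UUID("5b2a9c3e-7d41-4f0a-9e6b-1c8d2f4a7e90").bytes
TAIL_SIZE = 36  # "the last 36 bytes or so", per the thread
PARQUET_MAGIC = b"PAR1"


def read_fully(f, n):
    """Read exactly n bytes, looping on short reads like Java's readFully()."""
    buf = bytearray(n)
    view = memoryview(buf)
    pos = 0
    while pos < n:
        got = f.readinto(view[pos:])
        if not got:  # 0 means EOF; None means no data on a non-blocking stream
            raise EOFError(f"wanted {n} bytes, got {pos}")
        pos += got
    return bytes(buf)


def read_flatbuf_footer(f):
    """Return the embedded flatbuffer footer bytes, or None if no marker is found.

    Assumed (hypothetical) layout near the end of the file:
        ... [flatbuffer: N bytes][MARKER: 16 bytes][N: uint32 LE] ... [len][PAR1]
    """
    size = f.seek(0, io.SEEK_END)
    tail_len = min(TAIL_SIZE, size)
    tail_start = size - tail_len
    f.seek(tail_start)
    tail = read_fully(f, tail_len)  # read 1: one fixed-size read of the tail
    if not tail.endswith(PARQUET_MAGIC):
        raise ValueError("not a Parquet file")
    idx = tail.rfind(FLATBUF_MARKER)
    if idx < 0 or idx + 20 > len(tail):
        return None  # no marker: fall back to the existing Thrift footer path
    (fb_len,) = struct.unpack_from("<I", tail, idx + 16)
    fb_start = tail_start + idx - fb_len  # flatbuffer ends where the marker begins
    if fb_start < 0:
        raise ValueError("corrupt flatbuffer length")
    f.seek(fb_start)
    return read_fully(f, fb_len)  # read 2: exactly the flatbuffer bytes
```

The design choice is that a new reader issues exactly two reads: one fixed-size read of the tail, then one exact-length read of the flatbuffer. On an object store each would map to a single ranged request, and neither degrades into many single-byte `read()` calls.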
