Just wanted to follow up. I did a first-pass review of the flatbuf definitions.
Cheers,
Micah

On Thu, Dec 11, 2025 at 11:58 PM Alkis Evlogimenos via dev <[email protected]> wrote:

> PR for linking proposal here:
> https://github.com/apache/parquet-format/pull/543
> PR for parquet footer flatbuf definition:
> https://github.com/apache/parquet-format/pull/544
>
> On Tue, Dec 9, 2025 at 1:26 AM Julien Le Dem <[email protected]> wrote:
>
> > Hello Alkis,
> > Do you think you could add your footer proposal to the proposals page?
> > https://github.com/apache/parquet-format/tree/master/proposals#active-proposals
> > That way it gets more visibility.
> > Cheers
> > Julien
> >
> > On Tue, Oct 21, 2025 at 11:49 AM Steve Loughran <[email protected]> wrote:
> >
> > > On Mon, 20 Oct 2025 at 18:24, Ed Seidl <[email protected]> wrote:
> > >
> > > > IIUC a flatbuffer aware decoder would read the last 36 bytes or so of the
> > > > file and look for a known UUID along with size information. With this it
> > > > could then read only the flatbuffer bytes. I think this would work as well
> > > > as current systems that prefetch some number of bytes in an attempt to get
> > > > the whole footer in a single get.
> > > >
> > > > Old readers, however, will have to fetch both footers, but won't have any
> > > > additional decoding work because the new footer is a binary field that can
> > > > be easily skipped.
> > >
> > > really depends what the readers do with footer prefetching. For the java
> > > clients
> > >
> > > 1. s3a classic stream: the backwards seek() switches it to random IO
> > >    mode, next read() from base of thrift will pull in
> > >    fs.s3a.readahead.range of data No penalty
> > > 2. google gs://. There's a footer cache option which will need to be set
> > >    to a larger value
> > > 3. azure abfs:// there's a footer cache option which will need to be set
> > >    to a larger value
> > > 4. s3a + amazon analytics stream. This stream is *parquet aware* and
> > >    actually parses the footer to know what to predictively prefetch. The
> > >    AWS developers do know of this work -moving to support the new footer
> > >    would be the ideal strategy here.
> > > 5. Iceberg classic input. no idea.
> > > 6. iceberg + amazon analytics. same as S3A though without some of the
> > >    tuning we've been doing for vector reads.
> > >
> > > I wouldn't worry too much about the impact of that footer size increase.
> > > Some extra footer prefetch options should compensate, and once apps move to
> > > a parquet v3 reader they've got a faster parse time. Of course, ironically,
> > > read time then may dominate even more there -it'll be important to do that
> > > read as efficiently as possible (use a readFully() into a buffer, not lots
> > > of single byte read() calls)
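For anyone skimming the quoted thread, here is a rough Java sketch of the tail read it discusses: one seek, one readFully() into a buffer, a check for a known marker, then a second bulk read of only the flatbuffer bytes. The trailer layout used here (a 4-byte little-endian length followed by a 16-byte marker at the very end of the file) and the names FooterTailProbe, probe and readFooter are assumptions made up for illustration. They are not the layout or API from the PRs, which remain the authority on the actual format.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative only: the trailer layout below is hypothetical, not the one in the proposal.
final class FooterTailProbe {
    // Assumed layout at end of file: [4-byte LE flatbuffer length][16-byte marker]
    static final int MARKER_LEN = 16;
    static final int TRAILER_LEN = 4 + MARKER_LEN;

    /** Returns the flatbuffer length, or -1 if the marker is not at the tail. */
    public static int probe(RandomAccessFile file, byte[] marker) throws IOException {
        long size = file.length();
        if (size < TRAILER_LEN) {
            return -1;
        }
        byte[] tail = new byte[TRAILER_LEN];
        file.seek(size - TRAILER_LEN);
        file.readFully(tail); // one bulk read, not many single-byte read() calls
        byte[] found = Arrays.copyOfRange(tail, 4, TRAILER_LEN);
        if (!Arrays.equals(found, marker)) {
            return -1; // no new-style footer; a reader would fall back to the old one
        }
        return ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Reads exactly the flatbuffer bytes that sit just before the trailer. */
    public static byte[] readFooter(RandomAccessFile file, int length) throws IOException {
        long start = file.length() - TRAILER_LEN - length;
        if (length < 0 || start < 0) {
            throw new IOException("bad footer length: " + length);
        }
        byte[] buf = new byte[length];
        file.seek(start);
        file.readFully(buf);
        return buf;
    }
}
```

A reader on an object store would replace RandomAccessFile with its own stream, but the shape stays the same: two positioned bulk reads, or a single read if the prefetch window already covers the footer.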
