> Maybe I was confused by this:

There are (at least) two parallel things going on:

1. Work in the Rust Parquet implementation to speed up the parsing of
Thrift footers (no change to the Parquet format) [1][2]
2. A proposal to change the Parquet format to add an optional
FlatBuffers-based footer [3]
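
To make the difference between the two approaches concrete, here is a toy
sketch of the contrast described in the quoted text below: walking a
serialized buffer field by field versus jumping to a field through embedded
offsets. Both byte layouts and all four function names are hypothetical and
invented for illustration. This is neither the Thrift compact protocol nor
the FlatBuffers wire format, and it says nothing about real-world speedups.

```python
import struct


def encode_sequential(fields):
    """Toy tag-length-value layout: (id: u8, length: u16, payload) repeated."""
    out = bytearray()
    for field_id, payload in enumerate(fields):
        out += struct.pack("<BH", field_id, len(payload)) + payload
    return bytes(out)


def read_sequential(buf, wanted_id):
    """Walk the buffer header by header until the wanted field is reached.

    Returns the payload and the number of field headers that were decoded.
    """
    pos = 0
    steps = 0
    while pos < len(buf):
        field_id, length = struct.unpack_from("<BH", buf, pos)
        pos += 3  # size of the "<BH" header
        steps += 1
        if field_id == wanted_id:
            return buf[pos:pos + length], steps
        pos += length  # skip the payload of a field we do not want
    raise KeyError(wanted_id)


def encode_offset_table(fields):
    """Toy offset layout: count (u16), one absolute u32 offset per field,
    then length-prefixed (u16) payloads."""
    table_size = 2 + 4 * len(fields)
    body = bytearray()
    offsets = []
    for payload in fields:
        offsets.append(table_size + len(body))
        body += struct.pack("<H", len(payload)) + payload
    header = struct.pack("<H", len(fields))
    table = struct.pack(f"<{len(fields)}I", *offsets)
    return header + table + bytes(body)


def read_offset_table(buf, wanted_id):
    """Jump straight to the wanted field using the embedded offset."""
    (count,) = struct.unpack_from("<H", buf, 0)
    if not 0 <= wanted_id < count:
        raise KeyError(wanted_id)
    (offset,) = struct.unpack_from("<I", buf, 2 + 4 * wanted_id)
    (length,) = struct.unpack_from("<H", buf, offset)
    return buf[offset + 2:offset + 2 + length]


# Hypothetical field contents, used only to exercise the two readers.
fields = [b"schema", b"row_groups", b"stats"]
payload, steps = read_sequential(encode_sequential(fields), 2)
direct = read_offset_table(encode_offset_table(fields), 2)
```

In this sketch the sequential reader has to decode every preceding field
header before it reaches the last field, while the offset reader does a fixed
number of lookups regardless of the field's position.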

[1]: https://github.com/apache/arrow-rs/issues/5854
[2]: https://github.com/alamb/parquet_footer_parsing
[3]: https://docs.google.com/document/d/1kZS_DM_J8n6NKff3vDQPD1Y4xyDdRceYFANUE0bOfb0/edit?tab=t.0#heading=h.ccu4zzsy0tm5

Andrew

On Mon, Oct 20, 2025 at 8:17 AM Andrew Bell <[email protected]>
wrote:

> On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos
> <[email protected]> wrote:
>
> > Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of
> > the serialized form and picks fields out of it one by one. Flatbuf instead
> > takes the serialized form and uses offsets already embedded in it to
> > extract fields from the serialized form directly. In other words there is
> > no parsing done. We have 3 ways to use the flatbuf each of which adds more
> > overhead
> >
> ...
>
> Maybe I was confused by this:
>
> > > > > > This benchmark achieves nearly an order of magnitude improvement
> > > > > > (7x) parsing Parquet metadata with no changes to the Parquet
> > > > > > format, by simply writing a more efficient thrift decoder (which
> > > > > > can also skip statistics).
>
> It was unclear to me if this was still about flatbuf or about writing a
> better thrift decoder. Is there a write-up describing exactly what's being
> proposed? I don't see any issue here:
> https://github.com/apache/parquet-format/issues
>
> --
> Andrew Bell
> [email protected]
>
