Re: [DISCUSS] flatbuf footer

Andrew Lamb Mon, 20 Oct 2025 05:34:01 -0700

> I don't see any issue here:
> https://github.com/apache/parquet-format/issues


That is a good call -- I filed
https://github.com/apache/parquet-format/issues/530 to track this.

On Mon, Oct 20, 2025 at 8:17 AM Andrew Bell <[email protected]>
wrote:

> On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos
> <[email protected]> wrote:
>
> > Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of
> > the serialized form and picks fields out of it one by one. Flatbuf instead
> > takes the serialized form and uses offsets already embedded in it to
> > extract fields from the serialized form directly. In other words there is
> > no parsing done. We have 3 ways to use the flatbuf each of which adds more
> > overhead
> >
> ...
>
> Maybe I was confused by this:
>
> > This benchmark achieves nearly an order of magnitude improvement (7x)
> > parsing Parquet metadata with no changes to the Parquet format, by simply
> > writing a more efficient thrift decoder (which can also skip statistics).
>
> It was unclear to me if this was still about flatbuf or about writing a
> better thrift decoder. Is there a write-up describing exactly what's being
> proposed? I don't see any issue here:
> https://github.com/apache/parquet-format/issues
>
> --
> Andrew Bell
> [email protected]
>
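
For anyone following the thread, the distinction quoted above (walking the
serialized bytes field by field versus following offsets already embedded in
the buffer) can be illustrated with a toy sketch. The two byte layouts below
are invented for illustration only; they are not the Thrift compact protocol
or the FlatBuffers wire format, and the function names and field values are
hypothetical.

```python
import struct


def encode_sequential(fields):
    """Toy tagged encoding: (field_id, length, payload) records back to back."""
    out = bytearray()
    for field_id, payload in enumerate(fields):
        out += struct.pack("<BB", field_id, len(payload)) + payload
    return bytes(out)


def read_sequential(buf, wanted):
    """Walk the records from the start until the wanted field id shows up.

    Returns the payload and the number of records that had to be examined.
    """
    pos = 0
    steps = 0
    while pos < len(buf):
        field_id, length = struct.unpack_from("<BB", buf, pos)
        pos += 2
        steps += 1
        if field_id == wanted:
            return buf[pos:pos + length], steps
        pos += length  # skip a payload we do not care about
    raise KeyError(wanted)


def encode_offsets(fields):
    """Toy offset encoding: a table of u16 offsets, then length-prefixed payloads."""
    table_size = 2 * len(fields)
    offsets = []
    body = bytearray()
    for payload in fields:
        offsets.append(table_size + len(body))
        body += struct.pack("<B", len(payload)) + payload
    return struct.pack(f"<{len(fields)}H", *offsets) + bytes(body)


def read_offsets(buf, wanted):
    """Jump straight to the wanted field using its embedded offset."""
    (offset,) = struct.unpack_from("<H", buf, 2 * wanted)
    length = buf[offset]
    return buf[offset + 1:offset + 1 + length]


# Hypothetical metadata-like fields, purely for demonstration.
fields = [b"schema", b"row_groups", b"statistics", b"created_by"]
seq = encode_sequential(fields)
off = encode_offsets(fields)

value, steps = read_sequential(seq, 3)
print(value, steps)          # every earlier record was visited first
print(read_offsets(off, 3))  # one table lookup, no walk over other fields
```

In the sequential layout the cost of reaching a field grows with the number
of records ahead of it, while the offset layout reads one table entry and
then the field itself, regardless of what else is in the buffer.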
