On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos
<[email protected]> wrote:

> Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of
> the serialized form and picks fields out of it one by one. Flatbuf instead
> takes the serialized form and uses offsets already embedded in it to
> extract fields from the serialized form directly. In other words there is
> no parsing done. We have 3 ways to use the flatbuf each of which adds more
> overhead
>
...
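
To make the distinction in the quoted paragraph concrete for myself, here is a minimal sketch of the two access patterns. It assumes two toy byte layouts invented for illustration; neither is the real Thrift compact protocol nor the real FlatBuffers vtable layout, and every function name below is hypothetical.

```python
import struct


def encode_tagged(fields):
    """Toy sequential format: [field_id:u8][length:u8][payload], repeated."""
    out = bytearray()
    for field_id, payload in fields.items():
        out += struct.pack("<BB", field_id, len(payload)) + payload
    return bytes(out)


def read_tagged(buf, wanted_id):
    """Walk the bytes field by field until the wanted id turns up.

    Returns (payload, number_of_field_headers_inspected).
    """
    pos = 0
    steps = 0
    while pos < len(buf):
        field_id, length = struct.unpack_from("<BB", buf, pos)
        pos += 2
        steps += 1
        if field_id == wanted_id:
            return buf[pos:pos + length], steps
        pos += length  # skip over a field we do not need
    return None, steps


def encode_offsets(fields, num_slots):
    """Toy offset format: one u32 offset per slot, then [length:u8][payload].

    An offset of 0 marks an absent field (0 always points into the table
    itself, so it can never be a valid payload position).
    """
    table_size = 4 * num_slots
    offsets = [0] * num_slots
    body = bytearray()
    for field_id, payload in fields.items():
        offsets[field_id] = table_size + len(body)
        body += struct.pack("<B", len(payload)) + payload
    return struct.pack(f"<{num_slots}I", *offsets) + bytes(body)


def read_offsets(buf, wanted_id):
    """Jump straight to the field using the offset embedded in the buffer."""
    (offset,) = struct.unpack_from("<I", buf, 4 * wanted_id)
    if offset == 0:
        return None
    (length,) = struct.unpack_from("<B", buf, offset)
    return buf[offset + 1:offset + 1 + length]


if __name__ == "__main__":
    fields = {0: b"schema", 1: b"row_groups", 2: b"stats"}
    print(read_tagged(encode_tagged(fields), 2))
    print(read_offsets(encode_offsets(fields, 4), 2))
```

In the tagged layout the reader has to inspect every earlier field header before it reaches the one it wants; in the offset layout it does one table lookup and one read, whatever comes before the field.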

Maybe I was confused by this:

> This benchmark achieves nearly an order of magnitude improvement (7x)
> parsing Parquet metadata with no changes to the Parquet format, by simply
> writing a more efficient thrift decoder (which can also skip statistics).

It was unclear to me whether this was still about flatbuf or about writing
a better thrift decoder. Is there a write-up describing exactly what's being
proposed? I don't see any issue here:
https://github.com/apache/parquet-format/issues

-- 
Andrew Bell
[email protected]
