> I don't see any issue here: https://github.com/apache/parquet-format/issues
That is a good call -- I filed https://github.com/apache/parquet-format/issues/530 to track

On Mon, Oct 20, 2025 at 8:17 AM Andrew Bell <[email protected]> wrote:

> On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos
> <[email protected]> wrote:
>
> > Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of
> > the serialized form and picks fields out of it one by one. Flatbuf instead
> > takes the serialized form and uses offsets already embedded in it to
> > extract fields from the serialized form directly. In other words there is
> > no parsing done. We have 3 ways to use the flatbuf, each of which adds more
> > overhead
> >
> ...
>
> Maybe I was confused by this:
>
> > This benchmark achieves nearly an order of magnitude improvement (7x)
> > parsing Parquet metadata with no changes to the Parquet format, by simply
> > writing a more efficient thrift decoder (which can also skip statistics).
>
> It was unclear to me if this was still about flatbuf or about writing a
> better thrift decoder. Is there a write-up describing exactly what's being
> proposed? I don't see any issue here:
> https://github.com/apache/parquet-format/issues
>
> --
> Andrew Bell
> [email protected]
>
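To make the distinction Alkis describes concrete, here is a toy sketch. Both layouts are hypothetical and simplified: this is not the real Thrift compact protocol (which uses varints and field-id deltas) nor the real FlatBuffers wire format (which uses vtables). The "sequential" decoder must walk every preceding field header to reach the one it wants, even when it skips payloads it does not need (as a statistics-skipping thrift decoder would). The "offset" layout embeds a table of offsets, so a field is read directly without touching the others.

```python
from __future__ import annotations

import struct

# Hypothetical "Thrift-like" layout: fields stored back to back as
# [field_id: u8][length: u32 little-endian][payload bytes].
def encode_sequential(fields: dict[int, bytes]) -> bytes:
    out = bytearray()
    for fid, payload in fields.items():
        out += struct.pack("<BI", fid, len(payload)) + payload
    return bytes(out)

def get_sequential(buf: bytes, wanted: int) -> tuple[bytes | None, int]:
    """Walk fields one by one until `wanted` is found.
    Returns (payload or None, number of field headers visited)."""
    pos, visited = 0, 0
    while pos < len(buf):
        fid, length = struct.unpack_from("<BI", buf, pos)
        pos += 5  # size of the "<BI" header
        visited += 1
        if fid == wanted:
            return buf[pos:pos + length], visited
        pos += length  # skip the payload of a field we do not need
    return None, visited

# Hypothetical "Flatbuf-like" layout: [count: u32] then one
# (offset: u32, length: u32) entry per field, then the payloads.
def encode_offsets(fields: list[bytes]) -> bytes:
    header_size = 4 + 8 * len(fields)
    table = bytearray(struct.pack("<I", len(fields)))
    data = bytearray()
    for payload in fields:
        table += struct.pack("<II", header_size + len(data), len(payload))
        data += payload
    return bytes(table + data)

def get_by_offset(buf: bytes, index: int) -> bytes:
    """Read field `index` directly via the embedded offset table;
    no other field is decoded."""
    (count,) = struct.unpack_from("<I", buf, 0)
    if not 0 <= index < count:
        raise IndexError(index)
    offset, length = struct.unpack_from("<II", buf, 4 + 8 * index)
    return buf[offset:offset + length]

if __name__ == "__main__":
    payloads = [b"schema", b"row_groups", b"statistics", b"key_value_metadata"]
    value, visited = get_sequential(encode_sequential(dict(enumerate(payloads))), 3)
    print(value, visited)  # b'key_value_metadata' 4
    print(get_by_offset(encode_offsets(payloads), 3))  # b'key_value_metadata'
```

The cost of the sequential lookup grows with the number of fields in front of the one requested, while the offset lookup does a fixed amount of work regardless of position; real formats add nesting and other details this sketch ignores.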
