On Mon, Oct 20, 2025 at 6:07 AM Alkis Evlogimenos <[email protected]> wrote:
> Flatbuf parsing is trivial compared to thrift. Thrift walks the bytes of
> the serialized form and picks fields out of it one by one. Flatbuf instead
> takes the serialized form and uses offsets already embedded in it to
> extract fields from the serialized form directly. In other words there is
> no parsing done. We have 3 ways to use the flatbuf each of which adds more
> overhead
> ...

Maybe I was confused by this:

> > > > This benchmark achieves nearly an order of magnitude improvement (7x)
> > > > parsing Parquet metadata with no changes to the Parquet format, by
> > > > simply writing a more efficient thrift decoder (which can also skip
> > > > statistics).

It was unclear to me if this was still about flatbuf or about writing a
better thrift decoder.

Is there a write-up describing exactly what's being proposed? I don't see
any issue here:

https://github.com/apache/parquet-format/issues

-- 
Andrew Bell
[email protected]
