Thanks again for writing this up, Arnav -- it is greatly appreciated. I have filed a ticket[1] in arrow-rs to track prototyping ALP in the Rust Parquet reader, if anyone is interested.
Andrew

[1]: https://github.com/apache/arrow-rs/issues/8748

On Wed, Oct 29, 2025 at 1:41 AM Arnav Balyan <[email protected]> wrote:

> Hi team,
>
> Just wanted to start a discussion for FSST integration in Parquet. For
> quick context, FSST (Fast Static Symbol Table) enables high compression
> ratios for unstructured textual data. It's used by other systems like
> DuckDB and MonetDB, offering up to 3.3x compression ratios with minimal
> read/write overheads.
>
> We integrated FSST for Parquet and did benchmarks on Parquet; attached here
> is a doc with our findings and results.
>
> https://docs.google.com/document/d/1g7zgopxeHc5nofJXfc8EEp_HGMaI8g-jFVvNCs2GVA0/edit?tab=t.0#heading=h.2eyxl5kkyzy7
>
> Would love to know your suggestions, feedback and thoughts.
> Regards,
>
> - Arnav
