I meant to point at the FSST ticket: https://github.com/apache/arrow-rs/issues/8749 (I am already getting confused)
On Thu, Oct 30, 2025 at 5:36 PM Andrew Lamb wrote:

> Thanks again for writing this up, Arnav -- it is greatly appreciated
>
> I have filed a ticket[1] in arrow-rs to track prototyping ALP in the Rust
> Parquet reader if anyone is interested
>
> Andrew
>
> [1]: https://github.com/apache/arrow-rs/issues/8748
>
> On Wed, Oct 29, 2025 at 1:41 AM Arnav Balyan wrote:
>
>> Hi team,
>>
>> Just wanted to start a discussion on FSST integration in Parquet. For
>> quick context, FSST (Fast Static Symbol Table) enables high compression
>> ratios for unstructured textual data. It's used by other systems like
>> DuckDB and MonetDB, offering up to 3.3x compression ratios with minimal
>> read/write overheads.
>>
>> We integrated FSST into Parquet and ran benchmarks; attached here is a
>> doc with our findings and results.
>>
>> https://docs.google.com/document/d/1g7zgopxeHc5nofJXfc8EEp_HGMaI8g-jFVvNCs2GVA0/edit?tab=t.0#heading=h.2eyxl5kkyzy7
>>
>> Would love to know your suggestions, feedback and thoughts.
>>
>> Regards,
>>
>> - Arnav
