Hi Andrew,

Thanks for filing the ticket, I really appreciate it. I’ll check out the Rust side of things and gather the performance numbers. I would expect it to do better in Rust, and I’ll share the results once they are ready.

Thanks.
Arnav

On Fri, Oct 31, 2025 at 3:07 AM Andrew Lamb <[email protected]> wrote:

> I meant to point at the FSST ticket:
> https://github.com/apache/arrow-rs/issues/8749 (I am already getting
> confused)
>
> On Thu, Oct 30, 2025 at 5:36 PM Andrew Lamb <[email protected]>
> wrote:
>
> > Thanks again for writing this up Arnav -- it is greatly appreciated
> >
> > I have filed a ticket[1] in arrow-rs to track prototyping ALP in the Rust
> > Parquet reader if anyone is interested
> >
> > Andrew
> >
> > [1]: https://github.com/apache/arrow-rs/issues/8748
> >
> > On Wed, Oct 29, 2025 at 1:41 AM Arnav Balyan <[email protected]>
> > wrote:
> >
> >> Hi team,
> >>
> >> Just wanted to start a discussion for FSST integration in Parquet. For
> >> quick context, FSST (Fast Static Symbol Table) enables high compression
> >> ratios for unstructured textual data. It's used by other systems like
> >> DuckDB and MonetDB offering upto 3.3x compression ratios with minimal
> >> read/write overheads.
> >>
> >> We integrated FSST for Parquet and did benchmarks on Parquet, attached
> >> here is a doc with our findings and results.
> >>
> >> https://docs.google.com/document/d/1g7zgopxeHc5nofJXfc8EEp_HGMaI8g-jFVvNCs2GVA0/edit?tab=t.0#heading=h.2eyxl5kkyzy7
> >>
> >> Would love to know your suggestions, feedback and thoughts.
> >> Regards,
> >>
> >> - Arnav
