Hi team,

I'd like to start a discussion about integrating FSST into Parquet. For quick context, FSST (Fast Static Symbol Table) enables high compression ratios for unstructured textual data. It is used by other systems such as DuckDB and MonetDB, and offers compression ratios of up to 3.3x with minimal read/write overhead.
We integrated FSST into Parquet and benchmarked it. Our findings and results are in this doc: https://docs.google.com/document/d/1g7zgopxeHc5nofJXfc8EEp_HGMaI8g-jFVvNCs2GVA0/edit?tab=t.0#heading=h.2eyxl5kkyzy7

We would love to hear your suggestions, feedback, and thoughts.

Regards,
Arnav
