Hi team,

We wanted to share performance numbers for one of the candidate encodings we have been discussing for numeric compression in Parquet.
The following doc, PFOR : Encoding <https://docs.google.com/document/d/1ZZOtxmq6K8pNU0npijfSglTJVkspXL5GLDKPSGj9HlA/edit?tab=t.0>, walks through compression speed, compression ratio, and decompression speed for column values from the ClickBench data. Based on these numbers, PFOR decompresses faster than DELTABITPACK and RLEBITPACKHYBRID and achieves a better compression ratio on average. The doc also includes a comparison with ZSTD compression.

We plan to expand to a few more datasets and work on a prototype in the Arrow C++ code in early January.

Looking forward to feedback from the group.

Best,
Prateek
