Hi team,

We wanted to share performance numbers for PFOR, one of the candidate
encodings we have been discussing for numeric compression in Parquet.

The following doc, "PFOR: Encoding"
<https://docs.google.com/document/d/1ZZOtxmq6K8pNU0npijfSglTJVkspXL5GLDKPSGj9HlA/edit?tab=t.0>,
presents compression speed, compression ratio, and decompression speed
measurements for column values from the ClickBench dataset. Based on these
numbers, PFOR decompresses faster than DELTABITPACK and RLEBITPACKHYBRID,
and it achieves a better compression ratio on average. The doc also
includes a comparison with ZSTD compression.
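For readers less familiar with PFOR (Patched Frame-of-Reference), here is a
minimal Python sketch of the general technique. It is NOT the layout
proposed in the doc: the block structure, the 10% exception threshold, and
all names (PforBlock, pfor_encode, pfor_decode, etc.) are illustrative
assumptions. Values are stored as offsets from the block minimum, packed at
a bit width that most offsets fit into, and the few outliers that do not fit
are "patched" back in from a separate exception list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PforBlock:
    # Hypothetical block layout, for illustration only.
    base: int                          # frame of reference (block minimum)
    bit_width: int                     # width of the packed low bits
    count: int                         # number of values in the block
    packed: bytes                      # low bit_width bits of every offset
    exceptions: List[Tuple[int, int]]  # (position, high bits) of outliers


def pack_bits(values: List[int], width: int) -> bytes:
    """Pack non-negative ints, each < 2**width, into a little-endian bit stream."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)
    return acc.to_bytes((len(values) * width + 7) // 8, "little")


def unpack_bits(data: bytes, width: int, count: int) -> List[int]:
    """Inverse of pack_bits: read `count` fixed-width ints from the stream."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


def choose_bit_width(offsets: List[int], max_exception_ratio: float) -> int:
    """Smallest width such that at most max_exception_ratio of offsets overflow it."""
    limit = max_exception_ratio * len(offsets)
    max_width = max(offsets, default=0).bit_length()
    for width in range(max_width + 1):
        overflow = sum(1 for o in offsets if o >> width)
        if overflow <= limit:
            return width
    return max_width  # unreachable: at max_width nothing overflows


def pfor_encode(values: List[int], max_exception_ratio: float = 0.1) -> PforBlock:
    base = min(values, default=0)
    offsets = [v - base for v in values]  # all offsets are >= 0
    width = choose_bit_width(offsets, max_exception_ratio)
    mask = (1 << width) - 1
    packed = pack_bits([o & mask for o in offsets], width)
    # Outliers keep their low bits in `packed`; their high bits go here.
    exceptions = [(i, o >> width) for i, o in enumerate(offsets) if o >> width]
    return PforBlock(base, width, len(values), packed, exceptions)


def pfor_decode(block: PforBlock) -> List[int]:
    offsets = unpack_bits(block.packed, block.bit_width, block.count)
    for pos, high in block.exceptions:  # patch the outliers back in
        offsets[pos] |= high << block.bit_width
    return [block.base + o for o in offsets]


if __name__ == "__main__":
    values = [100, 101, 103, 102, 100, 5000, 104, 101, 100, 102]
    block = pfor_encode(values)
    print(block.bit_width, block.exceptions, len(block.packed))  # prints: 3 [(5, 612)] 4
    assert pfor_decode(block) == values
```

In this example a single outlier (5000) would otherwise force every value
to 13 bits; with patching, the ten values pack into 3 bits each (4 bytes)
plus one exception entry. Decoding is a fixed-width unpack followed by a
short patch loop, which is the intuition behind PFOR-style decode speed.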

We plan to extend the evaluation to a few more datasets and to start on a
prototype in the Arrow C++ codebase in early January.
We look forward to feedback from the group.

Best,
Prateek
