Hi,

I looked at the doc, and the stated decompression speeds for ZSTD look highly unrealistic.

I think what happens is that you are computing decompression speed as:

  size of compressed data / time to decompress

while you should really compute it as:

  size of uncompressed data / time to decompress

Otherwise you're simply making ZSTD look miserable because it compresses so well: the better the compression ratio, the smaller the numerator, so the apparent speed drops even though the actual throughput is fine.
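
To make the difference concrete, here is a minimal C++ sketch; the sizes and timing are made up for illustration, not taken from the doc:

  #include <cstdio>

  int main() {
    // Hypothetical figures: a 1 GiB column that ZSTD compresses to
    // 100 MiB and decompresses in 0.5 s.
    const double uncompressed_bytes = 1024.0 * 1024 * 1024;
    const double compressed_bytes = 100.0 * 1024 * 1024;
    const double seconds = 0.5;

    // Wrong: the numerator shrinks as the ratio improves, so a codec
    // gets punished for compressing well (~200 MiB/s here).
    const double wrong = compressed_bytes / seconds / (1024 * 1024);

    // Right: bytes of useful output produced per second (~2 GiB/s here).
    const double right = uncompressed_bytes / seconds / (1024 * 1024);

    std::printf("wrong: %.0f MiB/s, right: %.0f MiB/s\n", wrong, right);
    return 0;
  }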

Also, I think you should add BYTE_STREAM_SPLIT + ZSTD into the mix (and possibly BYTE_STREAM_SPLIT + LZ4 if you're going to evaluate LZ4).
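
For the Arrow C++ prototype, a minimal sketch of how that combination can be requested through the Parquet writer properties (the column name "value" is hypothetical; adjust to the actual schema path). Note that dictionary encoding has to be disabled for the column, otherwise BYTE_STREAM_SPLIT would only be used as the fallback encoding:

  #include <memory>
  #include <parquet/properties.h>

  std::shared_ptr<parquet::WriterProperties> MakeWriterProperties() {
    parquet::WriterProperties::Builder builder;
    // "value" is a placeholder column name.
    builder.compression("value", parquet::Compression::ZSTD)
        ->disable_dictionary("value")
        ->encoding("value", parquet::Encoding::BYTE_STREAM_SPLIT);
    return builder.build();
  }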

Regards

Antoine.


On 06/12/2025 at 23:24, PRATEEK GAUR wrote:
Hi team,

We wanted to share performance numbers for one of the candidate encodings
we have been discussing for Parquet numeric compression.

The following doc, PFOR: Encoding
<https://docs.google.com/document/d/1ZZOtxmq6K8pNU0npijfSglTJVkspXL5GLDKPSGj9HlA/edit?tab=t.0>,
walks through numbers on compression speed, compression ratio, and
decompression speed for column values from the ClickBench data. Based on
those numbers, PFOR gives superior decompression speed compared to
DELTABITPACK and RLEBITPACKHYBRID and fares better on average in
compression ratio. The doc also includes a comparison with ZSTD
compression.

We plan to expand to a few more datasets and work on a prototype in Arrow
C++ in early Jan.
Looking forward to the feedback from the group.

Best
Prateek


