Added to my to-do list.  I'm debugging our Parquet v2 page reader code at the moment, then I'll do a combined post about "Parquet improvements".

On 2021/09/29 16:46, Ted Dunning wrote:
A blog is a great idea.

I am curious about how much compression costs.


On Wed, Sep 29, 2021 at 5:37 AM luoc <[email protected]> wrote:

James, you are doing great.
Is it possible to post a new blog on the website about this?

On 29 Sep 2021, at 20:27, James Turton <[email protected]> wrote:

Hi all

We now have support in master for reading and writing Parquet with
additional compression codecs.  Here are the on-disk footprints of a 25M
record dataset compressed by Drill with different codecs.
| Codec  | Size on disk (MB) |
| ------ | ----------------- |
| brotli |   87              |
| gzip   |   80              |
| lz4    |  100.6            |
| lzo    |  100.8            |
| snappy |  192              |
| zstd   |   85              |
| none   | 2152              |
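
For anyone who wants to reproduce a comparison like this, something along
the following lines should work in Drill SQL.  This is only a sketch: the
dfs.tmp table names are placeholders, not the dataset used for the numbers
above.

    -- Placeholder names: substitute your own workspace and source table.
    -- Write the same data once per codec, then compare the resulting directory sizes.
    ALTER SESSION SET `store.parquet.compression` = 'zstd';
    CREATE TABLE dfs.tmp.`source_data_zstd` AS SELECT * FROM dfs.tmp.`source_data`;

    ALTER SESSION SET `store.parquet.compression` = 'gzip';
    CREATE TABLE dfs.tmp.`source_data_gzip` AS SELECT * FROM dfs.tmp.`source_data`;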

I haven't measured the (de)compression speed differences myself, but there
are many such benchmarks on the web, and the differences can be big *if*
you've got a workload that is CPU-bound by (de)compression.  Beyond that
there are the usual considerations like better utilisation of the OS page
cache by the higher-compression-ratio codecs, less I/O when data must come
from disk, etc.  Zstd is probably the one I'll be putting into
`store.parquet.compression` myself at this point.
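
If you want to make that the default, something like this should do it
(ALTER SESSION for the current session only, ALTER SYSTEM to persist it
cluster-wide):

    ALTER SYSTEM SET `store.parquet.compression` = 'zstd';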
Happy Drilling!
James

