[ https://issues.apache.org/jira/browse/KAFKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118215#comment-13118215 ]

C. Scott Andreas commented on KAFKA-79:
---------------------------------------

This looks like an excellent feature, Neha - thanks for working on it. We push 
a lot of highly compressible data into Kafka, so trading a bit of CPU for 
reduced disk and network activity sounds like a clear win.

Would you be willing to accept a patch that implements support for 
http://code.google.com/p/snappy in addition to (or instead of) GZip? When 
consuming high-data-rate streams, we quickly peg the core on GZip decoding and 
have switched to Snappy (specifically, this implementation: 
http://code.google.com/p/snappy-java/) as a result.

If you have a chance, take a quick look at this JVM de/compressor throughput 
comparison: https://github.com/ning/jvm-compressor-benchmark/wiki -- these 
results mirror ours pretty closely. On a 36GB dataset of serialized data, we 
see 89% space savings from Snappy and 95% from GZip. At least in our case, 
Snappy's slightly lower compression ratio still left us with a huge win in 
terms of codec throughput (and in reducing the CPU burden on consuming / 
producing applications).
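Snappy is not in the Python standard library, so as a rough illustration of the 
ratio-vs-throughput tradeoff those benchmarks measure, here is a sketch using 
zlib at its fastest and densest settings as stand-ins for the two codecs (the 
helper names and sample data are invented for this sketch):

```python
import time
import zlib

def measure(codec_name, compress, data):
    """Compress `data` once, reporting space savings and throughput."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    savings = 1 - len(compressed) / len(data)
    mb_per_s = len(data) / (1024 * 1024) / elapsed
    return codec_name, savings, mb_per_s

# Highly compressible sample data, standing in for serialized log messages.
data = b"userid=12345 action=click page=/home status=200\n" * 20000

results = [
    measure("zlib level 1 (fast, Snappy-like tradeoff)",
            lambda d: zlib.compress(d, 1), data),
    measure("zlib level 9 (dense, GZip-like tradeoff)",
            lambda d: zlib.compress(d, 9), data),
]
for name, savings, mbps in results:
    print(f"{name}: {savings:.1%} savings, {mbps:.0f} MB/s")
```

The fast setting gives up a little ratio for much higher throughput, which is 
the same shape of result the jvm-compressor-benchmark page reports for Snappy 
versus GZip.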

– Scott
                
> Introduce the compression feature in Kafka
> ------------------------------------------
>
>                 Key: KAFKA-79
>                 URL: https://issues.apache.org/jira/browse/KAFKA-79
>             Project: Kafka
>          Issue Type: New Feature
>    Affects Versions: 0.6
>            Reporter: Neha Narkhede
>             Fix For: 0.7
>
>
> With this feature, we can enable end-to-end block compression in Kafka. The 
> idea is to enable compression on the producer for some or all topics, write 
> the data in compressed format on the server and make the consumers 
> compression aware. The data will be decompressed only on the consumer side. 
> Ideally, there should be a choice of compression codecs to be used by the 
> producer. That means a change to the message header as well as the network 
> byte format. On the consumer side, the state maintenance behavior of the 
> zookeeper consumer changes. For compressed data, the consumed offset will be 
> advanced one compressed message at a time. For uncompressed data, the 
> consumed offset will be advanced one message at a time. 
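The flow described above can be sketched as follows. The wire format, function 
names, and JSON batch encoding here are invented for illustration only and are 
not Kafka's actual message format:

```python
import gzip
import json

# Hypothetical wire format for this sketch: a message set is a list of
# (codec, payload) pairs. Codec 0 is a plain message; codec 1 is a gzip'd
# wrapper whose payload is a JSON-encoded batch of messages.
UNCOMPRESSED, GZIP = 0, 1

def make_wrapper(messages):
    """Producer side: pack a batch of messages into one compressed message."""
    return (GZIP, gzip.compress(json.dumps(messages).encode()))

def consume(message_set):
    """Consumer side: yield (messages, offset_delta) per fetched message.

    A plain message advances the consumed offset by one. A compressed
    wrapper also advances it by one, but only after the *whole* batch inside
    it has been decompressed and handed out, since individual messages
    inside the wrapper are not separately addressable.
    """
    for codec, payload in message_set:
        if codec == GZIP:
            batch = json.loads(gzip.decompress(payload))
            yield batch, 1          # many logical messages, one offset step
        else:
            yield [payload.decode()], 1   # one message, one offset step

message_set = [
    (UNCOMPRESSED, b"m1"),
    make_wrapper(["m2", "m3", "m4"]),
]
offset = 0
for messages, delta in consume(message_set):
    offset += delta
    print(offset, messages)
```

This also shows why the consumer's state maintenance changes: checkpointing an 
offset mid-wrapper would lose or replay part of a batch, so the offset can only 
move in whole-wrapper steps for compressed data.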

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

