[
https://issues.apache.org/jira/browse/KAFKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13118215#comment-13118215
]
C. Scott Andreas commented on KAFKA-79:
---------------------------------------
This looks like an excellent feature, Neha - thanks for working on it. We push
a lot of highly compressible data into Kafka. Trading a bit of CPU for reduced
disk and network activity sounds excellent.
Would you be willing to accept a patch that implements support for
http://code.google.com/p/snappy in addition to (or instead of) GZip? When
consuming high-data-rate streams, we quickly peg the core on GZip decoding and
have switched to Snappy (specifically, this implementation:
http://code.google.com/p/snappy-java/) as a result.
If you have a chance, take a quick look at this JVM de/compressor throughput
comparison: https://github.com/ning/jvm-compressor-benchmark/wiki -- these
results mirror ours pretty closely. On a 36GB dataset of serialized data,
Snappy gives us an 89% compression ratio versus 95% for GZip. At least in our
case, the slightly lower compression ratio was still a huge win in terms of
codec throughput (and in reduced CPU burden on consuming / producing
applications).
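The trade-off described above can be exercised with a minimal round-trip sketch. This uses only `java.util.zip` from the JDK to stand in for the GZip side; the class name and payload are illustrative, and snappy-java's `Snappy.compress` / `Snappy.uncompress` static methods would be a near drop-in replacement for the two helpers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipRoundTrip {
    // Compress a payload with GZip from the JDK (java.util.zip).
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        }
        return buf.toByteArray();
    }

    // Decompress back to the original bytes.
    static byte[] decompress(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz =
                 new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gz.read(chunk)) > 0) {
                out.write(chunk, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // A highly repetitive payload, like the compressible data described above.
        byte[] original =
            "kafka-log-line ".repeat(1000).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(original);
        System.out.println("original=" + original.length
                           + " compressed=" + packed.length);
    }
}
```

Swapping the codec behind the same two-method interface is what makes a pluggable-codec design (GZip now, Snappy later) cheap to support.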
– Scott
> Introduce the compression feature in Kafka
> ------------------------------------------
>
> Key: KAFKA-79
> URL: https://issues.apache.org/jira/browse/KAFKA-79
> Project: Kafka
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Neha Narkhede
> Fix For: 0.7
>
>
> With this feature, we can enable end-to-end block compression in Kafka. The
> idea is to enable compression on the producer for some or all topics, write
> the data in compressed format on the server and make the consumers
> compression aware. The data will be decompressed only on the consumer side.
> Ideally, there should be a choice of compression codecs to be used by the
> producer. That means a change to the message header as well as the network
> byte format. On the consumer side, the state maintenance behavior of the
> zookeeper consumer changes. For compressed data, the consumed offset will be
> advanced one compressed message at a time. For uncompressed data, the
> consumed offset will be advanced one message at a time.
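The block-compression idea in the description can be sketched as packing a batch of messages into one compressed blob on the producer and unpacking it only on the consumer. This is an illustrative JDK-only sketch, not Kafka's actual message header or wire format: the class name and the length-prefixed framing are assumptions, and GZip stands in for whichever codec the producer selects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class MessageBatchCodec {
    // Producer side: frame each message with a length prefix (a hypothetical
    // framing, not Kafka's byte format) and compress the whole batch as one unit.
    static byte[] packBatch(List<byte[]> messages) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out =
                 new DataOutputStream(new GZIPOutputStream(buf))) {
            for (byte[] msg : messages) {
                out.writeInt(msg.length);
                out.write(msg);
            }
        }
        return buf.toByteArray();
    }

    // Consumer side: the whole batch decompresses as one unit, which is why
    // the consumed offset advances one compressed message (batch) at a time
    // rather than per inner message.
    static List<byte[]> unpackBatch(byte[] blob) throws IOException {
        List<byte[]> messages = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(blob)))) {
            while (true) {
                int len;
                try {
                    len = in.readInt();
                } catch (EOFException end) {
                    break; // no more framed messages in the batch
                }
                byte[] msg = new byte[len];
                in.readFully(msg);
                messages.add(msg);
            }
        }
        return messages;
    }
}
```

The broker never decompresses: it stores and serves `packBatch`'s output as-is, so only the producer pays the compression cost and only the consumer pays the decompression cost.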