[
https://issues.apache.org/jira/browse/KAFKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120327#comment-13120327
]
Neha Narkhede commented on KAFKA-79:
------------------------------------
Scott,
Thanks for pointing us to Snappy. I took a brief look at the Snappy
benchmarks, and it looks promising to me. As Jay mentioned, GZIP buys us
increased throughput and better utilization of network bandwidth, due to its
relatively high compression ratio. However, its decompression cost, in terms
of both TPS and CPU usage, is not low. According to preliminary Kafka
compression performance benchmarks with a fetch size of 1 MB, consumer
throughput doubled while consuming a GZIP-compressed topic. When the consumer
is fully caught up, CPU usage is ~45%, compared to ~12% when the same
consumer is consuming uncompressed data. On the producer side, for a batch
size of 200 and a message size of 200 bytes, producer throughput for
compressed data is half the throughput for uncompressed data. That is the
cost of GZIP compression. While this is tolerable for inter-DC replication,
we could do better for more real-time applications that care about TPS more
than compression ratio. I see Snappy fitting well here
(http://ning.github.com/jvm-compressor-benchmark/results/canterbury-roundtrip-2011-07-28/index.html).
The compression ratio we see with GZIP (for a producer batch size of 200) is
3x on our typical tracking data set. I wonder how much lower it will be for
Snappy. It will be good to check.
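One quick way to check the GZIP side of that ratio is a standalone sketch using only java.util.zip (Snappy would need the snappy-java library, so only GZIP is measured here). The batch size and message size mirror the benchmark numbers above; the synthetic payload is purely illustrative, and real tracking data will compress differently:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Sketch: measure the GZIP compression ratio on a batch of fixed-size
// messages (batch size 200, message size 200 bytes, as in the benchmark).
public class GzipRatioCheck {
    // Gzip a byte array and return the compressed bytes.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int batchSize = 200, messageSize = 200;
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (int i = 0; i < batchSize; i++) {
            // Synthetic "tracking event": repetitive keys, varying values.
            String msg = String.format(
                "{\"event\":\"page_view\",\"member_id\":%d,\"url\":\"/home\"}", i);
            byte[] bytes = msg.getBytes("UTF-8");
            // Pad (or truncate) to the fixed message size.
            byte[] fixed = new byte[messageSize];
            System.arraycopy(bytes, 0, fixed, 0,
                Math.min(bytes.length, messageSize));
            batch.write(fixed);
        }
        byte[] raw = batch.toByteArray();
        byte[] compressed = gzip(raw);
        double ratio = (double) raw.length / compressed.length;
        System.out.printf("raw=%d compressed=%d ratio=%.1fx%n",
            raw.length, compressed.length, ratio);
    }
}
```

Swapping the `gzip` helper for a Snappy call would give the side-by-side ratio comparison on the same data set.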
It will be great to see a Snappy integration patch, along with Kafka
performance benchmarks measuring compression/decompression overhead,
compression ratio, and the effect on producer/consumer throughput.
- Neha
> Introduce the compression feature in Kafka
> ------------------------------------------
>
> Key: KAFKA-79
> URL: https://issues.apache.org/jira/browse/KAFKA-79
> Project: Kafka
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Neha Narkhede
> Fix For: 0.7
>
>
> With this feature, we can enable end-to-end block compression in Kafka. The
> idea is to enable compression on the producer for some or all topics, write
> the data in compressed format on the server and make the consumers
> compression aware. The data will be decompressed only on the consumer side.
> Ideally, there should be a choice of compression codecs to be used by the
> producer. That means a change to the message header as well as the network
> byte format. On the consumer side, the state maintenance behavior of the
> zookeeper consumer changes. For compressed data, the consumed offset will be
> advanced one compressed message at a time. For uncompressed data, consumed
> offset will be advanced one message at a time.
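The wrapper-message idea in the description above can be sketched as follows. This is illustrative only: the codec ids and the length-prefixed payload layout are assumptions for this sketch, not Kafka's actual message header or wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Sketch: a batch of messages is compressed into the payload of a single
// outer "wrapper" message, with a codec id up front so consumers know how
// to decompress. The broker stores the wrapper as-is; only the consumer
// decompresses, and its consumed offset advances one wrapper at a time.
public class CompressedMessageSet {
    static final byte CODEC_NONE = 0; // hypothetical codec ids
    static final byte CODEC_GZIP = 1;

    // Length-prefix each message, concatenate, gzip, and prepend the codec id.
    static byte[] wrap(List<byte[]> messages) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(CODEC_GZIP);
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            for (byte[] m : messages) {
                // 4-byte big-endian length prefix, then the message bytes.
                gz.write(new byte[] {
                    (byte) (m.length >>> 24), (byte) (m.length >>> 16),
                    (byte) (m.length >>> 8), (byte) m.length });
                gz.write(m);
            }
        }
        return out.toByteArray();
    }
}
```

A compression-aware consumer would read the codec byte, decompress the remainder, and iterate the inner messages by their length prefixes, advancing its offset only after the whole wrapper is consumed.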
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira