[
https://issues.apache.org/jira/browse/KAFKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120327#comment-13120327
]
Neha Narkhede commented on KAFKA-79:
------------------------------------
Scott,
Thanks for pointing us to Snappy. I took a brief look at the Snappy
benchmarks, and it looks promising to me. As Jay mentioned, GZIP buys us
increased throughput and better utilization of network bandwidth, due to its
relatively high compression ratio. However, its decompression cost, in terms
of both TPS and CPU usage, is not low. According to preliminary Kafka
compression performance benchmarks with a fetch size of 1 MB, consumer
throughput doubled while consuming a GZIP-compressed topic. When the consumer
is fully caught up, CPU usage is ~45%, compared to ~12% when the same
consumer is consuming uncompressed data. On the producer side, for a batch
size of 200 and a message size of 200 bytes, producer throughput for
compressed data is half the throughput for uncompressed data. That is the
cost of GZIP compression. While this is tolerable for inter-DC replication,
we could do better for more real-time applications that care about TPS more
than compression ratio. I see Snappy fitting well here
(http://ning.github.com/jvm-compressor-benchmark/results/canterbury-roundtrip-2011-07-28/index.html).
The compression ratio we see with GZIP (for a producer batch size of 200) is
3x on our typical tracking data set. I wonder how much lower it will be for
Snappy. It will be good to check.
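One quick way to check the GZIP side of that ratio is a standalone sketch using only java.util.zip (Snappy would need the snappy-java library, so only GZIP is measured here). The batch size and message size mirror the benchmark numbers above; the synthetic payload is purely illustrative, and real tracking data will compress differently:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// Sketch: measure the GZIP compression ratio on a batch of fixed-size
// messages (batch size 200, message size 200 bytes, as in the benchmark).
public class GzipRatioCheck {
    // Gzip a byte array and return the compressed bytes.
    static byte[] gzip(byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        int batchSize = 200, messageSize = 200;
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (int i = 0; i < batchSize; i++) {
            // Synthetic "tracking event": repetitive keys, varying values.
            String msg = String.format(
                "{\"event\":\"page_view\",\"member_id\":%d,\"url\":\"/home\"}", i);
            byte[] bytes = msg.getBytes("UTF-8");
            // Pad (or truncate) to the fixed message size.
            byte[] fixed = new byte[messageSize];
            System.arraycopy(bytes, 0, fixed, 0,
                Math.min(bytes.length, messageSize));
            batch.write(fixed);
        }
        byte[] raw = batch.toByteArray();
        byte[] compressed = gzip(raw);
        double ratio = (double) raw.length / compressed.length;
        System.out.printf("raw=%d compressed=%d ratio=%.1fx%n",
            raw.length, compressed.length, ratio);
    }
}
```

Swapping the `gzip` helper for a Snappy call would give the side-by-side ratio comparison on the same data set.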
It will be great to see a Snappy integration patch, along with Kafka
performance benchmarks measuring compression/decompression overhead,
compression ratio, and the effect on producer/consumer throughput.
- Neha
> Introduce the compression feature in Kafka
> ------------------------------------------
>
> Key: KAFKA-79
> URL: https://issues.apache.org/jira/browse/KAFKA-79
> Project: Kafka
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Neha Narkhede
> Fix For: 0.7
>
>
> With this feature, we can enable end-to-end block compression in Kafka. The
> idea is to enable compression on the producer for some or all topics, write
> the data in compressed format on the server and make the consumers
> compression aware. The data will be decompressed only on the consumer side.
> Ideally, there should be a choice of compression codecs to be used by the
> producer. That means a change to the message header as well as the network
> byte format. On the consumer side, the state maintenance behavior of the
> zookeeper consumer changes. For compressed data, the consumed offset will be
> advanced one compressed message at a time. For uncompressed data, consumed
> offset will be advanced one message at a time.
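The wrapper-message idea in the description above can be sketched as follows. This is illustrative only: the codec ids and the length-prefixed payload layout are assumptions for this sketch, not Kafka's actual message header or wire format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Sketch: a batch of messages is compressed into the payload of a single
// outer "wrapper" message, with a codec id up front so consumers know how
// to decompress. The broker stores the wrapper as-is; only the consumer
// decompresses, and its consumed offset advances one wrapper at a time.
public class CompressedMessageSet {
    static final byte CODEC_NONE = 0; // hypothetical codec ids
    static final byte CODEC_GZIP = 1;

    // Length-prefix each message, concatenate, gzip, and prepend the codec id.
    static byte[] wrap(List<byte[]> messages) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(CODEC_GZIP);
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            for (byte[] m : messages) {
                // 4-byte big-endian length prefix, then the message bytes.
                gz.write(new byte[] {
                    (byte) (m.length >>> 24), (byte) (m.length >>> 16),
                    (byte) (m.length >>> 8), (byte) m.length });
                gz.write(m);
            }
        }
        return out.toByteArray();
    }
}
```

A compression-aware consumer would read the codec byte, decompress the remainder, and iterate the inner messages by their length prefixes, advancing its offset only after the whole wrapper is consumed.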
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira