[
https://issues.apache.org/jira/browse/KAFKA-79?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084235#comment-13084235
]
Jay Kreps commented on KAFKA-79:
--------------------------------
We have some performance comparisons; we should include that information on the
performance page at least by the time this is released. Of course our primary
concern is inter-datacenter bandwidth rather than performance per se. We see a
~30% compression ratio on our Avro tracking data.
Neha should be able to give a diff. I think it was the last checkin on github
before the cutover.
It is important that decompression always happen with the codec used for
compression, so it can't just be a property like
compression.codec=org.apache.kafka.GzipCompressor in the config: a mismatch
between producer and consumer would lead to unreadable data, and if two
people send messages with different codecs you would be totally screwed. This
means the codec used must be maintained with the message set. We do this by
having a compression id, where 0=none, 1=gzip, etc. This doesn't lend itself to
extensibility, since that list has to be predetermined, but we could reserve a
codec id for a "user defined" codec and leave it up to the user to configure it
correctly.
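A minimal sketch of that scheme (illustrative names and framing, not Kafka's actual classes or wire format): the codec id travels with the data itself, so the consumer chooses a decompressor from the byte it reads rather than from its own configuration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CodecIdSketch {
    // Hypothetical codec ids mirroring the scheme described above.
    static final byte NONE = 0;
    static final byte GZIP = 1;

    // Prepend the codec id so decompression is always driven by the data,
    // never by a (possibly mismatched) consumer-side config property.
    static byte[] encode(byte codec, byte[] payload) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(codec);
            if (codec == GZIP) {
                GZIPOutputStream gz = new GZIPOutputStream(out);
                gz.write(payload);
                gz.close();
            } else {
                out.write(payload);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // The first byte tells the consumer which codec was used; no agreement
    // between producer and consumer configs is required.
    static byte[] decode(byte[] data) {
        try {
            byte codec = data[0];
            ByteArrayInputStream in =
                new ByteArrayInputStream(data, 1, data.length - 1);
            if (codec == GZIP) {
                return new GZIPInputStream(in).readAllBytes();
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```

A reserved "user defined" id would slot into the same dispatch, with the lookup delegated to user configuration instead of a built-in case.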
My intuition is that most people just want a good compression implementation
included out of the box and don't want to fiddle with it, so I think it would be
best to get that right. I think even in the long run there are really only 2-3
algorithms with a reasonable CPU/size tradeoff for compression and
decompression, so it makes sense to implement just those, fully test them for
performance and correctness, and include them in a way that can't break.
> Introduce the compression feature in Kafka
> ------------------------------------------
>
> Key: KAFKA-79
> URL: https://issues.apache.org/jira/browse/KAFKA-79
> Project: Kafka
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Neha Narkhede
> Fix For: 0.7
>
>
> With this feature, we can enable end-to-end block compression in Kafka. The
> idea is to enable compression on the producer for some or all topics, write
> the data in compressed format on the server and make the consumers
> compression aware. The data will be decompressed only on the consumer side.
> Ideally, there should be a choice of compression codecs to be used by the
> producer. That means a change to the message header as well as the network
> byte format. On the consumer side, the state maintenance behavior of the
> zookeeper consumer changes. For compressed data, the consumed offset will be
> advanced one compressed message at a time. For uncompressed data, consumed
> offset will be advanced one message at a time.
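The offset behavior described above can be modeled as follows. This is an illustrative sketch, not Kafka's actual consumer iterator: each shallow log entry is either a plain record or a compressed wrapper holding several inner records, and the consumed offset may only advance once a whole shallow message has been delivered.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetSketch {
    // A shallow log entry: either one plain record (innerCount == 1) or a
    // compressed wrapper holding several inner records (innerCount > 1).
    public static final class Entry {
        final boolean compressed;
        final int innerCount;
        public Entry(boolean compressed, int innerCount) {
            this.compressed = compressed;
            this.innerCount = innerCount;
        }
    }

    // Returns the consumed-offset checkpoint recorded after each delivered
    // record. For uncompressed data the offset moves one message at a time;
    // inside a compressed wrapper it stays put until the last inner record
    // has been handed out, then jumps past the whole wrapper.
    public static List<Long> checkpoints(List<Entry> log) {
        List<Long> out = new ArrayList<>();
        long consumed = 0;
        for (Entry e : log) {
            for (int i = 0; i < e.innerCount; i++) {
                if (i == e.innerCount - 1)
                    consumed += 1; // whole shallow message now consumed
                out.add(consumed);
            }
        }
        return out;
    }
}
```

For a log of [plain, wrapper-of-3, plain], the checkpoints after each delivered record are [1, 1, 1, 2, 3]: the offset does not move while records from inside the wrapper are being consumed.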
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira