[ 
https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487915#comment-13487915
 ] 

Neha Narkhede commented on KAFKA-595:
-------------------------------------

>> I would recommend we instead add a log.compression.codec property (plus 
>> override map) that controls the compression on the broker. 

Yes, when I said move the compression to server side, that is what I meant. 
Also, I filed this JIRA to keep track of this optimization, which I came 
across while performance testing. I don't think we should push this in 0.8.
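
For illustration, a broker-side setting with a per-topic override map could be resolved roughly as in the sketch below. `log.compression.codec` is the property name proposed in this thread. The override property name `log.compression.codec.per.topic`, its `topic:codec` format, and the helper functions are hypothetical and are not an existing Kafka API.

```python
# Hypothetical broker-side codec lookup; names are illustrative, not Kafka's API.

def parse_overrides(raw):
    """Parse a 'topicA:gzip,topicB:none' string into a dict."""
    overrides = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty strings and trailing commas
        topic, _, codec = entry.partition(":")
        overrides[topic.strip()] = codec.strip()
    return overrides


def codec_for_topic(topic, props):
    """Return the per-topic codec if one is set, else the broker-wide default."""
    default = props.get("log.compression.codec", "none")
    overrides = parse_overrides(props.get("log.compression.codec.per.topic", ""))
    return overrides.get(topic, default)


# Assumed example configuration, not taken from a real deployment.
props = {
    "log.compression.codec": "gzip",
    "log.compression.codec.per.topic": "metrics:none, clicks:snappy",
}
```

With such a lookup, producers would send uncompressed data and the broker would pick the codec when appending to the log, so a topic without an override simply falls back to the broker-wide default.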
                
> Producer side compression is unnecessary
> ----------------------------------------
>
>                 Key: KAFKA-595
>                 URL: https://issues.apache.org/jira/browse/KAFKA-595
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Neha Narkhede
>              Labels: feature, features
>
> Compression can be used to store something in less space (less I/O) and/or 
> transfer it less expensively (better use of network bandwidth). Often the two 
> go hand in hand, such as when compressed data is written to a disk: the disk 
> I/O takes less time, since fewer bits are being transferred, and the storage 
> occupied on the disk after the transfer is less. Unfortunately, the time to 
> compress the data can exceed the savings gained from transferring less data, 
> resulting in overall degradation.
> After KAFKA-506, the network usage gains we used to get by compressing data 
> at the producers are exceeded by the cost of decompressing and re-compressing 
> data at the server side. Compression to save on network costs must be done 
> either to reduce the contention in a wide-area network due to multiple point 
> to point connections OR to efficiently transfer data over low-bandwidth 
> networks (cross DC). In the case of producer-server connections, neither is 
> typically true, which means we might not benefit from producer side 
> compression at all in most production deployments of Kafka. On the contrary, 
> it might actually hurt performance since most production deployments turn on 
> compression for all topics.
> The main benefit of compressing data in Kafka is to efficiently transfer data 
> cross DC for setting up mirrored Kafka clusters. The performance benefit is 
> also holds for real-time consumers, especially when there are multiple groups 
> of consumers consuming the same topic. If data is compressed on the server 
> side instead, which we do anyway, we can get the I/O savings as well as 
> efficient network transfer on the server-consumer links.
> I don't have numbers to quantify the performance impact of re-compression 
> now, since there are other changes that need to be done to test this 
> correctly.
> Thoughts?
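
The trade-off described in the issue can be made concrete with a rough timing model: compression pays off only when the compress and decompress time is smaller than the transfer time it saves. The sketch below uses Python's standard `zlib` as a stand-in codec. The payload, the two bandwidth figures, and the function names are assumptions for illustration only, not measurements from Kafka.

```python
import time
import zlib


def transfer_seconds(num_bytes, bandwidth_bytes_per_sec):
    """Time to move num_bytes over a link of the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_sec


def compression_pays_off(payload, bandwidth_bytes_per_sec, level=6):
    """Compare sending raw data against compress + send + decompress.

    Returns (pays_off, raw_seconds, compressed_path_seconds).
    """
    start = time.perf_counter()
    compressed = zlib.compress(payload, level)
    compress_secs = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_secs = time.perf_counter() - start
    assert restored == payload  # the round trip must be lossless

    raw_secs = transfer_seconds(len(payload), bandwidth_bytes_per_sec)
    compressed_secs = (
        compress_secs
        + transfer_seconds(len(compressed), bandwidth_bytes_per_sec)
        + decompress_secs
    )
    return compressed_secs < raw_secs, raw_secs, compressed_secs


# Repetitive, log-like data (assumed); real payloads compress less well.
payload = b"user=42 action=click page=/home\n" * 50_000
slow_link = 1_000_000        # about 1 MB/s, e.g. a constrained cross-DC link
fast_link = 10_000_000_000   # about 10 GB/s, e.g. a link inside one data center

for name, bandwidth in [("slow", slow_link), ("fast", fast_link)]:
    pays_off, raw, comp = compression_pays_off(payload, bandwidth)
    print(f"{name}: raw={raw:.6f}s compressed={comp:.6f}s pays_off={pays_off}")
```

The model covers a single compress and decompress pair. The extra decompression and re-compression on the server after KAFKA-506 would add further terms to the compressed path, which is the cost this issue is concerned with.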
