[
https://issues.apache.org/jira/browse/KAFKA-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13487915#comment-13487915
]
Neha Narkhede commented on KAFKA-595:
-------------------------------------
>> I would recommend we instead add a log.compression.codec property (plus
>> override map) that controls the compression on the broker.
Yes, when I said move the compression to the server side, that is what I meant.
Also, I filed this JIRA to keep track of an optimization I came across while
performance testing. I don't think we should push this in 0.8.
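
For illustration only, here is a minimal sketch of how a broker-wide default codec plus a per-topic override map might be resolved. Only the name log.compression.codec comes from the suggestion quoted above. The override key, the "topic:codec" map format, the helper functions, and the topic names are all assumptions made for this example, not an existing Kafka configuration or API.

```python
# Hypothetical sketch: "log.compression.codec.per.topic" and the
# "topic:codec" override format are assumptions for illustration,
# not an existing Kafka broker configuration.

def parse_override_map(raw):
    """Parse a string like 'topicA:gzip,topicB:none' into a dict."""
    overrides = {}
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty entries and trailing commas
        topic, _, codec = entry.partition(":")
        overrides[topic.strip()] = codec.strip()
    return overrides


def codec_for_topic(topic, default_codec, overrides):
    """Return the per-topic codec if one is set, else the broker default."""
    return overrides.get(topic, default_codec)


# Illustrative broker properties; topic names and codec values are made up.
broker_props = {
    "log.compression.codec": "gzip",
    "log.compression.codec.per.topic": "metrics:none,clickstream:snappy",
}

default_codec = broker_props["log.compression.codec"]
overrides = parse_override_map(broker_props["log.compression.codec.per.topic"])
```

With these made-up values, a topic listed in the override map gets its own codec and every other topic falls back to the broker default.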
> Producer side compression is unnecessary
> ----------------------------------------
>
> Key: KAFKA-595
> URL: https://issues.apache.org/jira/browse/KAFKA-595
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Neha Narkhede
> Labels: feature, features
>
> Compression can be used to store something in less space (less I/O) and/or
> transfer it less expensively (better use of network bandwidth). Often the two
> go hand in hand, such as when compressed data is written to a disk: the disk
> I/O takes less time, since fewer bits are being transferred, and the data
> occupies less storage on the disk after the transfer. Unfortunately, the time
> to compress the data can exceed the savings gained from transferring less
> data, resulting in overall degradation.
> After KAFKA-506, the network usage gains we used to get by compressing data
> at the producers are outweighed by the cost of decompressing and
> re-compressing data on the server side. Compression to save on network costs
> is worthwhile either to reduce contention in a wide-area network caused by
> multiple point-to-point connections OR to transfer data efficiently over
> low-bandwidth networks (cross DC). In the case of producer-server
> connections, neither is typically true, which means we might not benefit from
> producer-side compression at all in most production deployments of Kafka. On
> the contrary, it might actually hurt performance, since most production
> deployments turn on compression for all topics.
> The main benefit of compressing data in Kafka is to efficiently transfer data
> cross DC for setting up mirrored Kafka clusters. The performance benefit
> also applies to real-time consumers, especially when there are multiple
> groups of consumers consuming the same topic. If data is compressed on the
> server side instead, which we do anyway, we can get the I/O savings as well
> as efficient network transfer on the server-consumer links.
> I don't have numbers to quantify the performance impact of re-compression
> now, since there are other changes that need to be done to test this
> correctly.
> Thoughts?
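
The trade-off in the quoted description (compression helps only when the transfer time saved exceeds the CPU time spent) can be sketched with a simple cost model. This is an illustration under stated assumptions, not a measurement: the function names are hypothetical, and the payload size, compression ratio, bandwidths, and CPU time are invented numbers. The issue itself notes that no real figures are available yet.

```python
# Hypothetical cost model; every number below is illustrative, not measured.

def transfer_seconds(num_bytes, bandwidth_bytes_per_s):
    """Time to push num_bytes over a link with the given bandwidth."""
    return num_bytes / bandwidth_bytes_per_s


def compression_pays_off(num_bytes, ratio, bandwidth_bytes_per_s, cpu_seconds):
    """Return True if compressing first is faster end to end.

    ratio is compressed_size / original_size (0.25 means 4x smaller).
    cpu_seconds is the total time spent compressing and decompressing.
    """
    raw = transfer_seconds(num_bytes, bandwidth_bytes_per_s)
    compressed = (
        transfer_seconds(num_bytes * ratio, bandwidth_bytes_per_s) + cpu_seconds
    )
    return compressed < raw


payload = 100_000_000  # 100 MB batch (assumed)
ratio = 0.25           # assumed compression ratio
cpu_seconds = 1.5      # assumed compress + decompress time

# Fast link within one data center, assumed 100 MB/s.
fast_link = compression_pays_off(payload, ratio, 100_000_000, cpu_seconds)

# Slow cross-DC link, assumed 5 MB/s.
slow_link = compression_pays_off(payload, ratio, 5_000_000, cpu_seconds)
```

Under these assumed numbers the fast link loses time to compression (1.75 s versus 1.0 s), while the slow link gains (6.5 s versus 20 s), which matches the argument that compression matters most on low-bandwidth, cross-DC links.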