Neha Narkhede created KAFKA-595:
-----------------------------------
Summary: Producer side compression is unnecessary
Key: KAFKA-595
URL: https://issues.apache.org/jira/browse/KAFKA-595
Project: Kafka
Issue Type: Bug
Affects Versions: 0.8
Reporter: Neha Narkhede
Compression can be used to store data in less space (less disk I/O) and/or to
transfer it more cheaply (better use of network bandwidth). Often the two
go hand in hand, such as when compressed data is written to a disk: the disk
I/O takes less time, since fewer bits are being transferred, and the data
occupies less storage on the disk after the transfer. Unfortunately, the time to
compress the data can exceed the savings gained from transferring less data,
resulting in an overall degradation.
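To make the trade-off concrete, here is a rough break-even sketch. It is only a
model, not a measurement: the function name, the link speeds, the compression
throughput and the compression ratio are all hypothetical values chosen for
illustration, and it ignores decompression cost on the receiving side.

```python
def compression_pays_off(size_bytes, ratio, link_bytes_per_s, compress_bytes_per_s):
    """Return True if compress-then-send is faster than sending uncompressed.

    ratio is compressed_size / original_size (e.g. 0.5 means half the size).
    All inputs are assumptions supplied by the caller, not measured Kafka numbers.
    """
    # Time to push the raw payload over the link.
    uncompressed_time = size_bytes / link_bytes_per_s
    # Time to compress, plus time to push the smaller payload over the link.
    compressed_time = (size_bytes / compress_bytes_per_s
                       + (size_bytes * ratio) / link_bytes_per_s)
    return compressed_time < uncompressed_time


# Hypothetical numbers: 1 MB batch, 2:1 compression, compressor at 50 MB/s.
batch = 1_000_000
fast_lan = 125_000_000   # roughly 1 Gbit/s, assumed producer-server link
slow_wan = 1_250_000     # roughly 10 Mbit/s, assumed cross-DC link

on_lan = compression_pays_off(batch, 0.5, fast_lan, 50_000_000)
on_wan = compression_pays_off(batch, 0.5, slow_wan, 50_000_000)
print("pays off on fast LAN:", on_lan)
print("pays off on slow WAN:", on_wan)
```

Under these assumed numbers the compression step costs more than it saves on
the fast link and pays for itself on the slow one, which is the shape of the
argument in this ticket.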
After KAFKA-506, the network usage gains we used to get by compressing data at
the producers are outweighed by the cost of decompressing and re-compressing data
on the server side. Compression to save on network costs makes sense either to
reduce contention in a wide-area network due to multiple point-to-point
connections or to efficiently transfer data over low-bandwidth networks (cross
DC). In the case of producer-server connections, neither is typically true,
which means we might not benefit from producer-side compression at all in most
production deployments of Kafka. On the contrary, it might actually hurt
performance, since most production deployments turn on compression for all
topics.
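One way to get a first feel for the decompress and re-compress cost is to time
it in isolation. The sketch below is not the broker code path: it uses Python's
standard gzip module on a synthetic payload as a stand-in, and the helper name
recompress_cost and the payload contents are made up for illustration. Real
numbers would have to come from the broker itself.

```python
import gzip
import time


def recompress_cost(compressed_batch):
    """Decompress a gzip batch and compress it again, timing each step.

    Returns (raw, recompressed, decompress_seconds, compress_seconds).
    """
    start = time.perf_counter()
    raw = gzip.decompress(compressed_batch)      # what the server does first
    middle = time.perf_counter()
    recompressed = gzip.compress(raw)            # then it compresses again
    end = time.perf_counter()
    return raw, recompressed, middle - start, end - middle


# Synthetic, highly repetitive payload standing in for a batch of messages.
payload = b"user=42 action=click page=/home\n" * 20_000
producer_batch = gzip.compress(payload)          # producer-side compression

raw, recompressed, decompress_s, compress_s = recompress_cost(producer_batch)
print("payload bytes:   ", len(payload))
print("compressed bytes:", len(producer_batch))
print("decompress time (s):", decompress_s)
print("recompress time (s):", compress_s)
```

Comparing the two timings against the time saved on the producer-server link
(payload bytes minus compressed bytes, divided by link speed) gives a crude
estimate of whether producer-side compression is a net win.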
The main benefit of compressing data in Kafka is to efficiently transfer data
cross DC for setting up mirrored Kafka clusters. The performance benefit also
holds for real-time consumers, especially when there are multiple groups of
consumers consuming the same topic. If data is compressed on the server side
instead, which we do anyway, we get the I/O savings as well as efficient
network transfer on the server-consumer links.
I don't have numbers to quantify the performance impact of re-compression yet,
since other changes need to be made before this can be tested correctly.
Thoughts?