[ https://issues.apache.org/jira/browse/KAFKA-3704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ismael Juma reopened KAFKA-3704:
--------------------------------

> Improve mechanism for compression stream block size selection in KafkaProducer
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-3704
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3704
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Ismael Juma
>
> As discovered in https://issues.apache.org/jira/browse/KAFKA-3565, the
> current default block size (1K) used in Snappy and GZIP may cause a
> sub-optimal compression ratio for Snappy, and hence reduce throughput.
> Because we no longer recompress data in the broker, it also impacts what gets
> stored on disk.
> A solution might be to use each library's own default block size: 64K in LZ4,
> 32K in Snappy and 0.5K in GZIP. The downside is that this requires more
> memory to be allocated outside of the buffer pool, so users may need to bump
> up their JVM heap size, especially for MirrorMakers. Using Snappy as an
> example, it's an additional 2x32K per batch (as Snappy uses two buffers), and
> one would expect at least one batch per partition. However, the number of
> batches per partition can be much higher if the broker is slow to acknowledge
> producer requests (depending on `buffer.memory`, `batch.size`, message size,
> etc.).
> Given the above, it seems like a configuration may be needed, as there is no
> one-size-fits-all value. An alternative to a new config is to allocate
> buffers from the buffer pool and pass them to the compression library. This
> is possible with Snappy, and we could adapt our LZ4 code. It's not possible
> with GZIP, but GZIP uses a very small buffer by default.
> Note that we decided that this change was too risky for 0.10.0.0 and reverted
> the original attempt.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
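For illustration only, below is a minimal sketch (not part of any patch attached to this ticket) of how a producer-side stream can be wrapped with an explicit Snappy block size via snappy-java's SnappyOutputStream, which the producer uses for Snappy compression. The class and constant names are hypothetical, and the comments simply restate the 2x32K-per-batch heap estimate from the description above.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.xerial.snappy.SnappyOutputStream;

public class SnappyBlockSizeSketch {

    // Hypothetical constants for illustration: the 1K value the producer
    // currently hard-codes versus snappy-java's own 32K default.
    private static final int KAFKA_LEGACY_BLOCK_SIZE = 1024;
    private static final int SNAPPY_DEFAULT_BLOCK_SIZE = 32 * 1024;

    // Wrap a batch output stream with an explicit Snappy block size.
    // snappy-java keeps roughly one uncompressed and one compressed buffer of
    // this size per open stream, so a 32K block costs about 2x32K of heap per
    // in-flight batch (e.g. ~64 MB extra for 1000 partitions with one batch each).
    static OutputStream wrapWithSnappy(OutputStream batchStream, int blockSize) throws IOException {
        return new SnappyOutputStream(batchStream, blockSize);
    }

    public static void main(String[] args) throws IOException {
        // Illustrative, highly compressible sample payload; real batches would
        // contain serialized producer records.
        byte[] payload = new byte[256 * 1024];

        for (int blockSize : new int[] {KAFKA_LEGACY_BLOCK_SIZE, SNAPPY_DEFAULT_BLOCK_SIZE}) {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (OutputStream out = wrapWithSnappy(compressed, blockSize)) {
                out.write(payload);
            }
            System.out.printf("block size %6d -> compressed size %d bytes%n",
                    blockSize, compressed.size());
        }
    }
}
{code}

A config-driven value could be passed in place of the constants above; making the block size pluggable is what this ticket leaves open, since the right trade-off between compression ratio and heap usage depends on partition count and batch sizing.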