Jiangjie Qin created KAFKA-3995:
-----------------------------------
Summary: Add a new configuration
"enable.comrpession.ratio.estimation" to the producer config
Key: KAFKA-3995
URL: https://issues.apache.org/jira/browse/KAFKA-3995
Project: Kafka
Issue Type: Improvement
Components: clients
Affects Versions: 0.10.0.0
Reporter: Jiangjie Qin
Fix For: 0.10.1.0
We recently see a few cases where RecordTooLargeException is thrown because the
compressed message sent by KafkaProducer exceeded the max message size.
The root cause of this issue is because the compressor is estimating the batch
size using an estimated compression ratio based on heuristic compression ratio
statistics. This does not quite work for the traffic with highly variable
compression ratios.
For example, if the batch size is set to 100KB and the max message size is 1MB.
Initially a the producer is sending messages (each message is 100KB) to topic_1
whose data can be compressed to 1/10 of the original size. After a while the
estimated compression ratio in the compressor will be trained to 1/10 and the
producer would put 10 messages into one batch. Now the producer starts to send
messages (each message is also 100KB) to topic_2 whose message can only be
compress to 1/5 of the original size. The producer would still use 1/10 as the
estimated compression ratio and put 10 messages into a batch. That batch would
be 2 MB after compression which exceeds the maximum message size. In this case
the user do not have many options other than resend everything or close the
producer if they care about ordering.
This is especially an issue for services like MirrorMaker whose producer is
shared by many different topics.
To solve this issue, we can probably add a configuration
"enable.compression.ratio.estimation" to the producer. So when this
configuration is set to false, we stop estimating the compressed size but will
close the batch once the uncompressed bytes in the batch reaches the batch size.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)