Hi Apurva,

Yes, it is true that the request size might be much smaller if the batching is based on uncompressed size. I will let the users know about this. That said, in practice this is probably fine. For example, at LinkedIn our max message size is 1 MB, and the compressed size is typically 100 KB or larger. Given that in most cases there are many partitions, the request size would not be too small (typically around a few MB).
At LinkedIn we do have some topics with varying compression ratios. Those are usually topics shared by different services, so the data may differ a lot even though the messages are in the same topic and have similar fields.

Thanks,

Jiangjie (Becket) Qin

On Tue, Feb 21, 2017 at 6:17 PM, Apurva Mehta <apu...@confluent.io> wrote:

> Hi Becket, Thanks for the KIP.
>
> I think one of the risks here is that when compression estimation is
> disabled, you could have much smaller batches than expected, and throughput
> could be hurt. It would be worth adding this to the documentation of this
> setting.
>
> Also, one of the rejected alternatives states that per-topic estimations
> would not work when the compression of individual messages is variable.
> This is true in theory, but in practice one would expect Kafka topics to
> have fairly homogeneous data, and hence to compress evenly. I was
> curious if you have data which shows otherwise.
>
> Thanks,
> Apurva
>
> On Tue, Feb 21, 2017 at 12:30 PM, Becket Qin <becket....@gmail.com> wrote:
>
> > Hi folks,
> >
> > I would like to start the discussion thread on KIP-126. The KIP proposes
> > adding a new configuration to KafkaProducer to allow batching based on
> > uncompressed message size.
> >
> > Comments are welcome.
> >
> > The KIP wiki is the following:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-126+-+Allow+KafkaProducer+to+batch+based+on+uncompressed+size
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
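[Editor's note: the back-of-the-envelope arithmetic in the thread can be sketched as follows. This is not Kafka code; the function, the 20-partition count, and the ratios are hypothetical, chosen only to illustrate how a per-batch size counted on uncompressed bytes still yields a multi-MB request when many partitions are filled.]

```python
# Toy model: the producer closes a batch once *uncompressed* bytes reach
# batch.size, so the bytes actually sent per batch shrink by the topic's
# compression ratio. With many partitions per request, the total request
# size can still be a few MB, as described in the thread.

BATCH_SIZE = 1_000_000   # hypothetical batch.size, counted uncompressed
PARTITIONS = 20          # hypothetical partition count in one produce request

def on_wire_request_bytes(compression_ratio: float) -> int:
    """Approximate request size when every partition fills one full batch."""
    compressed_batch = int(BATCH_SIZE * compression_ratio)
    return compressed_batch * PARTITIONS

# A 10:1 compressible topic sends ~100 KB per batch, but across 20
# partitions the request is still ~2 MB.
print(on_wire_request_bytes(0.1))  # 2000000
print(on_wire_request_bytes(0.5))  # 10000000
```

The flip side, which Apurva raises, is visible in the same model: a highly compressible topic (small ratio) produces on-wire batches far below `batch.size`, which is why the KIP discussion suggests documenting this behavior.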