You’re absolutely right. This should be fixed. I’ve made a note of this in https://issues.apache.org/jira/browse/KAFKA-2499.
If you’d like to submit a pull request for this, that would be awesome :) Otherwise I’ll try to fit it into the other performance work I’m looking at.

Ben

> On 31 Aug 2015, at 12:22, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:
>
> Hello Folks,
>
> I was going through ProducerPerformance.scala.
>
> Having a close look at line no. 247 in 'def generateProducerData'
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/ProducerPerformance.scala,
> the message that the producer sends to Kafka is an array of 0s.
>
> A basic understanding of compression algorithms suggests that compressing
> repetitive data gives the best compression.
>
> I have also observed that when compressing an array of zero bytes, the
> throughput increases significantly when I use lz4 or snappy vs
> NoCompressionCodec. But this is largely dependent on the nature of the data.
>
> Is this what we are trying to test here?
> Or should ProducerPerformance.scala create an array of random bytes
> (instead of just zeroes)?
>
> If this can be improved, shall I open an issue to track it?
>
> Regards,
> Prabhjot
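For anyone following along, the effect described in the quoted mail is easy to demonstrate outside Kafka. The sketch below (my own illustration, not code from ProducerPerformance.scala) compresses an all-zero buffer and a random buffer of the same size with JDK DEFLATE and compares the compressed sizes; the all-zero payload collapses to almost nothing, which is why a zero-filled benchmark payload overstates compressed-producer throughput:

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressDemo {
    // Compress the input with DEFLATE and return the compressed size in bytes.
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] out = new byte[data.length + 1024]; // worst case: slight expansion
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int size = 100_000;

        byte[] zeros = new byte[size];      // analogous to the benchmark's zero-filled payload
        byte[] random = new byte[size];
        new Random(42).nextBytes(random);   // effectively incompressible payload

        // Zeros compress to a few hundred bytes; random data barely shrinks at all.
        System.out.println("zeros  -> " + compressedSize(zeros) + " bytes");
        System.out.println("random -> " + compressedSize(random) + " bytes");
    }
}
```

DEFLATE stands in here for lz4/snappy only to show the shape of the problem: any general-purpose codec exhibits the same gap between repetitive and random input, so a random (or sampled-from-real-traffic) payload gives a far more representative throughput number.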