You’re absolutely right. This should be fixed. I’ve made a note of this in https://issues.apache.org/jira/browse/KAFKA-2499.
If you’d like to submit a pull request for this, that would be awesome :) Otherwise I’ll try to fit it into the other performance work I’m looking at.

Ben

> On 31 Aug 2015, at 12:22, Prabhjot Bharaj <prabhbha...@gmail.com> wrote:
>
> Hello Folks,
>
> I was going through ProducerPerformance.scala.
>
> Having a close look at line no. 247 in 'def generateProducerData'
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/tools/ProducerPerformance.scala,
> the message that the producer sends to Kafka is an array of 0s.
>
> A basic understanding of compression algorithms suggests that compressing
> repetitive data gives the best compression.
>
> I have also observed that when compressing an array of zero bytes, the
> throughput increases significantly when I use lz4 or snappy vs
> NoCompressionCodec. But this is largely dependent on the nature of the data.
>
> Is this what we are trying to test here?
> Or should ProducerPerformance.scala create an array of random bytes
> (instead of just zeroes)?
>
> If this can be improved, shall I open an issue to track it?
>
> Regards,
> Prabhjot
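For anyone following along, the effect described in the quoted mail is easy to demonstrate outside Kafka. The sketch below (my own illustration, not code from ProducerPerformance.scala) compresses an all-zero buffer and a random buffer of the same size with JDK DEFLATE and compares the compressed sizes; the all-zero payload collapses to almost nothing, which is why a zero-filled benchmark payload overstates compressed-producer throughput:

```java
import java.util.Random;
import java.util.zip.Deflater;

public class CompressDemo {
    // Compress the input with DEFLATE and return the compressed size in bytes.
    static int compressedSize(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] out = new byte[data.length + 1024]; // worst case: slight expansion
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(out);
        }
        deflater.end();
        return total;
    }

    public static void main(String[] args) {
        int size = 100_000;

        byte[] zeros = new byte[size];      // analogous to the benchmark's zero-filled payload
        byte[] random = new byte[size];
        new Random(42).nextBytes(random);   // effectively incompressible payload

        // Zeros compress to a few hundred bytes; random data barely shrinks at all.
        System.out.println("zeros  -> " + compressedSize(zeros) + " bytes");
        System.out.println("random -> " + compressedSize(random) + " bytes");
    }
}
```

DEFLATE stands in here for lz4/snappy only to show the shape of the problem: any general-purpose codec exhibits the same gap between repetitive and random input, so a random (or sampled-from-real-traffic) payload gives a far more representative throughput number.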