Hi Erik,

I have focused on the producer side until now. Thanks for making me aware that the consumer decompresses automatically.
I'll also consider your point about creating real-life messages. But I still have one point of confusion: why would the current ProducerPerformance.scala compress an array of bytes that are all zeros? That will give better throughput anyway, correct?

Regards,
Prabhjot

On Tue, Aug 25, 2015 at 7:05 PM, Helleren, Erik <erik.helle...@cmegroup.com> wrote:

> Hi Prabhjot,
> There are two important things to know about Kafka compression. First,
> decompression happens automatically in the consumer
> (https://cwiki.apache.org/confluence/display/KAFKA/Compression), so you
> should see ASCII returned on the consumer side. The best way I know of to
> check whether compression has happened is to look at a packet capture.
>
> Second, the producer does not compress individual messages; it batches
> several sequential messages to the same topic and partition together and
> compresses that compound message
> (https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Compression).
> Thus, a fixed string will still see far better compression ratios than a
> 'typical' real-life message.
>
> Making a real-life-like message isn't easy, and depends heavily on your
> domain. But a general approach would be to generate messages from randomly
> selected words from a dictionary. Having a dictionary of around a thousand
> large words means there is a reasonable chance of the same words appearing
> multiple times in the same message. The words can also be nonsense, like
> "asdfasdfasdfasdf", or large words in the language of your choice. The
> goal is for each message to be unique, but still have similar chunks that
> a compression algorithm can detect and compress.
>
> -Erik
>
>
> On 8/25/15, 6:47 AM, "Prabhjot Bharaj" <prabhbha...@gmail.com> wrote:
>
> >Hi,
> >
> >I have been trying to use kafka-producer-perf-test.sh to arrive at certain
> >benchmarks.
> >When I try to run it with --compression-codec values of 1, 2 and 3, I
> >notice increased throughput compared to NoCompressionCodec.
> >
> >But when I checked ProducerPerformance.scala, I saw that
> >`producer.send` is getting its data from the method `generateProducerData`,
> >and this data is just a zero-filled array of bytes.
> >
> >Now, as per my basic understanding of compression algorithms, I think a
> >byte sequence of zeros will eventually result in a very small message,
> >because of which I thought I might be observing better throughput.
> >
> >So, at line 247 of ProducerPerformance.scala, I made this minor code
> >change:
> >
> >*val message =
> >"qopwr11591UPD113582260001AS1IL1-1N/A1Entertainment1-1an-example.com1-1-1-
> >1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,MED,HIGH,HD,.mp4.csm
> >il/bitrate=11subcategory
> >71Title
> >10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
> >1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1a
> >n-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,M
> >ED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
> >71Title
> >10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
> >1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1a
> >n-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,M
> >ED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
> >71Title
> >10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
> >1-1-1-1-1-1-111-1-1-"
> >message.getBytes().slice(0,msgSize)*
> >
> >This makes sure that I have a big message, and I can slice that
> >message to the message size passed in the command line options.
> >
> >But the problem is that when I try running the same with
> >--compression-codec values of 1, 2 or 3, I am still seeing ASCII data
> >(i.e. uncompressed data only).
> >
> >I want to ask whether this is a bug.
> >And, using kafka-producer-perf-test.sh, how can I send my own
> >compressed data?
> >
> >Thanks,
> >
> >Prabhjot

--
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand binary, and those who don't"
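
The compression points raised in this thread can be tried outside Kafka with a small Scala sketch. It is not code from ProducerPerformance.scala: the object name `CompressionSketch`, the helpers `gzipSize`, `makeDictionary` and `makeMessage`, the 1000-word dictionary of random letters, and the message and batch sizes are all assumptions made for illustration. It uses plain GZIP from the JDK rather than Kafka's own message-set format, and it models a batch simply as concatenated payloads, so the numbers are only indicative.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream
import scala.util.Random

object CompressionSketch {
  // GZIP-compress a byte array and return the compressed size in bytes.
  def gzipSize(data: Array[Byte]): Int = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    gzip.write(data)
    gzip.close() // closing finishes the stream so the trailer is written
    buffer.size()
  }

  // Hypothetical dictionary: `size` nonsense words of 8 to 15 lowercase letters.
  def makeDictionary(rng: Random, size: Int = 1000): Vector[String] =
    Vector.fill(size) {
      val len = 8 + rng.nextInt(8)
      Array.fill(len)(('a' + rng.nextInt(26)).toChar).mkString
    }

  // Build one message of exactly msgSize bytes from randomly chosen words.
  def makeMessage(rng: Random, dict: Vector[String], msgSize: Int): Array[Byte] = {
    val sb = new StringBuilder
    while (sb.length < msgSize) {
      sb.append(dict(rng.nextInt(dict.size))).append(' ')
    }
    sb.toString.getBytes("US-ASCII").slice(0, msgSize)
  }

  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val msgSize = 1000    // assumed message size in bytes
    val batchCount = 200  // assumed number of messages per batch
    val dict = makeDictionary(rng)

    val zeros = new Array[Byte](msgSize) // all-zero payload, as in the perf tool
    val messages = Vector.fill(batchCount)(makeMessage(rng, dict, msgSize))

    // Compress each message on its own, then all payloads as one unit.
    val individually = messages.map(gzipSize).sum
    val asOneBatch = gzipSize(Array.concat(messages: _*))

    println(s"zero-filled message, gzip size:    ${gzipSize(zeros)} bytes")
    println(s"dictionary message, gzip size:     ${gzipSize(messages.head)} bytes")
    println(s"$batchCount messages, compressed singly:  $individually bytes")
    println(s"$batchCount messages, compressed together: $asOneBatch bytes")
  }
}
```

Running `CompressionSketch.main` should show the zero-filled payload shrinking to a small fraction of the dictionary-based message, and the batch compressing better as a whole than message by message, because words repeat across messages more often than within one.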