Hi Erik,

I have focused on the producer side so far. Thanks for making me
aware that the consumer decompresses automatically.

I'll also consider your point on creating real-life messages.
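
As I understand the dictionary idea, it could look roughly like the sketch
below. This is only my assumption of how it might be done, not code from
Kafka: `RealisticMessages`, `buildDictionary` and `generateMessage` are
hypothetical names, and the dictionary is made of random nonsense words.

```scala
import scala.util.Random

object RealisticMessages {
  // Hypothetical dictionary of nonsense "words", each 8 to 16 lowercase letters.
  def buildDictionary(size: Int, rng: Random): Vector[String] =
    Vector.fill(size) {
      val len = 8 + rng.nextInt(9)
      Seq.fill(len)(('a' + rng.nextInt(26)).toChar).mkString
    }

  // Append randomly chosen words until the text reaches msgSize characters,
  // then trim the encoded bytes to exactly msgSize.
  def generateMessage(dict: Vector[String], msgSize: Int, rng: Random): Array[Byte] = {
    val sb = new StringBuilder
    while (sb.length < msgSize) {
      sb.append(dict(rng.nextInt(dict.size))).append(' ')
    }
    sb.toString.getBytes("US-ASCII").take(msgSize)
  }
}
```

Each message would then be different, but words would repeat across the
messages in a batch, which is what a compression codec can exploit.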

But I still have one point of confusion:

Why would the current ProducerPerformance.scala compress an Array of Bytes
that is all zeros?
Such a payload compresses to almost nothing, so it will show better
throughput anyway, correct?
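
To check that assumption outside Kafka, here is a minimal sketch. It uses
GZIP from the JDK as a stand-in for whichever codec the producer is
configured with, and compares a zero-filled array with random bytes of the
same size. `ZeroPayloadCheck` and `gzipSize` are names I made up for the
illustration, and the 1000-byte size is arbitrary.

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream
import scala.util.Random

object ZeroPayloadCheck {
  // GZIP-compress a byte array in memory and return the compressed size in bytes.
  def gzipSize(data: Array[Byte]): Int = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(data)
    gz.close() // flushes the remaining compressed data and the GZIP trailer
    bos.size()
  }

  def main(args: Array[String]): Unit = {
    val msgSize = 1000                      // arbitrary size for the comparison
    val zeros = new Array[Byte](msgSize)    // zero-filled by default on the JVM
    val random = new Array[Byte](msgSize)
    new Random(1).nextBytes(random)         // effectively incompressible data
    println(s"zeros:  $msgSize -> ${gzipSize(zeros)} bytes")
    println(s"random: $msgSize -> ${gzipSize(random)} bytes")
  }
}
```

The zero-filled array should shrink to a few dozen bytes while the random
one does not shrink at all, so a benchmark on all-zero payloads measures a
best case rather than a typical one.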

Regards,
Prabhjot

On Tue, Aug 25, 2015 at 7:05 PM, Helleren, Erik <erik.helle...@cmegroup.com>
wrote:

> Hi Prabhjot,
> There are two important things to know about Kafka compression: First,
> decompression happens automatically in the consumer
> (https://cwiki.apache.org/confluence/display/KAFKA/Compression) so you
> should see ASCII returned on the consumer side. The best way I know of to
> see whether compression has happened is to actually look at a packet
> capture.
>
> Second, the producer does not compress individual messages, but actually
> batches several sequential messages to the same topic and partition
> together and compresses that compound message.
> (https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-Compression)
> Thus, a fixed string will still see far better compression ratios than a
> 'typical' real-life message.
>
> Making a real-life-like message isn't easy, and depends heavily on your
> domain. But a general approach would be to generate messages by randomly
> selecting words from a dictionary. Having a dictionary of around a thousand
> large words means there is a reasonable chance of the same words appearing
> multiple times in the same message. The words can also be nonsense like
> "asdfasdfasdfasdf", or large words in the language of your choice. The
> goal is for each message to be unique, but still have similar chunks that
> a compression algorithm can detect and compress.
>
> -Erik
>
>
> On 8/25/15, 6:47 AM, "Prabhjot Bharaj" <prabhbha...@gmail.com> wrote:
>
> >Hi,
> >
> >I have been trying to use kafka-producer-perf-test.sh to arrive at certain
> >benchmarks.
> >When I try to run it with --compression-codec values of 1, 2 and 3, I
> >notice increased throughput compared to NoCompressionCodec.
> >
> >But when I checked ProducerPerformance.scala, I saw that
> >`producer.send` is getting its data from the method `generateProducerData`,
> >and this data is just an empty array of Bytes.
> >
> >Now, as per my basic understanding of compression algorithms, I think a
> >byte sequence of zeros will eventually result in a very small message,
> >because of which I thought I might be observing better throughput.
> >
> >So, at line 247 of ProducerPerformance.scala, I made this minor code
> >change:
> >
> >
> >
> >val message =
> >"qopwr11591UPD113582260001AS1IL1-1N/A1Entertainment1-1an-example.com1-1-1-
> >1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,MED,HIGH,HD,.mp4.csm
> >il/bitrate=11subcategory
> >71Title
> >10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
> >1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1a
> >n-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,M
> >ED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
> >71Title
> >10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
> >1-1-1-1-1-1-111-1-1-r1VR-11591UPD113582260001AS1IL1-1N/A1Entertainment1-1a
> >n-example.com1-1-1-1-1-1-1-1011413/011413_factor_points_FNC_,LOW,MED_LOW,M
> >ED,HIGH,HD,.mp4.csmil/bitrate=11subcategory
> >71Title
> >10^D1-1-111-1-1-1-1-1-111-1-1-1-1-115101-1-1-1-1126112491-1-1-1-1-1-1-1-1-
> >1-1-1-1-1-1-111-1-1-"
> >message.getBytes().slice(0,msgSize)
> >
> >
> >This makes sure that I have a big message, and I can slice that
> >message to the message size passed in the command line options
> >
> >
> >But the problem is that when I try running the same with
> >--compression-codec values of 1, 2 or 3, I am still seeing ASCII data
> >(i.e. uncompressed data only).
> >
> >
> >I want to ask whether this is a bug. Also, using
> >kafka-producer-perf-test.sh, how can I send my own compressed data?
> >
> >
> >Thanks,
> >
> >Prabhjot
>
>


-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"
