Re: Uniform Distribution of Messages for Topic Across Partitions Without Affecting Performance

Joe Stein Mon, 04 Aug 2014 18:51:36 -0700

Bhavesh, take a look at
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
?


Maybe the root cause is something else? Even if some producers produce
more or less than others, you should be able to make the distribution
random enough with a partitioner and a key.  I don't think you should need
more than what is in the FAQ, but in case you do, maybe look into
http://en.wikipedia.org/wiki/MurmurHash as another hash option.
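
A rough sketch of the idea, in plain Python rather than against the Kafka
producer API. It compares one sticky partition per producer with a random
key per message hashed to a partition. The partition count, the per-producer
message counts, and all function names are made up for illustration, and
CRC32 from the standard library stands in for MurmurHash.

```python
import random
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count for the topic


def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition by hashing it (CRC32 stands in for MurmurHash)."""
    return zlib.crc32(key) % num_partitions


def sticky_counts(rates, num_partitions=NUM_PARTITIONS, seed=0):
    """Each producer sends all of its messages to one randomly chosen partition."""
    rng = random.Random(seed)
    counts = [0] * num_partitions
    for rate in rates:
        counts[rng.randrange(num_partitions)] += rate
    return counts


def keyed_counts(rates, num_partitions=NUM_PARTITIONS, seed=0):
    """Each message gets its own random key, which is hashed to pick the partition."""
    rng = random.Random(seed)
    counts = [0] * num_partitions
    for rate in rates:
        for _ in range(rate):
            key = rng.getrandbits(64).to_bytes(8, "big")
            counts[partition_for(key, num_partitions)] += 1
    return counts


def spread(counts):
    """Ratio of the busiest partition's load to the average load (1.0 is perfectly even)."""
    average = sum(counts) / len(counts)
    return max(counts) / average


if __name__ == "__main__":
    # Hypothetical message counts: one busy producer and three quiet ones.
    rates = [20000, 500, 500, 500]
    print("sticky spread:", spread(sticky_counts(rates)))
    print("keyed spread: ", spread(keyed_counts(rates)))
```

With unequal producers, the sticky variant puts the busy producer's whole
load on a single partition, while the per-message key keeps every partition
close to the average regardless of which producer sent the message.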

/*******************************************
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
********************************************/


On Mon, Aug 4, 2014 at 9:12 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com>
wrote:

> How to achieve uniform distribution of non-keyed messages per topic across
> all partitions?
>
> We have tried to achieve this uniform distribution across partitions with
> a custom partitioner in each producer instance using round robin (
> count(messages) % number of partitions for the topic). This strategy results
> in very poor performance, so we have switched back to the random stickiness
> that Kafka provides out of the box, per topic, for some interval (10 minutes,
> not sure exactly).
>
> The above strategy sometimes results in consumer-side lag for some
> partitions because some of our applications/producers produce more
> messages for the same topic than other servers.
>
> Can Kafka provide uniform distribution out of the box by coordinating
> among all producers, relying on a measured rate such as # of messages per
> minute or # of bytes produced per minute to achieve uniform distribution and
> coordinate partition stickiness among hundreds of producers for the same
> topic?
>
> Thanks,
>
> Bhavesh
>
