The root of the problem is consumer lag on one or two partitions, even with a
no-op consumer (one that reads the log and discards it).  Our use case is very
simple: send all the log lines to the brokers.  But under a storm of data (due
to exceptions, application errors, etc.), one or two partitions lag behind
while the other consumers are at 0 lag.  We have tuned the GC using the
recommended GC settings (according to the tuning section of
http://www.slideshare.net/ToddPalino/enterprise-kafka-kafka-as-a-service).
In the normal situation, this is OK.

Hashing based on a key, and sticking to Murmur hash(key) % number of
partitions, did not give better throughput compared to random partitioning.
It would be good to build intelligence into the producer's partition
selection based on the rate of data for a topic and/or lag.  Is there any way
to customize the stickiness interval for the random partitioning strategy?
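
To make the question concrete, here is a minimal sketch of the behavior being
asked about: pick a random partition and stick to it for a configurable
interval before picking again.  The class name StickyRandomPartitionSelector
and its parameters are hypothetical and are not part of Kafka; this only
illustrates the idea and is not how the Kafka producer is implemented.

```java
import java.util.concurrent.ThreadLocalRandom;

/**
 * Hypothetical helper (not a Kafka API): chooses a random partition and keeps
 * returning it until the configured stickiness interval has elapsed.
 */
final class StickyRandomPartitionSelector {
    private final int numPartitions;
    private final long stickyIntervalMs;
    private int currentPartition = -1; // -1 means no partition chosen yet
    private long lastSwitchMs;

    StickyRandomPartitionSelector(int numPartitions, long stickyIntervalMs) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        if (stickyIntervalMs < 0) {
            throw new IllegalArgumentException("stickyIntervalMs must be non-negative");
        }
        this.numPartitions = numPartitions;
        this.stickyIntervalMs = stickyIntervalMs;
    }

    /** Returns the partition to use at the given time (milliseconds). */
    synchronized int partition(long nowMs) {
        boolean expired = nowMs - lastSwitchMs >= stickyIntervalMs;
        if (currentPartition < 0 || expired) {
            // Re-pick at random; the new pick may equal the previous one.
            currentPartition = ThreadLocalRandom.current().nextInt(numPartitions);
            lastSwitchMs = nowMs;
        }
        return currentPartition;
    }

    /** Convenience overload using the wall clock. */
    int partition() {
        return partition(System.currentTimeMillis());
    }
}
```

With something like new StickyRandomPartitionSelector(8, 60000L), a producer
would switch partitions about once a minute instead of every 10 minutes, which
should spread a burst from one heavy producer over more partitions.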

Sorry for the late response.

Thanks,

Bhavesh


On Mon, Aug 4, 2014 at 6:50 PM, Joe Stein <joe.st...@stealth.ly> wrote:

> Bhavesh, take a look at
>
> https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-Whyisdatanotevenlydistributedamongpartitionswhenapartitioningkeyisnotspecified
> ?
>
> Maybe the root cause issue is something else? Even if producers produce
> more or less than what they are producing you should be able to make it
> random enough with a partitioner and a key.  I don't think you should need
> more than what is in the FAQ, but in case you do, maybe look into
> http://en.wikipedia.org/wiki/MurmurHash as another hash option.
>
> /*******************************************
>  Joe Stein
>  Founder, Principal Consultant
>  Big Data Open Source Security LLC
>  http://www.stealth.ly
>  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
> ********************************************/
>
>
> On Mon, Aug 4, 2014 at 9:12 PM, Bhavesh Mistry <mistry.p.bhav...@gmail.com> wrote:
>
> > How to achieve uniform distribution of non-keyed messages per topic across
> > all partitions?
> >
> > We have tried to achieve this uniform distribution across partitions using
> > custom partitioning from each producer instance using round robin
> > (count(messages) % number of partitions for the topic). This strategy
> > results in very poor performance.  So we have switched back to the random
> > stickiness that Kafka provides out of the box per some interval (10 minutes,
> > not sure exactly) per topic.
> >
> > The above strategy sometimes results in consumer-side lag for some
> > partitions because we have some applications/producers producing more
> > messages for the same topic than other servers.
> >
> > Can Kafka provide out-of-the-box uniform distribution by using coordination
> > among all producers and relying on a measured rate, such as # of messages
> > per minute or # of bytes produced per minute, to achieve uniform
> > distribution and coordinate stickiness of partitions among hundreds of
> > producers for the same topic?
> >
> > Thanks,
> >
> > Bhavesh
> >
>
