Hi Colin,

One producer shouldn't need to know about the others to distribute the load
equally, but what Kafka has now is only roughly equal.
With a single producer, RoundRobinPartitioner works fine. With 10 producers,
you can have 7/8 messages in one partition while another partition has none
(when the producers are in sync, which happened a couple of times in our
tests).

Producer0 getNext() = partition0
Producer1 getNext() = partition0
Producer2 getNext() = partition0
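
The synced case above can be reproduced with a minimal sketch. This is plain
Java, not Kafka's RoundRobinPartitioner code: LocalRoundRobin and
sendOneMessageEach are hypothetical names used only for illustration, and the
sketch assumes every producer starts its private counter at zero and sends
exactly one message.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

final class RoundRobinSkewSketch {

    /** Hypothetical stand-in for one producer's private round-robin state. */
    static final class LocalRoundRobin {
        private final AtomicInteger counter = new AtomicInteger(0);
        private final int numPartitions;

        LocalRoundRobin(int numPartitions) {
            this.numPartitions = numPartitions;
        }

        int getNext() {
            // Each producer advances only its own counter; nothing is shared.
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
    }

    /** Every producer sends one message; returns the message count per partition. */
    static int[] sendOneMessageEach(int numProducers, int numPartitions) {
        int[] perPartition = new int[numPartitions];
        for (int p = 0; p < numProducers; p++) {
            // Assumption: a fresh producer whose counter starts at 0.
            LocalRoundRobin producer = new LocalRoundRobin(numPartitions);
            perPartition[producer.getNext()]++;
        }
        return perPartition;
    }

    public static void main(String[] args) {
        int[] counts = sendOneMessageEach(3, 3);
        System.out.println(Arrays.toString(counts));
    }
}
```

With 3 producers and 3 partitions this prints [3, 0, 0]: each producer is
perfectly round robin on its own, yet every message lands in partition 0.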

Here is a link to screenshots of some of our test data:
https://imgur.com/a/ha9OQMj

Depending on how intensive (slow) your consumption is, this may be a
problem, as messages will queue up.
We use Kafka as a messaging protocol in a big (and at some points heavily
loaded) machine learning flow. For high-throughput (lightweight) processing,
queuing is not an issue, although we saw it happening, but for heavy
processes we are unable to balance the load equally.

We currently use the DefaultPartitioner and Kafka's algorithm (murmur2 hash
of the key) to decide the partition.
We noticed queuing and timeouts while several consumers were idle, which
made us take a closer look at how the load is balanced.
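
Hashing the key gives no balance guarantee either, which a small sketch can
show. Assumption: String.hashCode() stands in here for murmur2 over the
serialized key, so the exact partitions differ from what Kafka would pick.
KeyHashSkewSketch, partitionFor and countPerPartition are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;

final class KeyHashSkewSketch {

    /** Maps a key to a partition. Stand-in hash: String.hashCode(), NOT murmur2. */
    static int partitionFor(String key, int numPartitions) {
        // floorMod keeps the result non-negative even for negative hashes.
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Counts how many of the given keys land in each partition. */
    static int[] countPerPartition(List<String> keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countPerPartition(Arrays.asList("a", "e", "i"), 4);
        System.out.println(Arrays.toString(counts));
    }
}
```

With the keys "a", "e" and "i" and 4 partitions this prints [0, 3, 0, 0]: all
three keys land in partition 1, and the consumers of the other partitions stay
idle. Real keys under murmur2 will spread differently, but nothing forces the
spread to be even.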

I believe the only way to balance the load equally without having to know
about the other producers would be to do it on the broker side. Do you agree?

Thanks,



On Mon, Jun 15, 2020 at 7:32 PM Colin McCabe <cmcc...@apache.org> wrote:

> Hi Vinicius,
>
> It's actually not necessary for one producer to know about the others to
> get an even distribution across partitions, right?  All that's really
> required is that all producers produce a roughly equal amount of data to
> each partition, which is what RoundRobinPartitioner is designed to do.  In
> mathematical terms, the sum of several uniform random variables is itself
> uniformly random.
>
> (There is a bug in RRP right now, KAFKA-9965, but it's not related to what
> we're talking about now and we have a fix ready.)
>
> cheers,
> Colin
>
>
> On Sun, Jun 14, 2020, at 14:26, Vinicius Scheidegger wrote:
> > Hi Colin,
> >
> > Thanks for the reply. Actually the RoundRobinPartitioner won't do an equal
> > distribution when working with multiple producers. One producer does not
> > know the others. If you consider that producers are randomly producing
> > messages, in the worst case scenario all producers can be synced and one
> > could have as many messages in a single partition as the number of
> > producers.
> > It's easy to generate evidence of it.
> >
> > I have asked this question on the users mailing list too (and on Slack and
> > on Stack Overflow).
> >
> > Kafka currently does not have means to do a round robin across multiple
> > producers or on the broker side.
> >
> > This means there is currently NO GUARANTEE of equal distribution across
> > partitions as the partition election is decided by the producer.
> >
> > The result is unbalanced consumption when working with consumer groups,
> > and the options are: creating a custom shared partitioner, relying on
> > Kafka's random partitioning, or introducing a middleman between topics
> > (all of them having big cons).
> >
> > I thought of asking here to see whether this is a topic that could concern
> > other developers (and maybe understand whether this could be a KIP
> > discussion).
> >
> > Maybe I'm missing something... I would like to know.
> >
> > According to my interpretation of the code (I just read through some
> > classes), there is currently no way to do partition balancing on the
> > broker: the producer sends messages directly to partition leaders, so
> > the partition currently needs to be chosen on the producer.
> >
> > I understand that in order to perform round robin across partitions of a
> > topic when working with multiple producers, some development needs to be
> > done. Am I right?
> >
> >
> > Thanks
> >
> >
> > On Fri, Jun 12, 2020, 10:57 PM Colin McCabe <cmcc...@apache.org> wrote:
> >
> > > Hi Vinicius,
> > >
> > > This question seems like a better fit for the user mailing list rather
> > > than the developer mailing list.
> > >
> > > Anyway, if I understand correctly, you are asking if the producer can
> > > choose to assign partitions in a round-robin fashion rather than based
> > > on the key.  The answer is, you can, by using RoundRobinPartitioner
> > > (again, if I'm understanding the question correctly).
> > >
> > > best,
> > > Colin
> > >
> > > On Tue, Jun 9, 2020, at 00:48, Vinicius Scheidegger wrote:
> > > > Anyone?
> > > >
> > > > On Fri, Jun 5, 2020 at 2:42 PM Vinicius Scheidegger <
> > > > vinicius.scheideg...@gmail.com> wrote:
> > > >
> > > > > Does anyone know how I could balance the load so that messages are
> > > > > distributed equally to all consumers within the same consumer group
> > > > > when there are multiple producers?
> > > > >
> > > > > Is this a conceptual flaw in Kafka? Was it not designed for equal
> > > > > distribution with multiple producers, or am I missing something?
> > > > > I've asked on Stack Overflow, on the Kafka users mailing list, here
> > > > > (on Kafka Devs) and on Slack, and still have no definitive answer
> > > > > (actually, most of the time I got no answer at all).
> > > > >
> > > > > Would something like this even be possible in the way Kafka is
> > > > > currently designed?
> > > > > How does proposing a KIP work?
> > > > >
> > > > > Thanks,
> > > > >
> > > > >
> > > > >
> > > > > On Thu, May 28, 2020, 3:44 PM Vinicius Scheidegger <
> > > > > vinicius.scheideg...@gmail.com> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> I'm trying to understand a little bit more about how Kafka works.
> > > > >> I have a design with multiple producers writing to a single topic
> > > > >> and multiple consumers in a single consumer group consuming
> > > > >> messages from this topic.
> > > > >>
> > > > >> My idea is to distribute the messages from all producers equally.
> > > > >> From reading the documentation I understood that the partition is
> > > > >> always selected by the producer. Is that correct?
> > > > >>
> > > > >> I'd also like to know if there is an out-of-the-box option to
> > > > >> assign the partition via round robin *on the broker side* to
> > > > >> guarantee equal distribution of the load: if possible to each
> > > > >> consumer, but if not possible, at least to each partition.
> > > > >>
> > > > >> If my understanding is correct, it looks like in a multiple-producer
> > > > >> scenario Kafka lacks support for load balancing, and customers
> > > > >> have to either stick to the hash of the key (random distribution,
> > > > >> although it guarantees that the same key goes to the same
> > > > >> partition) or create their own logic on the producer side (i.e. by
> > > > >> sharing memory).
> > > > >> Am I missing something?
> > > > >>
> > > > >> Thank you,
> > > > >>
> > > > >> Vinicius Scheidegger
> > > > >>
> > > > >
> > > >
> > >
> >
>
