Re: On the best number of partitions per topic for kafka based application and the best number of consumers per topic?

2020-11-15 Thread Vinicius Scheidegger
Kafka alone wouldn't scale down your consumers, but you can get metrics
from Kafka and use them to change the number of consumers.

Scaling up/down is normally done at another level of the application stack,
using tools whose purpose is to watch metrics and scale accordingly (say
the Kubernetes/OpenShift HPA if you are working with Docker containers, or
CloudWatch with Auto Scaling groups in AWS).

Lightbend has an open-source Kafka Lag Exporter that could help you expose
Kafka consumer metrics (Prometheus style), for instance. But there are
several other ways to get them.
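For reference, a minimal sketch (not production code - the group name and
bootstrap address are made up) of how the same lag number could be computed
directly with the Java AdminClient, in case you don't want to run a
separate exporter:

import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class GroupLagCheck {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (AdminClient admin = AdminClient.create(props)) {
      // Committed offsets for every partition the group reads
      Map<TopicPartition, OffsetAndMetadata> committed =
          admin.listConsumerGroupOffsets("my-group")
               .partitionsToOffsetAndMetadata().get();
      // Latest (end) offsets for the same partitions
      Map<TopicPartition, OffsetSpec> latest = committed.keySet().stream()
          .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
      Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> ends =
          admin.listOffsets(latest).all().get();
      // Total lag = sum over partitions of (end offset - committed offset)
      long totalLag = committed.entrySet().stream()
          .mapToLong(e -> ends.get(e.getKey()).offset() - e.getValue().offset())
          .sum();
      System.out.println("Total lag for my-group: " + totalLag);
    }
  }
}

An autoscaler (an HPA custom metric, a job calling your cloud provider's
API, etc.) could then add or remove consumers based on that number.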



On Sun, Nov 15, 2020, 7:09 AM Mazen Ezzeddine 
wrote:

> Thanks,
> My question is mostly about dynamic resource optimization,
>
>  Say I configured my application with 30 partitions and then arranged for
> 30 consumers (within a consumer group) to read/process the produced
> messages, but at some off-peak load I realized that a single consumer
> (within the same consumer group) could consume all the messages produced
> (the production rate is low). Are you aware of any Kafka solution that
> would automatically have consumers join/leave a consumer group
> dynamically, e.g. based on how well the consumer group is keeping up with
> the production rate?
>
> On 2020/11/14 22:36:34, Vinicius Scheidegger <
> vinicius.scheideg...@gmail.com> wrote:
> > This depends on the design of your application and how you are using
> Kafka.
> >
> > For instance, if you are using Kafka as a message queuing app, using
> > consumers within a consumer group to load balance, then you should create
> > the topics with as many partitions as the max number of consumers within
> > the same consumer group reading from it.
> >
> > Say, for instance, that under regular load you believe you will have 8
> > consumers reading from a topic, but at peak you believe you could have
> > around 30 consumers. You should then have at least 30 partitions - each
> > consumer in a consumer group is assigned to one or more partitions.
> > Having more consumers in a single consumer group than partitions is not
> > effective, as the extra consumers won't receive any messages.
> >
> > But as I mentioned earlier, different designs may lead to different
> > decisions. First try to understand how partitions work and how they are
> > assigned in your context (there are different partitioning schemes), and
> > then you will have a better sense of it.
> >
> > I hope it helps
> >
> > Vinicius Scheidegger
> >
> >
> >
> > On Sat, Nov 14, 2020, 8:39 PM Mazen Ezzeddine <
> > mazen.ezzedd...@etu.univ-cotedazur.fr> wrote:
> >
> > > Given a business application that resorts to a message queue solution
> > > like Kafka, what is the best number of partitions to select for a given
> > > topic? What influences such a decision?
> > >
> > > On the other hand, say we want to achieve maximal throughput of message
> > > consumption at minimal resource consumption. What is the best number of
> > > topic consumers to configure statically?
> > >
> > > If dynamic scale up/down of topic consumers is enabled, what would be
> > > better: (1) start with one consumer and scale up consumers until a
> > > desired metric is achieved, or (2) start with consumers equal to the
> > > number of partitions and then scale down until the desired metric is
> > > achieved?
> > >
> > > Are you aware of any cloud provider that offers a message broker service
> > > (namely, Kafka) that supports automatic scaling of consumers?
> > >
> > > Thank you.
> > >
> > >
> >
>


Re: On the best number of partitions per topic for kafka based application and the best number of consumers per topic?

2020-11-14 Thread Vinicius Scheidegger
This depends on the design of your application and how you are using Kafka.

For instance, if you are using Kafka as a message queuing app, using
consumers within a consumer group to load balance, then you should create
the topics with as many partitions as the max number of consumers within
the same consumer group reading from it.

Say, for instance, that under regular load you believe you will have 8
consumers reading from a topic, but at peak you believe you could have
around 30 consumers. You should then have at least 30 partitions - each
consumer in a consumer group is assigned to one or more partitions. Having
more consumers in a single consumer group than partitions is not effective,
as the extra consumers won't receive any messages.
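As an illustration only (topic name, replication factor and bootstrap
address are assumptions), creating such a topic with the Java AdminClient
could look like this:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicForPeakConsumers {
  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    try (AdminClient admin = AdminClient.create(props)) {
      int peakConsumers = 30;          // expected max consumers in the group
      short replicationFactor = 3;     // adjust to your cluster
      NewTopic topic = new NewTopic("orders", peakConsumers, replicationFactor);
      admin.createTopics(Collections.singleton(topic)).all().get();
    }
  }
}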

But as I mentioned earlier, different designs may lead to different
decisions. First try to understand how partitions work and how they are
assigned in your context (there are different partitioning schemes), and
then you will have a better sense of it.

I hope it helps

Vinicius Scheidegger



On Sat, Nov 14, 2020, 8:39 PM Mazen Ezzeddine <
mazen.ezzedd...@etu.univ-cotedazur.fr> wrote:

> Given a business application that resorts to a message queue solution
> like Kafka, what is the best number of partitions to select for a given
> topic? What influences such a decision?
>
>
> On the other hand, say we want to achieve maximal throughput of message
> consumption at minimal resource consumption. What is the best number of
> topic consumers to configure statically?
>
>
>
> If dynamic scale up/down of topic consumers is enabled, what would be
> better: (1) start with one consumer and scale up consumers until a desired
> metric is achieved, or (2) start with consumers equal to the number of
> partitions and then scale down until the desired metric is achieved?
>
>
> Are you aware of any cloud provider that offers a message broker service
> (namely, Kafka) that supports automatic scaling of consumers?
>
>
>
> Thank you.
>
>


Re: Kafka topic partition distributing evenly on disks

2020-08-06 Thread Vinicius Scheidegger
Hi Peter,

AFAIK, everything depends on:

1) How you have configured your topic
  a) number of partitions (here I understand you have 15 partitions)
  b) partition replication configuration (each partition necessarily has a
leader - primarily responsible for holding the data and for reads and
writes) - you can configure the topic to have a number of replicas
2) How you publish messages to the topic
  a) The publisher is responsible for choosing the partition. This can be done
consciously (by setting the partition id while sending the message to the
topic) or unconsciously (by using the DefaultPartitioner or any other
partitioner scheme).
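To illustrate point 2a, a small sketch with the Java producer (topic name,
partition id and address are assumptions, not part of your setup):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PartitionChoiceSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      // "Conscious" choice: write directly to partition 3 of the topic
      producer.send(new ProducerRecord<>("events", 3, "key-1", "value-1"));
      // "Unconscious" choice: provide only a key; the configured partitioner
      // (the DefaultPartitioner unless overridden) hashes it to pick the partition
      producer.send(new ProducerRecord<>("events", "key-1", "value-1"));
    }
  }
}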

All messages sent to a specific partition will be written first to the
leader (meaning that the disk holding the partition leader will receive the
load) and then replicated to the replicas (followers).
Kafka does not automatically distribute the data equally across the
different brokers - you need to design your architecture with that in mind.

I hope it helps

On Thu, Aug 6, 2020 at 10:23 PM Péter Nagykátai 
wrote:

> I initially started with one data disk (mounted solely to hold Kafka data)
> and recently added a new one.
>
> On Thu, Aug 6, 2020 at 10:13 PM  wrote:
>
> > What do you mean by older disk?
> >
> > On 8/6/20, 12:05 PM, "Péter Nagykátai"  wrote:
> >
> > [External]
> >
> >
> > Yeah, but it doesn't do that. My "older" disks have ~70 partitions, the
> > newer ones ~5 partitions. That's why I'm asking what went wrong.
> >
> > On Thu, Aug 6, 2020 at 8:35 PM  wrote:
> >
> > > Kafka evenly distributes the number of partitions on each disk, so in
> > > your case every disk should have 3/2 topic partitions.
> > > It is the producer's job to evenly produce data by partition key to the
> > > topic partitions.
> > > How is it partitioned by key - is the key auto-generated, or is the
> > > producer sending a key along with the message?
> > >
> > >
> > > On 8/6/20, 7:29 AM, "Péter Nagykátai" 
> wrote:
> > >
> > > [External]
> > >
> > >
> > > Hello,
> > >
> > > I have a Kafka cluster with 3 brokers (v2.3.0) and each broker has 2
> > > disks attached. I added a new topic (heavyweight) and was surprised
> > > that even though the topic has 15 partitions, those weren't distributed
> > > evenly on the disks. Thus I got one disk that's almost empty and the
> > > other almost filled up. Is there any way to have Kafka evenly
> > > distribute data on its disks?
> > >
> > > Thank you!
> > >
> > >

Re: Keys and partitions

2020-07-07 Thread Vinicius Scheidegger
Hi Victoria,

If processing order is not a requirement, you could define a random key and
your load would be randomly distributed across partitions.
So far I have been unable to find a solution to perfectly distribute the
load across partitions when records are created by multiple producers -
random distribution might be good enough, though.
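A minimal sketch of both ideas (topic name and the bucket count are
assumptions): a fully random key, or - closer to your point 3 - the userId
plus a small random "bucket" suffix so one huge user can be spread over a
few partitions when ordering is not required.

import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;
import org.apache.kafka.clients.producer.ProducerRecord;

public class KeyChoiceSketch {
  // Fully random key: load spreads across partitions, per-user ordering is lost
  static ProducerRecord<String, String> randomKey(String topic, String payload) {
    return new ProducerRecord<>(topic, UUID.randomUUID().toString(), payload);
  }

  // userId plus a random bucket: one huge user maps to at most `buckets` keys,
  // so its data can be processed by several consumers in parallel
  static ProducerRecord<String, String> bucketedUserKey(String topic, String userId,
                                                        String payload, int buckets) {
    int bucket = ThreadLocalRandom.current().nextInt(buckets);
    return new ProducerRecord<>(topic, userId + "-" + bucket, payload);
  }
}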

I hope it helps,

Vinicius Scheidegger


On Tue, Jul 7, 2020 at 7:52 AM Victoria Zuberman <
victoria.zuber...@imperva.com> wrote:

> Hi,
>
> I have userId as a key.
> Many users have moderate amounts of data, but some users have more and
> some users have a huge amount of data.
>
> I have been thinking about the following aspects of partitioning:
>
>   1.  If two or more large users fall into the same partition I might end
> up with large partition(s) (unbalanced relative to other partitions)
>   2.  If smaller users fall into the same partition as a huge user, the
> small users might get slower processing due to the amount of data the huge
> user has
>   3.  If the order of the messages is not critical, maybe I would want to
> allow several consumers to work on the data of the same huge user;
> therefore I would like to split one userId across several partitions
>
> I have some ideas on how to partition to solve those issues, but if you
> have something that worked well for you in production I would love to hear
> it. Also, any links to relevant blog posts/etc. would be welcome.
>
> Thanks,
> Victoria
>


Re: Broker side partition round robin

2020-06-02 Thread Vinicius Scheidegger
Hi Liam,

(+adding imgur links to the images)
First of all, thanks for checking my doubt.
I understand that the reason I notice this behavior is that our case
differs from yours in one point: I'm not writing terabytes in a single day.
I'm writing MBs, but distribution matters in these MBs, because processing
on the consumer side is CPU intensive (a complex machine learning
algorithm), so a truly equal load distribution is a requirement, otherwise
messages start queuing (which is what I don't want).

Our system is also in production and this bad distribution is causing
queuing on some consumers while others are idle.
I tried both approaches:
Round robin - actually gives me a distributed load over a large time window,
but as I grow the number of producers I get several messages in a single
partition while others don't get anything.
Hash of the key - here we used a random UUID as the key - still a bad
distribution.


[image: image.png]
(imgur link: https://i.imgur.com/ZhQq9Uh.png <https://imgur.com/a/ha9OQMj>)


When I start, let's say 10 producers, we can get 5 messages going to one
partition while others have none - I understand that this is because the
round robin is internal to each producer.
[image: image.png]
(imgur link: https://i.imgur.com/Hv8TUDL.png <https://imgur.com/a/ha9OQMj>)
(The picture above is what I believe is happening)


It would surprise me if this hasn't come up before; that's why I'm pretty
sure I'm missing something here...
We're currently analyzing some solutions, one of them being building our own
partitioner with shared memory (yes, that's how far we got on this),
although I believe a better solution would be to have this on the Kafka
broker side and not depend on custom code.

[image: image.png]
(imgur link: https://i.imgur.com/JR8QvZH.png <https://imgur.com/a/ha9OQMj>)

Above is the draft of one of our current ideas for a possible design. Based
on the shared memory we could decide the partition and send the messages
directly there (the number of producers, partitions and consumers here is
simplified).
This is only if we don't find a solution within Kafka to mitigate this
distribution issue - even though the proposed design imposes limits and adds
layers we didn't have in our initial design.

My question is, do we really need to develop this?
Is equal distribution in a scenario with multiple producers something that
can be achieved in Kafka without custom development?
Having never checked out the broker code for receiving a message - is
deciding the partition even possible on the broker side?
If this really does not exist, would a feature like that benefit other
people, and is it worth pursuing instead of the above solution?
Should I move this message to the dev forum? No one gave me much attention
there either (but maybe my messages are too big/boring - hahaha)
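For what it's worth, a rough sketch of what the "own partitioner" piece
could look like using Kafka's pluggable Partitioner interface - the
AtomicLong here is only a stand-in for whatever truly shared counter the
producers would have to agree on (which is exactly the part Kafka doesn't
give us):

import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class SharedCounterRoundRobinPartitioner implements Partitioner {
  // Placeholder: per-JVM counter; a real solution needs a counter shared by all producers
  private final AtomicLong counter = new AtomicLong(0);

  @Override
  public int partition(String topic, Object key, byte[] keyBytes,
                       Object value, byte[] valueBytes, Cluster cluster) {
    int numPartitions = cluster.partitionsForTopic(topic).size();
    // Round robin over all partitions; only equal across producers if the counter is shared
    return (int) Math.floorMod(counter.getAndIncrement(), (long) numPartitions);
  }

  @Override public void close() {}
  @Override public void configure(Map<String, ?> configs) {}
}

It would be registered on each producer via the partitioner.class
(ProducerConfig.PARTITIONER_CLASS_CONFIG) setting.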

Thanks


On Tue, Jun 2, 2020 at 1:08 PM Vinicius Scheidegger <
vinicius.scheideg...@gmail.com> wrote:

> Hi Liam,
>
> First of all, thanks for checking my doubt.
> I understand that the reason I notice this behavior is because our case
> differs from yours in one point: I'm not writing terabytes in a single day.
> I'm writing MB, but distribution matters in these MBs, this
> because processing in the consumer side is CPU intensive (complex machine
> learning algo), so a real equally distributed load is a requirement,
> otherwise messages start queuing (which is what I don't want).
>
> Our system is also in production and this bad distribution is generating
> queuing in some consumers while others are idle.
> I tried both approaches:
> Round Robbin - actually gives me a distributed load in a big time window,
> but as I grow the number of producers I get several messages in a single
> partition while others don't get anything.
> Hash of the key - here we used a random UUID as the key - still a bad
> distribution
>
>
> [image: image.png]
>
>
>
> When I start, let's say 10 producers, we can get 5 messages going to one
> partition while others have none - i understand that this is because the
> round robin is internal to the producer.
> [image: image.png]
>
> (The picture above is what I believe is happening)
>
>
> it would surprise me that this hasn't come up before, that's why I'm
> pretty sure I'm missing something here...
> We're currently analyzing some solutions, one of them is building our own
> partitioner with shared memory (yes, that's how far we got on this),
> although I believe a better solution would be to have this on Kafka broker
> side and not depend on custom code.
>
> [image: image.png]
> Above is the draft of one of our current ideas of a possible design. Based
> on the shared memory we could decide the partition and send the messages
> directly there (the number of producers, partitions and consumers here are
> simplified).
> This if we don't find a solution 

Re: Broker side partition round robin

2020-06-02 Thread Vinicius Scheidegger
Hi Liam,

First of all, thanks for checking my doubt.
I understand that the reason I notice this behavior is that our case
differs from yours in one point: I'm not writing terabytes in a single day.
I'm writing MBs, but distribution matters in these MBs, because processing
on the consumer side is CPU intensive (a complex machine learning
algorithm), so a truly equal load distribution is a requirement, otherwise
messages start queuing (which is what I don't want).

Our system is also in production and this bad distribution is causing
queuing on some consumers while others are idle.
I tried both approaches:
Round robin - actually gives me a distributed load over a large time window,
but as I grow the number of producers I get several messages in a single
partition while others don't get anything.
Hash of the key - here we used a random UUID as the key - still a bad
distribution.


[image: image.png]



When I start, let's say 10 producers, we can get 5 messages going to one
partition while others have none - I understand that this is because the
round robin is internal to each producer.
[image: image.png]

(The picture above is what I believe is happening)


It would surprise me if this hasn't come up before; that's why I'm pretty
sure I'm missing something here...
We're currently analyzing some solutions, one of them being building our own
partitioner with shared memory (yes, that's how far we got on this),
although I believe a better solution would be to have this on the Kafka
broker side and not depend on custom code.

[image: image.png]
Above is the draft of one of our current ideas for a possible design. Based
on the shared memory we could decide the partition and send the messages
directly there (the number of producers, partitions and consumers here is
simplified).
This is only if we don't find a solution within Kafka to mitigate this
distribution issue - even though the proposed design imposes limits and adds
layers we didn't have in our initial design.

My question is, do we really need to develop this?
Is equal distribution in a scenario with multiple producers something that
can be achieved in Kafka without custom development?
Having never checked out the broker code for receiving a message - is
deciding the partition even possible on the broker side?
If this really does not exist, would a feature like that benefit other
people, and is it worth pursuing instead of the above solution?
Should I move this message to the dev forum? No one gave me much attention
there either (but maybe my messages are too big/boring - hahaha)

Thanks


On Tue, Jun 2, 2020 at 10:47 AM Liam Clarke-Hutchinson <
liam.cla...@adscale.co.nz> wrote:

> Hi Vinicius,
>
> As you note, the cluster doesn't load balance producers, it relies on them
> using a partition strategy to do so.
>
> In production, I've never had actual broker load skew develop from multiple
> independent producers using round robining - and we're talking say 20 - 50
> producers (depending on scaling) writing terabytes over a day.
>
> And load skew / hot brokers is something I monitor closely.
>
> The only time I've seen load skew is when a key based partition strategy
> was used, and keys weren't evenly distributed.
>
> So in other words, in theory there's no guarantee, but in my experience,
> round robining multiple producers works fine.
>
> Cheers,
>
> Liam Clarke
>
> On Mon, 1 Jun. 2020, 11:55 pm Vinicius Scheidegger, <
> vinicius.scheideg...@gmail.com> wrote:
>
> > Hey guys, I need some help here...
> >
> > Is this a flaw in the design (maybe a discussion point for a KIP?), is
> > Kafka not supposed to perform equal load balancing with multiple
> producers
> > or am I missing something (which is what I believe is happening)?
> >
> > On Wed, May 27, 2020 at 2:40 PM Vinicius Scheidegger <
> > vinicius.scheideg...@gmail.com> wrote:
> >
> >> Does anyone know whether we could really have an "out of the box"
> >> solution to do round robin over the partitions when we have multiple
> >> producers?
> >> By that I mean, a round robin on the broker side (or maybe some way to
> >> synchronize all producers).
> >>
> >> Thank you,
> >>
> >> On Tue, May 26, 2020 at 1:41 PM Vinicius Scheidegger <
> >> vinicius.scheideg...@gmail.com> wrote:
> >>
> >>> Yes, I checked it. The issue is that RoundRobbinPartitioner is bound to
> >>> the producer. In a scenario with multiple producers it doesn't
> guarantee
> >>> equal distribution - from what I understood and from my tests, the
> >>> following situation happens with it:
> >>>
> >>> [image: image.png]
> >>>
> >>> Of course, the first partition is not always 1 and each producer may

Re: Broker side partition round robin

2020-06-01 Thread Vinicius Scheidegger
Hey guys, I need some help here...

Is this a flaw in the design (maybe a discussion point for a KIP?), is
Kafka not supposed to perform equal load balancing with multiple producers
or am I missing something (which is what I believe is happening)?

On Wed, May 27, 2020 at 2:40 PM Vinicius Scheidegger <
vinicius.scheideg...@gmail.com> wrote:

> Does anyone know whether we could really have an "out of the box" solution
> to do round robin over the partitions when we have multiple producers?
> By that I mean, a round robin on the broker side (or maybe some way to
> synchronize all producers).
>
> Thank you,
>
> On Tue, May 26, 2020 at 1:41 PM Vinicius Scheidegger <
> vinicius.scheideg...@gmail.com> wrote:
>
>> Yes, I checked it. The issue is that RoundRobbinPartitioner is bound to
>> the producer. In a scenario with multiple producers it doesn't guarantee
>> equal distribution - from what I understood and from my tests, the
>> following situation happens with it:
>>
>> [image: image.png]
>>
>> Of course, the first partition is not always 1 and each producer may
>> start in a different point in time, anyway my point is that it does not
>> guarantee equal distribution.
>>
>> The other option pointed out is to select the partition myself - either a
>> shared memory on the producers (assuming that this is possible - I mean I
>> would need to guarantee that producers CAN share a synchronized memory) or
>> include an intermediate topic with a single partition and a
>> dispatcher/producer using RoundRobinPartitioner (but this would include a
>> single point of failure).
>>
>> [image: image.png]
>> [image: image.png]
>>
>> None of these seem to be ideal as a Broker side round robin solution
>> would.
>> Am I missing something? Any other ideas?
>>
>> Thanks
>>
>> On Tue, May 26, 2020 at 11:34 AM M. Manna  wrote:
>>
>>> Hey Vinicius,
>>>
>>>
>>> On Tue, 26 May 2020 at 10:27, Vinicius Scheidegger <
>>> vinicius.scheideg...@gmail.com> wrote:
>>>
>>> > In a scenario with multiple independent producers (imagine ephemeral
>>> > dockers, that do not know the state of each other), what should be the
>>> > approach for the messages being sent to be equally distributed over a
>>> topic
>>> > partition?
>>> >
>>> > From what I understood the partition election is always on the
>>> Producer. Is
>>> > this understanding correct?
>>> >
>>> > If that's the case, how should one achieve an equally distributed load
>>> > balancing (round robin) over the partitions in a scenario with multiple
>>> > producers?
>>> >
>>> > Thank you,
>>> >
>>> > Vinicius Scheidegger
>>>
>>>
>>>  Have you checked RoundRobinPartitioner ? Also, you can always specify
>>> which partition you are writing to, so you can control the partitioning
>>> in
>>> your way.
>>>
>>> Regards,
>>>
>>> Regards,
>>>
>>> >
>>> >
>>>
>>


Re: Broker side partition round robin

2020-05-27 Thread Vinicius Scheidegger
Does anyone know whether we could really have an "out of the box" solution
to do round robin over the partitions when we have multiple producers?
By that I mean, a round robin on the broker side (or maybe some way to
synchronize all producers).

Thank you,

On Tue, May 26, 2020 at 1:41 PM Vinicius Scheidegger <
vinicius.scheideg...@gmail.com> wrote:

> Yes, I checked it. The issue is that RoundRobbinPartitioner is bound to
> the producer. In a scenario with multiple producers it doesn't guarantee
> equal distribution - from what I understood and from my tests, the
> following situation happens with it:
>
> [image: image.png]
>
> Of course, the first partition is not always 1 and each producer may start
> in a different point in time, anyway my point is that it does not guarantee
> equal distribution.
>
> The other option pointed out is to select the partition myself - either a
> shared memory on the producers (assuming that this is possible - I mean I
> would need to guarantee that producers CAN share a synchronized memory) or
> include an intermediate topic with a single partition and a
> dispatcher/producer using RoundRobinPartitioner (but this would include a
> single point of failure).
>
> [image: image.png]
> [image: image.png]
>
> None of these seem to be ideal as a Broker side round robin solution would.
> Am I missing something? Any other ideas?
>
> Thanks
>
> On Tue, May 26, 2020 at 11:34 AM M. Manna  wrote:
>
>> Hey Vinicius,
>>
>>
>> On Tue, 26 May 2020 at 10:27, Vinicius Scheidegger <
>> vinicius.scheideg...@gmail.com> wrote:
>>
>> > In a scenario with multiple independent producers (imagine ephemeral
>> > dockers, that do not know the state of each other), what should be the
>> > approach for the messages being sent to be equally distributed over a
>> topic
>> > partition?
>> >
>> > From what I understood the partition election is always on the
>> Producer. Is
>> > this understanding correct?
>> >
>> > If that's the case, how should one achieve an equally distributed load
>> > balancing (round robin) over the partitions in a scenario with multiple
>> > producers?
>> >
>> > Thank you,
>> >
>> > Vinicius Scheidegger
>>
>>
>>  Have you checked RoundRobinPartitioner ? Also, you can always specify
>> which partition you are writing to, so you can control the partitioning in
>> your way.
>>
>> Regards,
>>
>> Regards,
>>
>> >
>> >
>>
>


Re: Broker side partition round robin

2020-05-26 Thread Vinicius Scheidegger
Yes, I checked it. The issue is that RoundRobinPartitioner is bound to the
producer. In a scenario with multiple producers it doesn't guarantee equal
distribution - from what I understood and from my tests, the following
situation happens with it:

[image: image.png]

Of course, the first partition is not always 1 and each producer may start
at a different point in time; anyway, my point is that it does not guarantee
equal distribution.
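For reference, a sketch of how RoundRobinPartitioner gets plugged into a
producer (topic name and address are made up) - each producer instance
keeps its own counter, which is why several independent producers can still
hit the same partition at the same time:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RoundRobinPartitioner;
import org.apache.kafka.common.serialization.StringSerializer;

public class RoundRobinProducerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
    props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
    props.put(ProducerConfig.PARTITIONER_CLASS_CONFIG, RoundRobinPartitioner.class.getName());
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
      for (int i = 0; i < 10; i++) {
        // Key is irrelevant for partitioning here; this producer cycles partitions on its own
        producer.send(new ProducerRecord<>("events", null, "message-" + i));
      }
    }
  }
}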

The other option pointed out is to select the partition myself - either a
shared memory on the producers (assuming that this is possible - I mean I
would need to guarantee that producers CAN share a synchronized memory) or
to include an intermediate topic with a single partition and a
dispatcher/producer using RoundRobinPartitioner (but this would introduce a
single point of failure).

[image: image.png]
[image: image.png]

None of these seems as ideal as a broker-side round robin solution would be.
Am I missing something? Any other ideas?

Thanks

On Tue, May 26, 2020 at 11:34 AM M. Manna  wrote:

> Hey Vinicius,
>
>
> On Tue, 26 May 2020 at 10:27, Vinicius Scheidegger <
> vinicius.scheideg...@gmail.com> wrote:
>
> > In a scenario with multiple independent producers (imagine ephemeral
> > dockers, that do not know the state of each other), what should be the
> > approach for the messages being sent to be equally distributed over a
> topic
> > partition?
> >
> > From what I understood the partition election is always on the Producer.
> Is
> > this understanding correct?
> >
> > If that's the case, how should one achieve an equally distributed load
> > balancing (round robin) over the partitions in a scenario with multiple
> > producers?
> >
> > Thank you,
> >
> > Vinicius Scheidegger
>
>
>  Have you checked RoundRobinPartitioner ? Also, you can always specify
> which partition you are writing to, so you can control the partitioning in
> your way.
>
> Regards,
>
> Regards,
>
> >
> >
>


Broker side partition round robin

2020-05-26 Thread Vinicius Scheidegger
In a scenario with multiple independent producers (imagine ephemeral
Docker containers that do not know the state of each other), what should be
the approach for the messages being sent to be equally distributed over a
topic's partitions?

From what I understood, the partition selection is always done by the
producer. Is this understanding correct?

If that's the case, how should one achieve an equally distributed load
balancing (round robin) over the partitions in a scenario with multiple
producers?

Thank you,

Vinicius Scheidegger