Re: 'roundrobin' partition assignment strategy restrictions

2015-05-05 Thread Jason Rosenberg
I filed this jira, fwiw:  https://issues.apache.org/jira/browse/KAFKA-2172

Jason


Re: 'roundrobin' partition assignment strategy restrictions

2015-03-23 Thread Jiangjie Qin
Hi Jason,

Yes, I agree the restriction makes round-robin less flexible to use.
I think the focus of the round-robin strategy is workload balance. If
different consumers are consuming from different topics, the load is
unbalanced by nature. In that case, could you use a different consumer
group for each set of topics?
The rolling update is a good point. If you do a rolling bounce in a small
window, the rebalance retry should handle it. But if you want to canary a
new topic setting on one consumer for some time, it won’t work.
Could you share the use case in more detail, so we can see whether
there is a workaround?
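
The suggestion above, using a separate consumer group for each set of topics, could be sketched roughly as follows. This is only an illustration in Python, not Kafka code: the instance names, the topic names, and the scheme for deriving a group id from the topic set are all hypothetical.

```python
from collections import defaultdict

def split_into_homogeneous_groups(base_group_id, instance_topics):
    """Group consumer instances so every group shares one topic set.

    instance_topics maps a (hypothetical) instance name to the topics it selects.
    Returns {derived_group_id: [instance names]}.
    """
    by_topic_set = defaultdict(list)
    for instance, topics in instance_topics.items():
        # Instances with the same topics (in any order) land in the same bucket.
        by_topic_set[frozenset(topics)].append(instance)

    groups = {}
    for topic_set, instances in by_topic_set.items():
        # Hypothetical naming scheme: base id plus the sorted topic names.
        group_id = base_group_id + "-" + "-".join(sorted(topic_set))
        groups[group_id] = sorted(instances)
    return groups

instance_topics = {
    "c1": ["orders", "clicks"],
    "c2": ["clicks", "orders"],
    "c3": ["audit"],
}
groups = split_into_homogeneous_groups("app", instance_topics)
```

Each resulting group is homogeneous, so the round-robin restriction would be satisfied within it. Note that committed offsets are tracked per group, so instances moved to a new group id would not pick up the old group's offsets automatically.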

Jiangjie (Becket) Qin


Re: 'roundrobin' partition assignment strategy restrictions

2015-03-22 Thread Jason Rosenberg
Jiangjie,

Yeah, I welcome the round-robin strategy, as the 'range' strategy ('til now
the only one available) is not always good at balancing partitions, as you
observed above.

The main thing I'm bringing up in this thread, though, is the question of why
there needs to be a restriction to having a homogeneous set of consumers in
the group being balanced.  This is not a requirement for the range
algorithm, but is for the roundrobin algorithm.  So, I'm just wanting to
understand why there's that limitation.  (And sadly, in our case, we do
have heterogeneous consumers using the same groupid, so we can't easily turn
on roundrobin at the moment without some effort :) ).

I can see that it does simplify the implementation to have that limitation,
but I'm just wondering if there's anything fundamental that would prevent
an implementation that works over heterogeneous consumers.  E.g. "Lay out
all partitions, and lay out all consumer threads, and proceed round robin,
assigning each partition to the next consumer thread. *If the next consumer
thread doesn't have a selection for the current partition, then move on to
the next consumer thread.*"
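
A minimal sketch of the skipping idea described above, written in Python purely for illustration. It is not the Kafka implementation; the function name, the thread ids, and the data shapes are assumptions made for this example.

```python
def round_robin_with_skip(partitions, subscriptions):
    """Assign partitions round robin, skipping threads that did not select the topic.

    partitions: ordered list of (topic, partition_number) tuples.
    subscriptions: dict mapping a thread id to the set of topics it selects.
    Returns {(topic, partition_number): thread_id}.
    """
    threads = sorted(subscriptions)  # fixed order so every node computes the same plan
    assignment = {}
    cursor = 0  # index of the thread whose turn is next
    for topic, partition in partitions:
        # Try each thread at most once, starting from the cursor.
        for offset in range(len(threads)):
            candidate = threads[(cursor + offset) % len(threads)]
            if topic in subscriptions[candidate]:
                assignment[(topic, partition)] = candidate
                cursor = (cursor + offset + 1) % len(threads)
                break
        # A partition that no thread selects is simply left unassigned.
    return assignment

subscriptions = {"C1-0": {"A", "B"}, "C2-0": {"B"}}
partitions = [("A", 0), ("A", 1), ("B", 0), ("B", 1)]
plan = round_robin_with_skip(partitions, subscriptions)
```

In this example both partitions of topic A go to C1-0 (the only thread selecting A), while B is split between the two threads. The result depends only on the full list of partitions and subscriptions, so every node that sees the same group state would compute the same plan.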

The current implementation is also problematic if you are doing a rolling
restart of a consumer cluster.  Let's say you are updating the topic
selection as part of an update to the cluster.  Once the first node is
updated, the entire cluster will no longer be homogeneous until the last
node is updated, which means you will have a temporary outage consuming
data until all nodes have been updated.  So, it makes it difficult to do
rolling restarts, or canary updates on a subset of nodes, etc.

Jason


Re: 'roundrobin' partition assignment strategy restrictions

2015-03-21 Thread gharatmayuresh15
I am not sure that's how it works.

I would expect each consumer to be able to consume from all the topics, right?
If not, that seems odd.

Thanks,

Mayuresh



Re: 'roundrobin' partition assignment strategy restrictions

2015-03-20 Thread Jiangjie Qin
Hi Jason, 

The motivation behind round robin is to better balance the consumers'
load. Imagine you have two topics, each with two partitions. These topics
are consumed by two consumers, each with two consumer threads.

The range assignment gives:
T1-P1 -> C1-Thr1
T1-P2 -> C1-Thr2
T2-P1 -> C1-Thr1
T2-P2 -> C1-Thr2
Consumer 2 will not be consuming from any partitions.

The round robin algorithm gives:
T1-P1 -> C1-Thr1
T1-P2 -> C1-Thr2
T2-P1 -> C2-Thr1
T2-P2 -> C2-Thr2
It is much better balanced than the range assignment.

That's the reason why we introduced the round-robin strategy even though it
has restrictions.
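
The two assignments listed above can be reproduced with a short Python sketch. This is a simplified model for illustration, not the actual Kafka assignor code: the function names are made up, and partitions and threads are simply taken in sorted order.

```python
from itertools import cycle

def range_assign(topics, threads):
    """Split each topic's partitions into contiguous ranges, one topic at a time."""
    assignment = {}
    for topic, num_partitions in sorted(topics.items()):
        per_thread, extra = divmod(num_partitions, len(threads))
        next_partition = 1
        for i, thread in enumerate(threads):
            # The first `extra` threads take one additional partition.
            count = per_thread + (1 if i < extra else 0)
            for p in range(next_partition, next_partition + count):
                assignment[f"{topic}-P{p}"] = thread
            next_partition += count
    return assignment

def round_robin_assign(topics, threads):
    """Lay out all partitions of all topics, then deal them out across the threads."""
    partitions = [f"{t}-P{p}" for t, n in sorted(topics.items())
                  for p in range(1, n + 1)]
    return dict(zip(partitions, cycle(threads)))

topics = {"T1": 2, "T2": 2}  # topic name -> number of partitions
threads = ["C1-Thr1", "C1-Thr2", "C2-Thr1", "C2-Thr2"]

range_result = range_assign(topics, threads)
rr_result = round_robin_assign(topics, threads)
```

Because the range strategy restarts from the first thread for every topic, only the first two threads ever receive partitions here, while round robin continues where the previous topic left off.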

Jiangjie (Becket) Qin





Re: 'roundrobin' partition assignment strategy restrictions

2015-03-20 Thread Jason Rosenberg
Jiangjie,

The error messages I got (and the config doc) clearly state that the
number of threads per consumer must also match.

I'm not convinced that an easy-to-understand algorithm wouldn't work fine with
a heterogeneous set of selected topics between consumers.

Jason



Re: 'roundrobin' partition assignment strategy restrictions

2015-03-19 Thread Mayuresh Gharat
Hi Becket,

Can you give an example of this? It would be easier to understand :)

Thanks,

Mayuresh



-- 
-Regards,
Mayuresh R. Gharat
(862) 250-7125


Re: 'roundrobin' partition assignment strategy restrictions

2015-03-19 Thread Jiangjie Qin
Hi Jason,

The round-robin strategy first takes the partitions of all the topics a
consumer is consuming from, then distributes them across all the consumers.
If different consumers are consuming from different topics, the assignment
algorithm will generate different answers on different consumers.
It is OK for consumers to have different thread counts, but the consumers
have to consume from the same set of topics.


For the range strategy, balancing is done for each individual topic rather
than across topics. So the balancing only involves the consumers consuming
from the same topic.
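
To make the "different answers on different consumers" point concrete, here is a small Python model of the explanation above. It is a simplification for illustration only, not the Kafka code: each consumer is assumed to build its plan from its own topic list, and the names and partition counts are hypothetical.

```python
from itertools import cycle

def local_plan(my_topics, partition_counts, all_threads):
    """Plan as computed by one consumer from the topics *it* consumes."""
    partitions = [(t, p) for t in sorted(my_topics)
                  for p in range(partition_counts[t])]
    return dict(zip(partitions, cycle(sorted(all_threads))))

partition_counts = {"A": 1, "B": 2}  # topic -> number of partitions
all_threads = ["C1-0", "C2-0"]       # one thread per consumer

# Same topic set on both consumers: the local plans are identical.
plans_agree = (local_plan({"A", "B"}, partition_counts, all_threads)
               == local_plan({"A", "B"}, partition_counts, all_threads))

# Different topic sets: C1 consumes A and B, C2 consumes only B.
plan_c1 = local_plan({"A", "B"}, partition_counts, all_threads)
plan_c2 = local_plan({"B"}, partition_counts, all_threads)

# Partitions for which the two consumers disagree about the owner.
conflicts = {tp for tp in plan_c1.keys() & plan_c2.keys()
             if plan_c1[tp] != plan_c2[tp]}
```

With different topic sets, the extra partition of topic A shifts the rotation in C1's plan, so the two consumers disagree on who owns each partition of B. With the range strategy the rotation restarts for every topic, which is why it does not have this problem.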

Thanks.

Jiangjie (Becket) Qin




'roundrobin' partition assignment strategy restrictions

2015-03-19 Thread Jason Rosenberg
So,

I've run into an issue migrating a consumer to use the new 'roundrobin'
partition.assignment.strategy.  It turns out that several of our consumers
use the same group id, but instantiate several different consumer instances
(with different topic selectors and thread counts).  Often, this is done in
a single shared process.  It turns out this arrangement is not allowed when
using the 'roundrobin' assignment strategy.

I'm curious about the reason for this restriction.  Why is it not also a
restriction for the 'range' strategy (which we've been happily using for
some time now)?

It would seem that as long as you always assign a partition to a consumer
instance that is actually selecting it, you should still be able to proceed
with the round-robin algorithm (potentially skipping consumers if they
can't select the next partition in the list, etc.).

Jason