I don't understand why you want to split join/leave into two parts...
But it's up to you, I guess.

+1 for broker config plus "retriggering" delay
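
To make the "retriggering" idea concrete, here is a minimal sketch in Java.
It assumes a broker-side delay plus the rebalance timeout as an upper bound,
as discussed further down the thread. The class and method names are
hypothetical; this is not the actual group coordinator code.

```java
// Hypothetical sketch: a delay that restarts on every join, capped by the
// rebalance timeout measured from the first join. Times are in milliseconds
// and passed in explicitly so the logic stays deterministic.
final class InitialRebalanceDelay {
    private final long delayMs;            // assumed broker-side config value
    private final long rebalanceTimeoutMs; // upper bound on the total wait
    private long firstJoinMs = -1;         // -1 means no member has joined yet
    private long deadlineMs = -1;

    InitialRebalanceDelay(long delayMs, long rebalanceTimeoutMs) {
        this.delayMs = delayMs;
        this.rebalanceTimeoutMs = rebalanceTimeoutMs;
    }

    // Called each time a consumer joins: restart the timer, but never wait
    // past firstJoin + rebalanceTimeout.
    void onJoin(long nowMs) {
        if (firstJoinMs < 0) {
            firstJoinMs = nowMs;
        }
        deadlineMs = Math.min(nowMs + delayMs, firstJoinMs + rebalanceTimeoutMs);
    }

    // True once the delay has elapsed without being retriggered.
    boolean shouldRebalance(long nowMs) {
        return firstJoinMs >= 0 && nowMs >= deadlineMs;
    }
}
```

With a 3 second delay and a 10 second rebalance timeout, a join at t=0
followed by a join at t=2000 moves the rebalance from t=3000 to t=5000.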


-Matthias

On 3/28/17 1:53 AM, Damian Guy wrote:
> All,
> I'd like to get this back to the original discussion about Delaying initial
> consumer group rebalance.
> I think I'm leaning towards sticking with the broker config and changing
> the delay so that the timer starts again when a new consumer joins the
> group. What are people's thoughts on that?
> 
> Doing something similar on leave is valid, but i'd prefer to consider it
> separately from this.
> 
> Thanks,
> Damian
> 
> On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com> wrote:
> 
>> Matthias,
>>
>> Yes i know.
>>
>> Thanks,
>> Damian
>>
>> On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>> Damian,
>>
>> about "rebalance immediately" on timeout -- I guess, that's a different
>> case as no LeaveGroupRequest will be sent. Thus, the broker should be
>> able to distinguish both cases easily, and apply the delay only if it
>> received the LeaveGroupRequest but not if a consumer times out.
>>
>> Does this make sense?
>>
>> -Matthias
>>
>> On 3/27/17 1:56 AM, Damian Guy wrote:
>>> @Becket
>>>
>>> Thanks for the feedback. Yes, I like the idea of extending the delay as
>>> each new consumer joins the group. Though, I think this could be done
>>> with either a consumer or broker side config. But I get your point that
>>> some consumers in the group can be misconfigured.
>>>
>>> @Matthias & @Eno - yes we could probably do something similar if the
>>> member has sent the LeaveGroupRequest. I'm not sure it would be valid if
>>> the member crashed; in that case session.timeout would come into play and
>>> we'd probably want to rebalance immediately. I'd be interested in hearing
>>> thoughts from other core Kafka folks on this one.
>>>
>>> Thanks,
>>> Damian
>>>
>>>
>>>
>>> On Fri, 24 Mar 2017 at 23:01 Becket Qin <becket....@gmail.com> wrote:
>>>
>>>> Hi Matthias,
>>>>
>>>> Yes, that was what I was thinking. We will keep delaying it until either
>>>> the rebalance timeout is reached or no new consumer joins within that
>>>> small delay, which is configured on the broker side.
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax <matth...@confluent.io>
>>>> wrote:
>>>>
>>>>> @Becket:
>>>>>
>>>>> I am not sure if I understand this correctly. Instead of applying a
>>>>> fixed delay that starts when the first consumer of an (empty) group
>>>>> joins, you suggest re-triggering/resetting the delay each time a new
>>>>> consumer joins?
>>>>>
>>>>> This sounds like a good strategy to me, if the config is on the broker
>>>>> side.
>>>>>
>>>>> @Eno:
>>>>>
>>>>> I think that's a valid point and I like this idea!
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 3/24/17 1:23 PM, Eno Thereska wrote:
>>>>>> Thanks Damian,
>>>>>>
>>>>>> This KIP deals with the initial phase only. What about the cases when
>>>>>> several consumers leave a group? Won't there be several expensive
>>>>>> rebalances then as well? I'm wondering if it makes sense for the delay
>>>>>> to hold anytime the "set" of consumers in a group changes, be it
>>>>>> addition to the group or removal from the group.
>>>>>>
>>>>>> Thanks
>>>>>> Eno
>>>>>>
>>>>>>
>>>>>>> On 24 Mar 2017, at 20:04, Becket Qin <becket....@gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks for the KIP, Damian.
>>>>>>>
>>>>>>> My two cents on this. It seems there are two things worth thinking
>>>>>>> about here:
>>>>>>>
>>>>>>> 1. Better rebalance timing. We will try to rebalance only when all the
>>>>>>> consumers in a group have joined. The challenge is that someone has to
>>>>>>> define what ALL consumers means; it could be either a time or a number
>>>>>>> of consumers, etc.
>>>>>>>
>>>>>>> 2. Avoid frequent rebalance. For example, if there are 100 consumers in
>>>>>>> a group, today, in the worst case, we may end up with 100 rebalances
>>>>>>> even if all the consumers joined the group in a reasonably small amount
>>>>>>> of time. Frequent rebalance is also a bad thing for brokers.
>>>>>>>
>>>>>>> Having a client side configuration may solve problem 1 better because
>>>>>>> each consumer group can potentially configure their own timing. However,
>>>>>>> it does not really prevent frequent rebalance in general because some of
>>>>>>> the consumers can be misconfigured. (This may have something to do with
>>>>>>> KIP-124 as well. But if quota is applied on the JoinGroup/SyncGroup
>>>>>>> request it may cause some unwanted cascading effects.)
>>>>>>>
>>>>>>> Having a broker side configuration may result in less flexibility for
>>>>>>> each consumer group, but it can prevent frequent rebalance better. I
>>>>>>> think with some reasonable design, the rebalance timing issue can be
>>>>>>> resolved on the broker side as well. Matthias had a good point on
>>>>>>> extending the delay when a new consumer joins a group (we actually did
>>>>>>> something similar to batch ISR change propagation). For example, let's
>>>>>>> say on the broker side, we will always delay 2 seconds each time we see
>>>>>>> a new consumer joining a consumer group. This would probably work for
>>>>>>> most of the consumer groups and will also limit the rebalance frequency
>>>>>>> to protect the brokers.
>>>>>>>
>>>>>>> I am not sure about the streams use case here, but if something like 2
>>>>>>> seconds of delay is acceptable for streams, I would prefer adding the
>>>>>>> configuration to the broker so that we can address both problems.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangjie (Becket) Qin
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <damian....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the feedback.
>>>>>>>>
>>>>>>>> Ewen: I'm happy to make it a client side config. Other than the
>>>>>>>> protocol bump I think the effort is almost the same. Personally I see
>>>>>>>> no other issues, but based on discussions with others this is what we
>>>>>>>> came up with.
>>>>>>>>
>>>>>>>> True, it can probably be tested easily via an integration test.
>>>>>>>>
>>>>>>>> Matthias: Yes I agree, the delay could be extended as each new member
>>>>>>>> joins the group.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Damian
>>>>>>>>
>>>>>>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have the same initial response as Ismael re: broker vs consumer
>>>>>>>>> settings. The global setting seems questionable.
>>>>>>>>>
>>>>>>>>> Could we maybe summarize what the impact of making this a client config
>>>>>>>>> would be? Protocol bump is obvious, but is there any other significant
>>>>>>>>> issue? For the protocol bump in particular, I think this change is
>>>>>>>>> currently really critical for streams; it will be valuable elsewhere,
>>>>>>>>> but the immediate demand is streams, so a protocol bump while being
>>>>>>>>> backwards compatible wouldn't affect any other clients. Is this still
>>>>>>>>> actually compatible with different clients given that they would now
>>>>>>>>> expect different timeouts? (I think it's strictly compatible if you
>>>>>>>>> wait for responses, but if you enforce any client side timeouts, I'm
>>>>>>>>> not so sure.)
>>>>>>>>>
>>>>>>>>> re: test plan, I'm sure this will come as a surprise, but is the system
>>>>>>>>> test even necessary? Validating # of rebalances seems messy as other
>>>>>>>>> things can cause rebalances (though admittedly not in a "clean" case).
>>>>>>>>> But really it seems like an integration test could validate this by
>>>>>>>>> making sure only 1 rebalance occurred when 2 members joined with a
>>>>>>>>> sufficient time gap.
>>>>>>>>> -Ewen
>>>>>>>>>
>>>>>>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <matth...@confluent.io>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the KIP Damian!
>>>>>>>>>>
>>>>>>>>>> My two cents:
>>>>>>>>>>
>>>>>>>>>> - we should have an explicit parameter for this -- implicit settings
>>>>>>>>>>   are always tricky (the "importance" of this parameter would be LOW)
>>>>>>>>>>
>>>>>>>>>> - the config should be different for each consumer group:
>>>>>>>>>>   * assume you have a stateless app, you want to rebalance immediately
>>>>>>>>>>   * if you start up in a virtualized environment using some tools like
>>>>>>>>>>     Mesos you might need a different value than on bare metal (no VM to
>>>>>>>>>>     be started)
>>>>>>>>>>   * it also depends on how many consumer instances you expect -- it's
>>>>>>>>>>     harder to start up 100 instances in 3 seconds than 5
>>>>>>>>>> - the default value should be zero
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> One more thought: what about scaling scenarios? If a consumer group has
>>>>>>>>>> 10 instances and should be scaled up to 20, it would make sense to do
>>>>>>>>>> this with a single rebalance, too. Thus, I am wondering if it would
>>>>>>>>>> make sense to apply this delay each time a new consumer joins the
>>>>>>>>>> group, even if the group is not empty?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
>>>>>>>>>>> Thanks Guozhang - I think another problem with this is that it is
>>>>>>>>>>> overloading session.timeout.ms to mean multiple things. I'm not sure
>>>>>>>>>>> that is a good thing.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The downside of it, though, is that although it "hides" this from
>>>>>>>>>>>> most of the users needing to be aware of it, by default the session
>>>>>>>>>>>> timeout, i.e. the rebalance timeout, is 10 seconds, which could
>>>>>>>>>>>> arguably be too long.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <wangg...@gmail.com>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Just throwing another alternative idea here: we can consider using
>>>>>>>>>>>>> the rebalance timeout value which is already included in the join
>>>>>>>>>>>>> request protocol (and on the current Java client it is always
>>>>>>>>>>>>> written as the session timeout value), so that the first member
>>>>>>>>>>>>> joining will always force the coordinator to wait that long. By
>>>>>>>>>>>>> doing this we do not need to bump up the protocol either.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ismael,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mostly to avoid the protocol bump.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree that it may be difficult to choose the right delay for all
>>>>>>>>>>>>>> consumer groups, but we wanted to make this something that most
>>>>>>>>>>>>>> users don't really need to think about, i.e., a small enough
>>>>>>>>>>>>>> default delay that works in the majority of cases. However it would
>>>>>>>>>>>>>> be much more flexible as a consumer config, which I'm happy to
>>>>>>>>>>>>>> pursue if this change is worthy of a protocol bump.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to avoid multiple
>>>>>>>>>>>>>>> rebalances during start-up. One issue with having this as a broker
>>>>>>>>>>>>>>> config is that it may be difficult to choose the right delay for
>>>>>>>>>>>>>>> all consumer groups. Can you elaborate a little more on why the
>>>>>>>>>>>>>>> first alternative (add a consumer config) was rejected? We bump
>>>>>>>>>>>>>>> protocol versions regularly (when it makes sense), so it would be
>>>>>>>>>>>>>>> good to get a bit more detail.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Ismael
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <damian....@gmail.com>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've prepared a KIP to add a configurable delay to the initial
>>>>>>>>>>>>>>>> consumer group rebalance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> BTW, I apologize if this appears twice. Seems the first one may
>>>>>>>>>>>>>>>> have not made it.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>>
> 
