I don't understand why you want to split join/leave into two parts... But it's up to you, I guess.
+1 for broker config plus "retriggering" delay

-Matthias

On 3/28/17 1:53 AM, Damian Guy wrote:
> All,
> I'd like to get this back to the original discussion about Delaying initial
> consumer group rebalance.
> I think I'm leaning towards sticking with the broker config and changing
> the delay so that the timer starts again when a new consumer joins the
> group. What are people's thoughts on that?
>
> Doing something similar on leave is valid, but I'd prefer to consider it
> separately from this.
>
> Thanks,
> Damian
>
> On Tue, 28 Mar 2017 at 09:48 Damian Guy <damian....@gmail.com> wrote:
>
>> Matthias,
>>
>> Yes, I know.
>>
>> Thanks,
>> Damian
>>
>> On Mon, 27 Mar 2017 at 18:17 Matthias J. Sax <matth...@confluent.io>
>> wrote:
>>
>> Damian,
>>
>> about "rebalance immediately" on timeout -- I guess that's a different
>> case as no LeaveGroupRequest will be sent. Thus, the broker should be
>> able to distinguish both cases easily, and apply the delay only if it
>> received the LeaveGroupRequest but not if a consumer times out.
>>
>> Does this make sense?
>>
>> -Matthias
>>
>> On 3/27/17 1:56 AM, Damian Guy wrote:
>>> @Becket
>>>
>>> Thanks for the feedback. Yes, I like the idea of extending the delay as
>>> each new consumer joins the group. Though, I think this could be done
>>> with either a consumer or broker side config. But I get your point that
>>> some consumers in the group can be misconfigured.
>>>
>>> @Matthias & @Eno - yes we could probably do something similar if the
>>> member has sent the LeaveGroupRequest. I'm not sure it would be valid if
>>> the member crashed, hence session.timeout would come into play, we'd
>>> probably want to rebalance immediately. I'd be interested in hearing
>>> thoughts from other core kafka folks on this one.
>>>
>>> Thanks,
>>> Damian
>>>
>>> On Fri, 24 Mar 2017 at 23:01 Becket Qin <becket....@gmail.com> wrote:
>>>
>>>> Hi Matthias,
>>>>
>>>> Yes, that was what I was thinking.
>>>> We will keep delaying it until either reaching the rebalance timeout
>>>> or no new consumer joins in that small delay which is configured on
>>>> the broker side.
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>> On Fri, Mar 24, 2017 at 1:39 PM, Matthias J. Sax <matth...@confluent.io>
>>>> wrote:
>>>>
>>>>> @Becket:
>>>>>
>>>>> I am not sure if I understand this correctly. Instead of applying a
>>>>> fixed delay that starts when the first consumer of an (empty) group
>>>>> joins, you suggest to re-trigger/re-set the delay each time a new
>>>>> consumer joins?
>>>>>
>>>>> This sounds like a good strategy to me, if the config is on the broker
>>>>> side.
>>>>>
>>>>> @Eno:
>>>>>
>>>>> I think that's a valid point and I like this idea!
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 3/24/17 1:23 PM, Eno Thereska wrote:
>>>>>> Thanks Damian,
>>>>>>
>>>>>> This KIP deals with the initial phase only. What about the cases when
>>>>>> several consumers leave a group? Won't there be several expensive
>>>>>> rebalances then as well? I'm wondering if it makes sense for the delay
>>>>>> to hold anytime the "set" of consumers in a group changes, be it
>>>>>> addition to the group or removal from the group.
>>>>>>
>>>>>> Thanks
>>>>>> Eno
>>>>>>
>>>>>>
>>>>>>> On 24 Mar 2017, at 20:04, Becket Qin <becket....@gmail.com> wrote:
>>>>>>>
>>>>>>> Thanks for the KIP, Damian.
>>>>>>>
>>>>>>> My two cents on this. It seems there are two things worth thinking
>>>>>>> about here:
>>>>>>>
>>>>>>> 1. Better rebalance timing. We will try to rebalance only when all
>>>>>>> the consumers in a group have joined. The challenge would be that
>>>>>>> someone has to define what ALL consumers means; it could either be
>>>>>>> a time or a number of consumers, etc.
>>>>>>>
>>>>>>> 2. Avoid frequent rebalance.
>>>>>>> For example, if there are 100 consumers in a group, today, in the
>>>>>>> worst case, we may end up with 100 rebalances even if all the
>>>>>>> consumers joined the group in a reasonably small amount of time.
>>>>>>> Frequent rebalance is also a bad thing for brokers.
>>>>>>>
>>>>>>> Having a client side configuration may solve problem 1 better because
>>>>>>> each consumer group can potentially configure their own timing.
>>>>>>> However, it does not really prevent frequent rebalance in general
>>>>>>> because some of the consumers can be misconfigured. (This may have
>>>>>>> something to do with KIP-124 as well. But if quota is applied on the
>>>>>>> JoinGroup/SyncGroup request it may cause some unwanted cascading
>>>>>>> effects.)
>>>>>>>
>>>>>>> Having a broker side configuration may result in less flexibility for
>>>>>>> each consumer group, but it can prevent frequent rebalance better. I
>>>>>>> think with some reasonable design, the rebalance timing issue can be
>>>>>>> resolved on the broker side as well. Matthias had a good point on
>>>>>>> extending the delay when a new consumer joins a group (we actually
>>>>>>> did something similar to batch ISR change propagation). For example,
>>>>>>> let's say on the broker side, we will always delay 2 seconds each
>>>>>>> time we see a new consumer joining a consumer group. This would
>>>>>>> probably work for most of the consumer groups and will also limit the
>>>>>>> rebalance frequency to protect the brokers.
>>>>>>>
>>>>>>> I am not sure about the streams use case here, but if something like
>>>>>>> 2 seconds of delay is acceptable for streams, I would prefer adding
>>>>>>> the configuration to the broker so that we can address both problems.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangjie (Becket) Qin
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <damian....@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Thanks for the feedback.
>>>>>>>>
>>>>>>>> Ewen: I'm happy to make it a client side config. Other than the
>>>>>>>> protocol bump I think the effort is almost the same. Personally I see
>>>>>>>> no other issues, but based on discussions with others this is what we
>>>>>>>> came up with.
>>>>>>>>
>>>>>>>> True, it can probably be tested easily via an integration test.
>>>>>>>>
>>>>>>>> Matthias: Yes I agree, the delay could be extended as each new member
>>>>>>>> joins the group.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Damian
>>>>>>>>
>>>>>>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have the same initial response as Ismael re: broker vs consumer
>>>>>>>>> settings. The global setting seems questionable.
>>>>>>>>>
>>>>>>>>> Could we maybe summarize what the impact of making this a client
>>>>>>>>> config would be? Protocol bump is obvious, but is there any other
>>>>>>>>> significant issue? For the protocol bump in particular, I think this
>>>>>>>>> change is currently really critical for streams; it will be valuable
>>>>>>>>> elsewhere, but the immediate demand is streams, so a protocol bump
>>>>>>>>> while being backwards compatible wouldn't affect any other clients.
>>>>>>>>> Is this still actually compatible with different clients given that
>>>>>>>>> they would now expect different timeouts? (I think it's strictly
>>>>>>>>> compatible if you wait for responses, but if you enforce any client
>>>>>>>>> side timeouts, I'm not so sure.)
>>>>>>>>>
>>>>>>>>> re: test plan, I'm sure this will come as a surprise, but is the
>>>>>>>>> system test even necessary?
>>>>>>>>> Validating # of rebalances seems messy as other things can cause
>>>>>>>>> rebalances (though admittedly not in a "clean" case). But really it
>>>>>>>>> seems like an integration test could validate this by making sure
>>>>>>>>> only 1 rebalance occurred when 2 members joined with a sufficient
>>>>>>>>> time gap.
>>>>>>>>>
>>>>>>>>> -Ewen
>>>>>>>>>
>>>>>>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax
>>>>>>>>> <matth...@confluent.io> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the KIP Damian!
>>>>>>>>>>
>>>>>>>>>> My two cents:
>>>>>>>>>>
>>>>>>>>>> - we should have an explicit parameter for this -- implicit settings
>>>>>>>>>>   are always tricky (the "importance" of this parameter would be LOW)
>>>>>>>>>>
>>>>>>>>>> - the config should be different for each consumer group:
>>>>>>>>>>   * assume you have a stateless app, you want to rebalance
>>>>>>>>>>     immediately
>>>>>>>>>>   * if you start up in a virtualized environment using some tools
>>>>>>>>>>     like Mesos you might need a different value than on bare metal
>>>>>>>>>>     (no VM to be started)
>>>>>>>>>>   * it also depends on how many consumer instances you expect --
>>>>>>>>>>     it's harder to start up 100 instances in 3 seconds than 5
>>>>>>>>>>
>>>>>>>>>> - the default value should be zero
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> One more thought: what about scaling scenarios? If a consumer group
>>>>>>>>>> has 10 instances and should be scaled up to 20, it would make sense
>>>>>>>>>> to do this with a single rebalance, too. Thus, I am wondering if it
>>>>>>>>>> would make sense to apply this delay each time a new consumer joins
>>>>>>>>>> a group, even if the group is not empty?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Matthias
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
>>>>>>>>>>> Thanks Guozhang - I think another problem with this is that it is
>>>>>>>>>>> overloading session.timeout.ms to mean multiple things.
>>>>>>>>>>> I'm not sure that is a good thing.
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> The downside of it, though, is that although it "hides" this from
>>>>>>>>>>>> most of the users needing to be aware of it, by default session
>>>>>>>>>>>> timeout i.e. the rebalance timeout is 10 seconds which could
>>>>>>>>>>>> arguably be too long.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang
>>>>>>>>>>>> <wangg...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Just throwing another alternative idea here: we can consider using
>>>>>>>>>>>>> the rebalance timeout value which is already included in the join
>>>>>>>>>>>>> request protocol (and on the current Java client it is always
>>>>>>>>>>>>> written as the session timeout value), so that the first member
>>>>>>>>>>>>> joining will always force the coordinator to wait that long. By
>>>>>>>>>>>>> doing this we do not need to bump up the protocol either.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Ismael,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Mostly to avoid the protocol bump.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I agree that it may be difficult to choose the right delay for all
>>>>>>>>>>>>>> consumer groups, but we wanted to make this something that most
>>>>>>>>>>>>>> users don't really need to think about, i.e., a small enough
>>>>>>>>>>>>>> default delay that works in the majority of cases.
>>>>>>>>>>>>>> However it would be much more flexible as a consumer config, which
>>>>>>>>>>>>>> I'm happy to pursue if this change is worthy of a protocol bump.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to avoid multiple
>>>>>>>>>>>>>>> rebalances during start-up. One issue with having this as a broker
>>>>>>>>>>>>>>> config is that it may be difficult to choose the right delay for
>>>>>>>>>>>>>>> all consumer groups. Can you elaborate a little more on why the
>>>>>>>>>>>>>>> first alternative (add a consumer config) was rejected? We bump
>>>>>>>>>>>>>>> protocol versions regularly (when it makes sense), so it would be
>>>>>>>>>>>>>>> good to get a bit more detail.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Ismael
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy
>>>>>>>>>>>>>>> <damian....@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi All,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've prepared a KIP to add a configurable delay to the initial
>>>>>>>>>>>>>>>> consumer group rebalance.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Please have a look here:
>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> BTW, I apologize if this appears twice. Seems the first one may
>>>>>>>>>>>>>>>> not have made it.
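For reference, a rough sketch of the "retriggering" delay as it is described in the thread above: the timer restarts each time a new consumer joins, and the rebalance fires once no one has joined for the configured delay or the rebalance timeout is reached, whichever comes first. This is only an illustration in Python; the names DelayedRebalance, delay_ms, rebalance_timeout_ms and rebalance_time are made up here and are not Kafka's broker code or config names, and times are plain millisecond integers rather than real clocks.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class DelayedRebalance:
    """Hypothetical coordinator-side timer for a single consumer group."""

    delay_ms: int               # assumed broker-side quiet period after each join
    rebalance_timeout_ms: int   # assumed upper bound, measured from the first join

    first_join_ms: Optional[int] = None
    deadline_ms: Optional[int] = None

    def on_join(self, now_ms: int) -> None:
        """Record a join at now_ms and restart the quiet-period timer."""
        if self.first_join_ms is None:
            self.first_join_ms = now_ms
        hard_limit = self.first_join_ms + self.rebalance_timeout_ms
        # Re-trigger the delay, but never push it past the hard limit.
        self.deadline_ms = min(now_ms + self.delay_ms, hard_limit)

    def should_rebalance(self, now_ms: int) -> bool:
        """True once the (possibly extended) deadline has passed."""
        return self.deadline_ms is not None and now_ms >= self.deadline_ms


def rebalance_time(join_times_ms: Iterable[int], delay_ms: int,
                   rebalance_timeout_ms: int) -> Optional[int]:
    """Return when the first rebalance would fire for the given join times."""
    timer = DelayedRebalance(delay_ms, rebalance_timeout_ms)
    for t in sorted(join_times_ms):
        if timer.should_rebalance(t):
            break  # rebalance already fired; later joins belong to a new round
        timer.on_join(t)
    return timer.deadline_ms
```

With a 2 second delay and a 10 second rebalance timeout, joins at 0, 1000 and 2500 ms give a single rebalance at 4500 ms, while a steady stream of joins every second is cut off at 10000 ms. The leave case is left out on purpose, since it is being considered separately.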