Becket:

I think the question is whether, when a single member joins an unknown group for the very first time, we want to complete the rebalance immediately or not. Whether we "start" the rebalance does not matter: even today, if the group coordinator is in the SyncGroup phase waiting for the consumers to send their SyncGroup requests and it then receives a JoinGroup request, it will still cancel the current rebalance and fall back to the beginning of PrepareRebalance.

So with the configurable delta: if it does prevent the started rebalance from completing, then the console consumer will still be affected; if it does not, then we may still get consecutive rebalances, since the first rebalance will usually complete very quickly.

I think having the configuration on the client side instead of the broker side does not mean that users now need to worry about the config: with a default value of, say, 0, as long as they do not observe any consecutive rebalance issues they may never need to be aware of such configs at all. And for some higher-level clients like Streams, we may decide to change the default to be larger than 0, as it may be more common to hit the issue there.

Greg:

Regarding notifying users about too-frequent rebalances, I think it would be a better mechanism for users to monitor a metric (say, rebalance rate) than to watch the config? Under normal operation this rebalance rate should be 0 with only a rare spike from time to time; if the metric shows continuous non-zero values, users can be notified. And we can educate them about configuring their apps with the recommended values in the web docs correspondingly?

Guozhang

On Thu, Jul 13, 2017 at 7:37 AM, Becket Qin <becket....@gmail.com> wrote:

> I am a little hesitant to add the configuration to the client. It would be
> more flexible but this seems not the thing that users should worry about (I
> imagine many people would simply set backoff to 0 just for fast rebalance).
> I am wondering if the following variant of the current solution will
> address the problem.
>
> 1. broker will start to rebalance immediately when the first member joins
> the group at T0.
>
> 2. If another member joins the group at T1 which is between T0 and T0 +
> delta (configurable), the broker will wait until T1 + delta then do the
> rebalance.
> Any additional member joining before the rebalance kicks off
> would result in the delay of the rebalance with the same extension logic as
> we have now. We can also try some exponential back off if needed.
>
> This should help address the console consumer problem. Not sure if there
> are other cases that need to be considered, though.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Mon, Jul 10, 2017 at 5:28 PM, Greg Fodor <gfo...@gmail.com> wrote:
>
> > Found this thread after posting an alternative idea after we started
> > hitting this issue ourselves for a job that has a lot of state stores and
> > topic partitions. My suggestion was to have consumer groups have a
> > configurable minimum member count before consumption begins, but that has
> > its own trade offs and benefits (maybe a different KIP.)
> >
> > One suggestion I had is maybe there is some relatively fool-proof heuristic
> > that can cause Kafka Streams to emit an INFO/WARN to the log to inform the
> > user of the configuration if it detects a rapid rebalance on startup due to
> > new nodes joining? For example, if streams detects a rebalance, before
> > processors are initialized, that only adds new nodes, if the configuration
> > has not been overridden, write to the log?
> >
> > On Thu, Jun 8, 2017 at 2:56 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> >
> > > Just recapping on client-side vs. broker-side config: we did discuss
> > > adding this as a client-side config and bumping up the join-group request (I
> > > think both Ismael and Ewen questioned it) to include this configured value
> > > to the broker. I cannot remember if there is any strong motivation against
> > > going to the client-side config, except that we felt a default non-zero
> > > value will benefit most users assuming they start with more than one member
> > > in their group but only advanced users would really realize this config
> > > existing and tune it themselves.
> > >
> > > I agree that we could re-consider it for the next release if we observe
> > > that it is actually affecting more users than benefiting them.
> > >
> > > Guozhang
> > >
> > > On Wed, Jun 7, 2017 at 2:26 AM, Damian Guy <damian....@gmail.com> wrote:
> > >
> > > > Hi Jun/Ismael,
> > > >
> > > > Sounds good to me.
> > > >
> > > > Thanks,
> > > > Damian
> > > >
> > > > On Tue, 6 Jun 2017 at 23:08 Ismael Juma <ism...@juma.me.uk> wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > The console consumer issue also came up in a conversation I was having
> > > > > recently. Seems like the config/server.properties change is a reasonable
> > > > > compromise given that we have other defaults that are for development.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Tue, Jun 6, 2017 at 10:59 PM, Jun Rao <j...@confluent.io> wrote:
> > > > >
> > > > > > Hi, Everyone,
> > > > > >
> > > > > > Sorry for being late on this thread. I just came across this thread. I
> > > > > > have a couple of concerns on this. (1) It seems the amount of delay will
> > > > > > be application specific. So, it seems that it's better for the delay to
> > > > > > be a client side config instead of a server side one? (2) When running
> > > > > > console consumer in quickstart, a minimum of 3 sec delay seems to be a
> > > > > > bad experience for our users.
> > > > > >
> > > > > > Since we are getting late into the release cycle, it may be a bit too
> > > > > > late to make big changes in the 0.11 release. Perhaps we should at least
> > > > > > consider overriding the delay in config/server.properties to 0 to
> > > > > > improve the quickstart experience?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Onur,
> > > > > > >
> > > > > > > It was in my previous email. But here it is again.
> > > > > > >
> > > > > > > ============================================================
> > > > > > >
> > > > > > > 1. Better rebalance timing. We will try to rebalance only when all the
> > > > > > > consumers in a group have joined. The challenge would be someone has to
> > > > > > > define what does ALL consumers mean, it could either be a time or number
> > > > > > > of consumers, etc.
> > > > > > >
> > > > > > > 2. Avoid frequent rebalance. For example, if there are 100 consumers in
> > > > > > > a group, today, in the worst case, we may end up with 100 rebalances
> > > > > > > even if all the consumers joined the group in a reasonably small amount
> > > > > > > of time. Frequent rebalance is also a bad thing for brokers.
> > > > > > >
> > > > > > > Having a client side configuration may solve problem 1 better because
> > > > > > > each consumer group can potentially configure their own timing. However,
> > > > > > > it does not really prevent frequent rebalance in general because some of
> > > > > > > the consumers can be misconfigured. (This may have something to do with
> > > > > > > KIP-124 as well. But if quota is applied on the JoinGroup/SyncGroup
> > > > > > > request it may cause some unwanted cascading effects.)
> > > > > > >
> > > > > > > Having a broker side configuration may result in less flexibility for
> > > > > > > each consumer group, but it can prevent frequent rebalance better.
> > > > > > > I think with some reasonable design, the rebalance timing issue can be
> > > > > > > resolved on the broker side as well. Matthias had a good point on
> > > > > > > extending the delay when a new consumer joins a group (we actually did
> > > > > > > something similar to batch ISR change propagation). For example, let's
> > > > > > > say on the broker side, we will always delay 2 seconds each time we see
> > > > > > > a new consumer joining a consumer group. This would probably work for
> > > > > > > most of the consumer groups and will also limit the rebalance frequency
> > > > > > > to protect the brokers.
> > > > > > >
> > > > > > > I am not sure about the streams use case here, but if something like 2
> > > > > > > seconds of delay is acceptable for streams, I would prefer adding the
> > > > > > > configuration to the broker so that we can address both problems.
> > > > > > >
> > > > > > > On Thu, 6 Apr 2017 at 17:11 Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi Damian.
> > > > > > > >
> > > > > > > > Can you copy the point Becket made earlier that you say isn't addressed?
> > > > > > > >
> > > > > > > > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Thanks all, the Vote is now closed and the KIP has been accepted with 9 +1s
> > > > > > > > >
> > > > > > > > > 3 binding:
> > > > > > > > > Guozhang,
> > > > > > > > > Jason,
> > > > > > > > > Ismael
> > > > > > > > >
> > > > > > > > > 6 non-binding:
> > > > > > > > > Bill,
> > > > > > > > > Eno,
> > > > > > > > > Mathieu,
> > > > > > > > > Matthias,
> > > > > > > > > Dong,
> > > > > > > > > Mickael
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Damian
> > > > > > > > >
> > > > > > > > > On Thu, 6 Apr 2017 at 09:26 Ismael Juma <ism...@juma.me.uk> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for the KIP, +1 (binding).
> > > > > > > > > >
> > > > > > > > > > Ismael
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 30, 2017 at 8:55 PM, Jason Gustafson <ja...@confluent.io> wrote:
> > > > > > > > > >
> > > > > > > > > > > +1 Thanks for the KIP!
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 30, 2017 at 12:51 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > +1
> > > > > > > > > > > >
> > > > > > > > > > > > Sorry about the previous email, Gmail seems to be collapsing them
> > > > > > > > > > > > into a single thread on my inbox.
> > > > > > > > > > > >
> > > > > > > > > > > > Guozhang
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 30, 2017 at 11:34 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > Damian, could you create a new thread for the voting process?
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > >
> > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 30, 2017 at 10:33 AM, Bill Bejeck <bbej...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1(non-binding)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 30, 2017 at 1:30 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 (non binding)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > > Eno
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 30 Mar 2017, at 18:01, Matthias J. Sax <matth...@confluent.io> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On 3/30/17 3:46 AM, Damian Guy wrote:
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > I'd like to start the voting thread on KIP-134:
> > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > > Damian
> > > > > > > > > > > > >
> > > > > > > > > > > > > --
> > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > >
> > > > > > > > > > > > --
> > > > > > > > > > > > -- Guozhang
> > >
> > > --
> > > -- Guozhang

--
-- Guozhang
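
The variant Becket proposes earlier in this thread (complete the first rebalance immediately at T0; if another member joins at T1 within T0 + delta, wait until T1 + delta; extend again on each further join before the rebalance kicks off) can be sketched roughly as below. This is only an illustration under stated assumptions, not coordinator code: `RebalanceTimer`, `on_join`, and the millisecond timestamps are hypothetical names introduced here, the extension is left uncapped for simplicity, and the handling of a join that arrives after every window has closed (rebalance right away) is an assumption that the thread does not spell out.

```python
from typing import Optional


class RebalanceTimer:
    """Toy model of the proposed timing; hypothetical, not Kafka's coordinator code."""

    def __init__(self, delta_ms: int) -> None:
        self.delta_ms = delta_ms                        # the configurable delta
        self.first_join_ms: Optional[int] = None        # T0
        self.pending_deadline_ms: Optional[int] = None  # completion time of a delayed rebalance

    def on_join(self, now_ms: int) -> int:
        """Return the time at which the rebalance triggered by this join completes."""
        if self.first_join_ms is None:
            # Step 1: first member ever, rebalance immediately at T0.
            self.first_join_ms = now_ms
            return now_ms

        in_first_window = now_ms <= self.first_join_ms + self.delta_ms
        rebalance_pending = (
            self.pending_deadline_ms is not None
            and now_ms < self.pending_deadline_ms
        )
        if in_first_window or rebalance_pending:
            # Step 2 and the extension rule: wait until this join's time + delta.
            self.pending_deadline_ms = now_ms + self.delta_ms
            return self.pending_deadline_ms

        # Assumption: a join after all windows have closed rebalances right away.
        return now_ms


timer = RebalanceTimer(delta_ms=3000)
print(timer.on_join(0))      # 0: a lone first member is not delayed
print(timer.on_join(1000))   # 4000: joined within T0 + delta, wait until T1 + delta
print(timer.on_join(3500))   # 6500: joined before the pending rebalance, extend again
print(timer.on_join(10000))  # 10000: no window open, immediate (assumed behaviour)
```

With a single console consumer only the first branch is ever taken, which is why this variant would avoid the quickstart delay Jun raised, while a burst of members joining close together still collapses into one delayed rebalance.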