I am a little hesitant to add the configuration to the client. It would be more flexible, but this does not seem like something users should have to worry about (I imagine many people would simply set the backoff to 0 just to get a fast rebalance). I am wondering whether the following variant of the current solution would address the problem.

1. The broker will start to rebalance immediately when the first member joins the group at T0.
2. If another member joins the group at T1, which is between T0 and T0 + delta (configurable), the broker will wait until T1 + delta and then do the rebalance. Any additional member joining before the rebalance kicks off would delay the rebalance further, with the same extension logic as we have now. We can also try some exponential backoff if needed.

This should help address the console consumer problem. Not sure if there are other cases that need to be considered, though.

Thanks,

Jiangjie (Becket) Qin

On Mon, Jul 10, 2017 at 5:28 PM, Greg Fodor <gfo...@gmail.com> wrote:

> Found this thread after posting an alternative idea after we started hitting this issue ourselves for a job that has a lot of state stores and topic partitions. My suggestion was to have consumer groups have a configurable minimum member count before consumption begins, but that has its own trade-offs and benefits (maybe a different KIP).
>
> One suggestion I had is maybe there is some relatively fool-proof heuristic that can cause Kafka Streams to emit an INFO/WARN to the log to inform the user of the configuration if it detects a rapid rebalance on startup due to new nodes joining? For example, if streams detects a rebalance, before processors are initialized, that only adds new nodes, if the configuration has not been overridden, write to the log?
>
> On Thu, Jun 8, 2017 at 2:56 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Just recapping on client-side vs. broker-side config: we did discuss adding this as a client-side config and bumping up the join-group request (I think both Ismael and Ewen asked about it) to include this configured value to the broker.
> > I cannot remember if there were any strong motivations against going with the client-side config, except that we felt a default non-zero value would benefit most users, assuming they start with more than one member in their group, while only advanced users would really realize this config exists and tune it themselves.
> >
> > I agree that we could re-consider it for the next release if we observe that it is actually affecting more users than benefiting them.
> >
> > Guozhang
> >
> > On Wed, Jun 7, 2017 at 2:26 AM, Damian Guy <damian....@gmail.com> wrote:
> >
> > > Hi Jun/Ismael,
> > >
> > > Sounds good to me.
> > >
> > > Thanks,
> > > Damian
> > >
> > > On Tue, 6 Jun 2017 at 23:08 Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > The console consumer issue also came up in a conversation I was having recently. Seems like the config/server.properties change is a reasonable compromise given that we have other defaults that are for development.
> > > >
> > > > Ismael
> > > >
> > > > On Tue, Jun 6, 2017 at 10:59 PM, Jun Rao <j...@confluent.io> wrote:
> > > >
> > > > > Hi, Everyone,
> > > > >
> > > > > Sorry for being late on this thread. I just came across this thread. I have a couple of concerns on this. (1) It seems the amount of delay will be application specific. So, it seems that it's better for the delay to be a client side config instead of a server side one? (2) When running console consumer in quickstart, a minimum of 3 sec delay seems to be a bad experience for our users.
> > > > >
> > > > > Since we are getting late into the release cycle, it may be a bit too late to make big changes in the 0.11 release. Perhaps we should at least consider overriding the delay in config/server.properties to 0 to improve the quickstart experience?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > >
> > > > > > Hi Onur,
> > > > > >
> > > > > > It was in my previous email. But here it is again.
> > > > > >
> > > > > > ============================================================
> > > > > >
> > > > > > 1. Better rebalance timing. We will try to rebalance only when all the consumers in a group have joined. The challenge would be someone has to define what does ALL consumers mean, it could either be a time or number of consumers, etc.
> > > > > >
> > > > > > 2. Avoid frequent rebalance. For example, if there are 100 consumers in a group, today, in the worst case, we may end up with 100 rebalances even if all the consumers joined the group in a reasonably small amount of time. Frequent rebalance is also a bad thing for brokers.
> > > > > >
> > > > > > Having a client side configuration may solve problem 1 better because each consumer group can potentially configure their own timing. However, it does not really prevent frequent rebalance in general because some of the consumers can be misconfigured. (This may have something to do with KIP-124 as well. But if quota is applied on the JoinGroup/SyncGroup request it may cause some unwanted cascading effects.)
> > > > > >
> > > > > > Having a broker side configuration may result in less flexibility for each consumer group, but it can prevent frequent rebalance better. I think with some reasonable design, the rebalance timing issue can be resolved on the broker side as well.
> > > > > > Matthias had a good point on extending the delay when a new consumer joins a group (we actually did something similar to batch ISR change propagation). For example, let's say on the broker side, we will always delay 2 seconds each time we see a new consumer joining a consumer group. This would probably work for most of the consumer groups and will also limit the rebalance frequency to protect the brokers.
> > > > > >
> > > > > > I am not sure about the streams use case here, but if something like 2 seconds of delay is acceptable for streams, I would prefer adding the configuration to the broker so that we can address both problems.
> > > > > >
> > > > > > On Thu, 6 Apr 2017 at 17:11 Onur Karaman <onurkaraman.apa...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Damian.
> > > > > > >
> > > > > > > Can you copy the point Becket made earlier that you say isn't addressed?
> > > > > > >
> > > > > > > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks all, the Vote is now closed and the KIP has been accepted with 9 +1s
> > > > > > > >
> > > > > > > > 3 binding:
> > > > > > > > Guozhang,
> > > > > > > > Jason,
> > > > > > > > Ismael
> > > > > > > >
> > > > > > > > 6 non-binding:
> > > > > > > > Bill,
> > > > > > > > Eno,
> > > > > > > > Mathieu,
> > > > > > > > Matthias,
> > > > > > > > Dong,
> > > > > > > > Mickael
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Damian
> > > > > > > >
> > > > > > > > On Thu, 6 Apr 2017 at 09:26 Ismael Juma <ism...@juma.me.uk> wrote:
> > > > > > > >
> > > > > > > > > Thanks for the KIP, +1 (binding).
> > > > > > > > >
> > > > > > > > > Ismael
> > > > > > > > >
> > > > > > > > > On Thu, Mar 30, 2017 at 8:55 PM, Jason Gustafson <ja...@confluent.io> wrote:
> > > > > > > > >
> > > > > > > > > > +1 Thanks for the KIP!
> > > > > > > > > >
> > > > > > > > > > On Thu, Mar 30, 2017 at 12:51 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > +1
> > > > > > > > > > >
> > > > > > > > > > > Sorry about the previous email, Gmail seems to be collapsing them into a single thread in my inbox.
> > > > > > > > > > >
> > > > > > > > > > > Guozhang
> > > > > > > > > > >
> > > > > > > > > > > On Thu, Mar 30, 2017 at 11:34 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Damian, could you create a new thread for the voting process?
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks!
> > > > > > > > > > > >
> > > > > > > > > > > > Guozhang
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 30, 2017 at 10:33 AM, Bill Bejeck <bbej...@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 30, 2017 at 1:30 PM, Eno Thereska <eno.there...@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > > Eno
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On 30 Mar 2017, at 18:01, Matthias J. Sax <matth...@confluent.io> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On 3/30/17 3:46 AM, Damian Guy wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hi All,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I'd like to start the voting thread on KIP-134:
> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > > > Damian
> > > > > > > > > > > >
> > > > > > > > > > > > -- Guozhang
> > > > > > > > > > >
> > > > > > > > > > > -- Guozhang
> >
> > -- Guozhang
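
For reference, the variant proposed at the top of this message (rebalance immediately for the first member, then extend the delay while more members keep arriving) can be sketched roughly as below. This is only an illustration under stated assumptions, not broker code: the class `RebalanceDelaySketch`, the method `rebalanceTimes`, and the `maxDelay` cap are hypothetical names, and applying the "immediate" rule to any join that arrives after a quiet period (not just the very first one) is my own generalization of step 1.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the proposed broker-side timing; not Kafka code.
public class RebalanceDelaySketch {

    /**
     * Given sorted member join times (ms), returns the times at which a
     * rebalance would fire.
     *
     * delta:    the configurable delay window from the proposal.
     * maxDelay: assumed cap on how long one delayed rebalance can be extended.
     */
    static List<Long> rebalanceTimes(long[] joinTimes, long delta, long maxDelay) {
        List<Long> fired = new ArrayList<>();
        int i = 0;
        while (i < joinTimes.length) {
            long t = joinTimes[i];
            // Step 1: a join with no rebalance in the preceding delta window
            // triggers an immediate rebalance.
            boolean quiet = fired.isEmpty() || t > fired.get(fired.size() - 1) + delta;
            if (quiet) {
                fired.add(t);
                i++;
                continue;
            }
            // Step 2: a join inside the window schedules a rebalance at t + delta.
            long deadline = t + maxDelay;
            long scheduled = Math.min(t + delta, deadline);
            i++;
            // Each further join before the scheduled time pushes it out again,
            // but never past the assumed cap.
            while (i < joinTimes.length && joinTimes[i] < scheduled) {
                scheduled = Math.min(joinTimes[i] + delta, deadline);
                i++;
            }
            fired.add(scheduled);
        }
        return fired;
    }

    public static void main(String[] args) {
        // Four members joining within a few seconds: one immediate rebalance,
        // then a single delayed one instead of three more.
        System.out.println(rebalanceTimes(new long[]{0, 1000, 2000, 3500}, 3000, 10000));
        // prints [0, 6500]
    }
}
```

With a single member (the console consumer case) the sketch rebalances at T0 with no delay, while a burst of joins collapses into two rebalances.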