Becket,

Thanks for the replies. Now I see that you want to optimize with the
heuristic that "if I see a JoinGroup shortly after a rebalance has
completed, then more JoinGroups are likely coming". I agree that it will
help with a single-instance console consumer used for debugging etc., but
it may still be an issue in some unit test cases where we do have more than
one instance in the group. Plus, this logic seems to be more complicated.

On the other hand, even without KIP-134 we already have the issue today
that large consumer groups may trigger consecutive rebalances, which take a
long time to settle, so if users mistakenly set the config to 0 it will not
be worse than what we already have today. So to me, having this config on
the client side would not introduce any regression. In addition, we can
extend this mechanism to apply not only to generation 0 (i.e. the first
time the group is formed) but to any rebalance, since all of them can be
vulnerable to consecutive rebalances when there is a topic / member change
(e.g. for MM, a rebalance can take a long time to stabilize even after it
has been running for a while).
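
To make the trade-off concrete, here is a minimal Python sketch of the
delay-extension idea discussed in this thread. It is a simplified model and
not the coordinator's code: the function name rebalance_times, its
parameters, and the 50 ms join spacing are hypothetical and chosen only for
illustration, and each rebalance is assumed to complete instantly once it
fires.

```python
from typing import List


def rebalance_times(join_times_ms: List[int],
                    delay_ms: int,
                    max_wait_ms: int,
                    immediate_first: bool = False) -> List[int]:
    """Return the times at which rebalances complete (simplified model).

    Assumptions (not taken from the broker code):
    - a JoinGroup arriving while a delay window is open pushes the
      deadline to join_time + delay_ms, capped at window_start + max_wait_ms;
    - a rebalance completes instantly when its deadline is reached;
    - immediate_first=True models the variant where the very first
      member's rebalance completes at once and the delay only applies
      to later joins.
    """
    joins = sorted(join_times_ms)
    completed: List[int] = []
    i = 0
    if immediate_first and joins:
        completed.append(joins[0])
        i = 1
    while i < len(joins):
        start = joins[i]
        cap = start + max_wait_ms
        deadline = min(start + delay_ms, cap)
        i += 1
        # Absorb every join that arrives before the window closes.
        while i < len(joins) and joins[i] <= deadline:
            deadline = min(joins[i] + delay_ms, cap)
            i += 1
        completed.append(deadline)
    return completed


if __name__ == "__main__":
    # 100 consumers starting 50 ms apart (hypothetical workload).
    joins = [n * 50 for n in range(100)]
    print(len(rebalance_times(joins, 0, 300_000)))       # delay of 0
    print(rebalance_times(joins, 3_000, 300_000))        # delayed first rebalance
    print(rebalance_times(joins, 3_000, 300_000, True))  # first member immediate
    print(rebalance_times([0], 3_000, 300_000, True))    # lone console consumer
```

Under these assumptions a delay of 0 gives one rebalance per member, as
happens today, while a non-zero delay collapses the burst into one or two
rebalances at the cost of waiting for the window to close.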


Guozhang


On Mon, Jul 17, 2017 at 9:37 PM, Becket Qin <becket....@gmail.com> wrote:

> Hi Guozhang,
>
> Sorry for the confusion. I actually meant always "complete" the rebalance
> immediately when the first consumer joins the group, i.e. the
> configurable delta only kicks in after the first rebalance.
>
> The concern I had was actually not frequent rebalances for the users,
> but the pressure on the broker side when frequent rebalances happen. For
> example, if a big consumer group with many consumers (e.g. ETL, MM,
> streams, etc.) misconfigures the initial rebalance delay to 0, it may
> cause hundreds or even thousands of rebalances to occur back to back,
> which will likely take quite a bit of bandwidth. I am a little worried
> about the performance impact in that case. Although a request quota might
> help to throttle the rebalances, that does not seem the ideal solution.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
> On Mon, Jul 17, 2017 at 2:02 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Becket:
> >
> > I think the problem is whether, when a single member joins an unknown
> > group for the first time ever, we want to complete the rebalance
> > immediately or not; it does not matter whether we want to "start" the
> > rebalance, since even now, if the group coordinator is in the SyncGroup
> > phase waiting for the consumers to send their SyncGroup requests and it
> > then receives a JoinGroup request, it will still cancel the current
> > rebalance and fall back to the beginning of PrepareRebalance.
> >
> > So with the configurable delta, if it does prevent the started
> > rebalance from completing, then the console consumer will still be
> > affected; if it does not prevent the started rebalance from completing,
> > then we may still get consecutive rebalances, since the first rebalance
> > will usually complete very quickly.
> >
> > I think the proposal to have the configuration on the client side
> > instead of on the broker side does not mean that users now need to worry
> > about the config: with a default value of, say, 0, as long as they do not
> > observe any consecutive rebalance issues they may never need to be aware
> > of such configs at all. And for some higher-level clients like Streams,
> > we may decide to change the default config to be larger than 0, as it
> > may be more common to hit the issue there.
> >
> >
> > Greg:
> >
> > Regarding notifying users of too-frequent rebalances, I think it would
> > be a better mechanism for users to monitor a certain metric (say,
> > rebalance rate) than to watch the config? Under normal operation this
> > rebalance rate should be 0 with only a rare spike from time to time; if
> > the metric shows continuous non-zero values then users can be notified.
> > And we can educate them about configuring their apps with the
> > recommended values in the web docs correspondingly?
> >
> >
> > Guozhang
> >
> > On Thu, Jul 13, 2017 at 7:37 AM, Becket Qin <becket....@gmail.com> wrote:
> >
> > > I am a little hesitant to add the configuration to the client. It would
> > > be more flexible, but this does not seem like something users should
> > > have to worry about (I imagine many people would simply set the backoff
> > > to 0 just for a fast rebalance). I am wondering if the following variant
> > > of the current solution would address the problem.
> > >
> > > 1. The broker will start to rebalance immediately when the first member
> > > joins the group at T0.
> > >
> > > 2. If another member joins the group at T1, which is between T0 and
> > > T0 + delta (configurable), the broker will wait until T1 + delta and then
> > > do the rebalance. Any additional member joining before the rebalance
> > > kicks off would delay the rebalance with the same extension logic as
> > > we have now. We can also try some exponential backoff if needed.
> > >
> > > This should help address the console consumer problem. Not sure if
> > > there are other cases that need to be considered, though.
> > >
> > > Thanks,
> > >
> > > Jiangjie (Becket) Qin
> > >
> > > On Mon, Jul 10, 2017 at 5:28 PM, Greg Fodor <gfo...@gmail.com> wrote:
> > >
> > > > Found this thread after posting an alternative idea after we started
> > > > hitting this issue ourselves for a job that has a lot of state stores
> > > > and topic partitions. My suggestion was to have consumer groups have a
> > > > configurable minimum member count before consumption begins, but that
> > > > has its own trade-offs and benefits (maybe a different KIP).
> > > >
> > > > One suggestion I had: maybe there is some relatively fool-proof
> > > > heuristic that can cause Kafka Streams to emit an INFO/WARN to the log
> > > > to inform the user of the configuration if it detects a rapid rebalance
> > > > on startup due to new nodes joining? For example, if Streams detects a
> > > > rebalance, before processors are initialized, that only adds new nodes,
> > > > and the configuration has not been overridden, write to the log?
> > > >
> > > >
> > > >
> > > > On Thu, Jun 8, 2017 at 2:56 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > >
> > > > > Just recapping on client-side vs. broker-side config: we did discuss
> > > > > adding this as a client-side config and bumping up the join-group
> > > > > request (I think both Ismael and Ewen asked about it) to send this
> > > > > configured value to the broker. I cannot remember any strong
> > > > > motivation against going with the client-side config, except that we
> > > > > felt a default non-zero value would benefit most users, assuming they
> > > > > start with more than one member in their group, while only advanced
> > > > > users would realize this config exists and tune it themselves.
> > > > >
> > > > > I agree that we could reconsider it for the next release if we
> > > > > observe that it is actually hurting more users than it benefits.
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Wed, Jun 7, 2017 at 2:26 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > >
> > > > > > Hi Jun/Ismael,
> > > > > >
> > > > > > Sounds good to me.
> > > > > >
> > > > > > Thanks,
> > > > > > Damian
> > > > > >
> > > > > > On Tue, 6 Jun 2017 at 23:08 Ismael Juma <ism...@juma.me.uk> wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > The console consumer issue also came up in a conversation I was
> > > > > > > having recently. Seems like the config/server.properties change is
> > > > > > > a reasonable compromise given that we have other defaults that are
> > > > > > > for development.
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Tue, Jun 6, 2017 at 10:59 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > >
> > > > > > > > Hi, Everyone,
> > > > > > > >
> > > > > > > > Sorry for being late on this thread; I just came across it. I
> > > > > > > > have a couple of concerns. (1) It seems the amount of delay will
> > > > > > > > be application specific, so it seems better for the delay to be
> > > > > > > > a client-side config instead of a server-side one? (2) When
> > > > > > > > running the console consumer in the quickstart, a minimum 3 sec
> > > > > > > > delay seems to be a bad experience for our users.
> > > > > > > >
> > > > > > > > Since we are getting late into the release cycle, it may be a
> > > > > > > > bit too late to make big changes in the 0.11 release. Perhaps we
> > > > > > > > should at least consider overriding the delay in
> > > > > > > > config/server.properties to 0 to improve the quickstart
> > > > > > > > experience?
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > >
> > > > > > > > Jun
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Apr 11, 2017 at 12:19 AM, Damian Guy <damian....@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi Onur,
> > > > > > > > >
> > > > > > > > > It was in my previous email. But here it is again.
> > > > > > > > >
> > > > > > > > > ============================================================
> > > > > > > > >
> > > > > > > > > 1. Better rebalance timing. We will try to rebalance only
> > > > > > > > > when all the consumers in a group have joined. The challenge
> > > > > > > > > would be that someone has to define what ALL consumers means;
> > > > > > > > > it could be either a time or a number of consumers, etc.
> > > > > > > > >
> > > > > > > > > 2. Avoid frequent rebalance. For example, if there are 100
> > > > > > > > > consumers in a group, today, in the worst case, we may end up
> > > > > > > > > with 100 rebalances even if all the consumers joined the group
> > > > > > > > > in a reasonably small amount of time. Frequent rebalance is
> > > > > > > > > also a bad thing for brokers.
> > > > > > > > >
> > > > > > > > > Having a client side configuration may solve problem 1 better
> > > > > > > > > because each consumer group can potentially configure their
> > > > > > > > > own timing. However, it does not really prevent frequent
> > > > > > > > > rebalance in general because some of the consumers can be
> > > > > > > > > misconfigured. (This may have something to do with KIP-124 as
> > > > > > > > > well. But if quota is applied on the JoinGroup/SyncGroup
> > > > > > > > > request it may cause some unwanted cascading effects.)
> > > > > > > > >
> > > > > > > > > Having a broker side configuration may result in less
> > > > > > > > > flexibility for each consumer group, but it can prevent
> > > > > > > > > frequent rebalance better. I think with some reasonable
> > > > > > > > > design, the rebalance timing issue can be resolved on the
> > > > > > > > > broker side as well. Matthias had a good point on extending
> > > > > > > > > the delay when a new consumer joins a group (we actually did
> > > > > > > > > something similar to batch ISR change propagation). For
> > > > > > > > > example, let's say on the broker side, we will always delay 2
> > > > > > > > > seconds each time we see a new consumer joining a consumer
> > > > > > > > > group. This would probably work for most of the consumer
> > > > > > > > > groups and will also limit the rebalance frequency to protect
> > > > > > > > > the brokers.
> > > > > > > > >
> > > > > > > > > I am not sure about the streams use case here, but if
> > > > > > > > > something like 2 seconds of delay is acceptable for streams, I
> > > > > > > > > would prefer adding the configuration to the broker so that we
> > > > > > > > > can address both problems.
> > > > > > > > >
> > > > > > > > > On Thu, 6 Apr 2017 at 17:11 Onur Karaman <
> > > > > > onurkaraman.apa...@gmail.com
> > > > > > > >
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Damian.
> > > > > > > > > >
> > > > > > > > > > Can you copy the point Becket made earlier that you say
> > > > > > > > > > isn't addressed?
> > > > > > > > > >
> > > > > > > > > > On Thu, Apr 6, 2017 at 2:51 AM, Damian Guy <
> > > > damian....@gmail.com
> > > > > >
> > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Thanks all, the Vote is now closed and the KIP has been
> > > > > > > > > > > accepted with 9 +1s
> > > > > > > > > > >
> > > > > > > > > > > 3 binding:
> > > > > > > > > > > Guozhang,
> > > > > > > > > > > Jason,
> > > > > > > > > > > Ismael
> > > > > > > > > > >
> > > > > > > > > > > 6 non-binding:
> > > > > > > > > > > Bill,
> > > > > > > > > > > Eno,
> > > > > > > > > > > Mathieu,
> > > > > > > > > > > Matthias,
> > > > > > > > > > > Dong,
> > > > > > > > > > > Mickael
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Damian
> > > > > > > > > > >
> > > > > > > > > > > On Thu, 6 Apr 2017 at 09:26 Ismael Juma <
> > ism...@juma.me.uk
> > > >
> > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Thanks for the KIP, +1 (binding).
> > > > > > > > > > > >
> > > > > > > > > > > > Ismael
> > > > > > > > > > > >
> > > > > > > > > > > > On Thu, Mar 30, 2017 at 8:55 PM, Jason Gustafson <
> > > > > > > > ja...@confluent.io
> > > > > > > > > >
> > > > > > > > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 Thanks for the KIP!
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 30, 2017 at 12:51 PM, Guozhang Wang <
> > > > > > > > > wangg...@gmail.com>
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sorry about the previous email, Gmail seems to be
> > > > > > > > > > > > > > collapsing them into a single thread in my inbox.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Thu, Mar 30, 2017 at 11:34 AM, Guozhang Wang <
> > > > > > > > > > wangg...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Damian, could you create a new thread for the
> > > voting
> > > > > > > process?
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks!
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Guozhang
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Thu, Mar 30, 2017 at 10:33 AM, Bill Bejeck <
> > > > > > > > > bbej...@gmail.com
> > > > > > > > > > >
> > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> +1(non-binding)
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> On Thu, Mar 30, 2017 at 1:30 PM, Eno Thereska
> <
> > > > > > > > > > > > eno.there...@gmail.com
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >> > +1 (non binding)
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> > Thanks
> > > > > > > > > > > > > > >> > Eno
> > > > > > > > > > > > > > >> > > On 30 Mar 2017, at 18:01, Matthias J. Sax
> <
> > > > > > > > > > > > matth...@confluent.io>
> > > > > > > > > > > > > > >> wrote:
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > +1
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> > > On 3/30/17 3:46 AM, Damian Guy wrote:
> > > > > > > > > > > > > > >> > >> Hi All,
> > > > > > > > > > > > > > >> > >>
> > > > > > > > > > > > > > >> > >> I'd like to start the voting thread on KIP-134:
> > > > > > > > > > > > > > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-134%3A+Delay+initial+consumer+group+rebalance
> > > > > > > > > > > > > > >> > >>
> > > > > > > > > > > > > > >> > >> Thanks,
> > > > > > > > > > > > > > >> > >> Damian
> > > > > > > > > > > > > > >> > >>
> > > > > > > > > > > > > > >> > >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >> >
> > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > --
> > > > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > --
> > > > > > > > > > > > > > -- Guozhang
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > -- Guozhang
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang
