Hey Stanislav,

Just to clarify: I think what you're suggesting is something like this in order to gracefully shrink the group:
1. Transition the group to PREPARING_REBALANCE. No members are kicked out.
2. Continue to allow offset commits and heartbeats for all current members.
3. Allow the first n members that send JoinGroup to stay in the group, but wait for the JoinGroup (or session timeout) from all active members before finishing the rebalance.

So basically we try to give the current members an opportunity to finish work, but we prevent some of them from rejoining after the rebalance completes. It sounds reasonable if I've understood correctly.

Thanks,
Jason

On Fri, Dec 7, 2018 at 6:47 AM Boyang Chen <bche...@outlook.com> wrote:

> Yep, LGTM on my side. Thanks Stanislav! > ________________________________ > From: Stanislav Kozlovski <stanis...@confluent.io> > Sent: Friday, December 7, 2018 8:51 PM > To: dev@kafka.apache.org > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member > metadata growth > > Hi, > > We discussed this offline with Boyang and figured that it's best to not > wait on the Cooperative Rebalancing proposal. Our thinking is that we can > just force a rebalance from the broker, allowing consumers to commit > offsets if their rebalanceListener is configured correctly. > When rebalancing improvements are implemented, we assume that they would > improve KIP-389's behavior as well as the normal rebalance scenarios > > On Wed, Dec 5, 2018 at 12:09 PM Boyang Chen <bche...@outlook.com> wrote: > > > Hey Stanislav, > > > > thanks for the question! `Trivial rebalance` means "we don't start > > reassignment right now, but you need to know it's coming soon > > and you should start preparation". > > > > An example KStream use case is that before actually starting to shrink > the > > consumer group, we need to > > 1. partition the consumer group into two subgroups, where one will be > > offline soon and the other will keep serving; > > 2. make sure the states associated with near-future offline consumers are > > successfully replicated on the serving ones. > > > > As I have mentioned shrinking the consumer group is pretty much > equivalent > > to group scaling down, so we could think of this > > as an add-on use case for cluster scaling. So my understanding is that > the > > KIP-389 could be sequenced within our cooperative rebalancing< > > > https://cwiki.apache.org/confluence/display/KAFKA/Incremental+Cooperative+Rebalancing%3A+Support+and+Policies > > > > > proposal. > > > > Let me know if this makes sense. > > > > Best, > > Boyang > > ________________________________ > > From: Stanislav Kozlovski <stanis...@confluent.io> > > Sent: Wednesday, December 5, 2018 5:52 PM > > To: dev@kafka.apache.org > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member > > metadata growth > > > > Hey Boyang, > > > > I think we still need to take care of group shrinkage because even if > users > > change the config value we cannot guarantee that all consumer groups > would > > have been manually shrunk. > > > > Regarding 2., I agree that forcefully triggering a rebalance might be the > > most intuitive way to handle the situation. > > What does a "trivial rebalance" mean? Sorry, I'm not familiar with the > > term. 
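For readers following the "commit offsets if their rebalanceListener is configured correctly" point above, here is a minimal client-side sketch of that pattern. It only uses the public ConsumerRebalanceListener API; the bootstrap address, group id, and topic name are placeholders and error handling is omitted.

import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitOnRevoke {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "example-group");           // placeholder
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("example-topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Runs inside poll() before the rebalance completes, so a member that is
                // about to be removed gets one last chance to commit what it has processed.
                consumer.commitSync();
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Nothing extra needed for this sketch.
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            records.forEach(record -> { /* process(record) */ });
            consumer.commitSync();
        }
    }
}

The idea is that when the broker forces the rebalance, members configured this way commit before (possibly) being rejected on rejoin, which limits duplicate processing to whatever was in flight at that moment.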
> > I was thinking that maybe we could force a rebalance, which would cause > > consumers to commit their offsets (given their rebalanceListener is > > configured correctly) and subsequently reject some of the incoming > > `joinGroup` requests. Does that sound like it would work? > > > > On Wed, Dec 5, 2018 at 1:13 AM Boyang Chen <bche...@outlook.com> wrote: > > > > > Hey Stanislav, > > > > > > I read the latest KIP and saw that we already changed the default value > > to > > > -1. Do > > > we still need to take care of the consumer group shrinking when doing > the > > > upgrade? > > > > > > However this is an interesting topic that worth discussing. Although > > > rolling > > > upgrade is fine, `consumer.group.max.size` could always have conflict > > with > > > the current > > > consumer group size which means we need to adhere to one source of > truth. > > > > > > 1.Choose the current group size, which means we never interrupt the > > > consumer group until > > > it transits to PREPARE_REBALANCE. And we keep track of how many join > > group > > > requests > > > we have seen so far during PREPARE_REBALANCE. After reaching the > consumer > > > cap, > > > we start to inform over provisioned consumers that you should send > > > LeaveGroupRequest and > > > fail yourself. Or with what Mayuresh proposed in KIP-345, we could mark > > > extra members > > > as hot backup and rebalance without them. > > > > > > 2.Choose the `consumer.group.max.size`. I feel incremental rebalancing > > > (you proposed) could be of help here. > > > When a new cap is enforced, leader should be notified. If the current > > > group size is already over limit, leader > > > shall trigger a trivial rebalance to shuffle some topic partitions and > > let > > > a subset of consumers prepare the ownership > > > transition. Until they are ready, we trigger a real rebalance to remove > > > over-provisioned consumers. It is pretty much > > > equivalent to `how do we scale down the consumer group without > > > interrupting the current processing`. > > > > > > I personally feel inclined to 2 because we could kill two birds with > one > > > stone in a generic way. What do you think? > > > > > > Boyang > > > ________________________________ > > > From: Stanislav Kozlovski <stanis...@confluent.io> > > > Sent: Monday, December 3, 2018 8:35 PM > > > To: dev@kafka.apache.org > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to cap member > > > metadata growth > > > > > > Hi Jason, > > > > > > > 2. Do you think we should make this a dynamic config? > > > I'm not sure. Looking at the config from the perspective of a > > prescriptive > > > config, we may get away with not updating it dynamically. > > > But in my opinion, it always makes sense to have a config be > dynamically > > > configurable. As long as we limit it to being a cluster-wide config, we > > > should be fine. > > > > > > > 1. I think it would be helpful to clarify the details on how the > > > coordinator will shrink the group. It will need to choose which members > > to > > > remove. Are we going to give current members an opportunity to commit > > > offsets before kicking them from the group? > > > > > > This turns out to be somewhat tricky. I think that we may not be able > to > > > guarantee that consumers don't process a message twice. > > > My initial approach was to do as much as we could to let consumers > commit > > > offsets. 
> > > > > > I was thinking that we mark a group to be shrunk, we could keep a map > of > > > consumer_id->boolean indicating whether they have committed offsets. I > > then > > > thought we could delay the rebalance until every consumer commits (or > > some > > > time passes). > > > In the meantime, we would block all incoming fetch calls (by either > > > returning empty records or a retriable error) and we would continue to > > > accept offset commits (even twice for a single consumer) > > > > > > I see two problems with this approach: > > > * We have async offset commits, which implies that we can receive fetch > > > requests before the offset commit req has been handled. i.e consmer > sends > > > fetchReq A, offsetCommit B, fetchReq C - we may receive A,C,B in the > > > broker. Meaning we could have saved the offsets for B but rebalance > > before > > > the offsetCommit for the offsets processed in C come in. > > > * KIP-392 Allow consumers to fetch from closest replica > > > < > > > > > > https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-392%253A%2BAllow%2Bconsumers%2Bto%2Bfetch%2Bfrom%2Bclosest%2Breplica&data=02%7C01%7C%7C9b6fddd2f2be41ce39c308d65c42c821%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636797839221710310&sdata=a2G0cWk7Saia9OEz4UfxvKBQtzP25Zi5cCb5jWx9mZY%3D&reserved=0 > > > > > > > would > > > make it significantly harder to block poll() calls on consumers whose > > > groups are being shrunk. Even if we implemented a solution, the same > race > > > condition noted above seems to apply and probably others > > > > > > > > > Given those constraints, I think that we can simply mark the group as > > > `PreparingRebalance` with a rebalanceTimeout of the server setting ` > > > group.max.session.timeout.ms`. That's a bit long by default (5 > minutes) > > > but > > > I can't seem to come up with a better alternative > > > > > > I'm interested in hearing your thoughts. > > > > > > Thanks, > > > Stanislav > > > > > > On Fri, Nov 30, 2018 at 8:38 AM Jason Gustafson <ja...@confluent.io> > > > wrote: > > > > > > > Hey Stanislav, > > > > > > > > What do you think about the use case I mentioned in my previous reply > > > about > > > > > a more resilient self-service Kafka? I believe the benefit there is > > > > bigger. > > > > > > > > > > > > I see this config as analogous to the open file limit. Probably this > > > limit > > > > was intended to be prescriptive at some point about what was deemed a > > > > reasonable number of open files for an application. But mostly people > > > treat > > > > it as an annoyance which they have to work around. If it happens to > be > > > hit, > > > > usually you just increase it because it is not tied to an actual > > resource > > > > constraint. However, occasionally hitting the limit does indicate an > > > > application bug such as a leak, so I wouldn't say it is useless. > > > Similarly, > > > > the issue in KAFKA-7610 was a consumer leak and having this limit > would > > > > have allowed the problem to be detected before it impacted the > cluster. > > > To > > > > me, that's the main benefit. It's possible that it could be used > > > > prescriptively to prevent poor usage of groups, but like the open > file > > > > limit, I suspect administrators will just set it large enough that > > users > > > > are unlikely to complain. > > > > > > > > Anyway, just a couple additional questions: > > > > > > > > 1. 
I think it would be helpful to clarify the details on how the > > > > coordinator will shrink the group. It will need to choose which > members > > > to > > > > remove. Are we going to give current members an opportunity to commit > > > > offsets before kicking them from the group? > > > > > > > > 2. Do you think we should make this a dynamic config? > > > > > > > > Thanks, > > > > Jason > > > > > > > > > > > > > > > > > > > > On Wed, Nov 28, 2018 at 2:42 AM Stanislav Kozlovski < > > > > stanis...@confluent.io> > > > > wrote: > > > > > > > > > Hi Jason, > > > > > > > > > > You raise some very valid points. > > > > > > > > > > > The benefit of this KIP is probably limited to preventing > "runaway" > > > > > consumer groups due to leaks or some other application bug > > > > > What do you think about the use case I mentioned in my previous > reply > > > > about > > > > > a more resilient self-service Kafka? I believe the benefit there is > > > > bigger > > > > > > > > > > * Default value > > > > > You're right, we probably do need to be conservative. Big consumer > > > groups > > > > > are considered an anti-pattern and my goal was to also hint at this > > > > through > > > > > the config's default. Regardless, it is better to not have the > > > potential > > > > to > > > > > break applications with an upgrade. > > > > > Choosing between the default of something big like 5000 or an > opt-in > > > > > option, I think we should go with the *disabled default option* > > (-1). > > > > > The only benefit we would get from a big default of 5000 is default > > > > > protection against buggy/malicious applications that hit the > > KAFKA-7610 > > > > > issue. > > > > > While this KIP was spawned from that issue, I believe its value is > > > > enabling > > > > > the possibility of protection and helping move towards a more > > > > self-service > > > > > Kafka. I also think that a default value of 5000 might be > misleading > > to > > > > > users and lead them to think that big consumer groups (> 250) are a > > > good > > > > > thing. > > > > > > > > > > The good news is that KAFKA-7610 should be fully resolved and the > > > > rebalance > > > > > protocol should, in general, be more solid after the planned > > > improvements > > > > > in KIP-345 and KIP-394. > > > > > > > > > > * Handling bigger groups during upgrade > > > > > I now see that we store the state of consumer groups in the log and > > > why a > > > > > rebalance isn't expected during a rolling upgrade. > > > > > Since we're going with the default value of the max.size being > > > disabled, > > > > I > > > > > believe we can afford to be more strict here. > > > > > During state reloading of a new Coordinator with a defined > > > max.group.size > > > > > config, I believe we should *force* rebalances for groups that > exceed > > > the > > > > > configured size. Then, only some consumers will be able to join and > > the > > > > max > > > > > size invariant will be satisfied. > > > > > > > > > > I updated the KIP with a migration plan, rejected alternatives and > > the > > > > new > > > > > default value. 
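On the migration concern above (groups that already exceed the cap when a coordinator loads its state), one way an operator could audit a cluster before enabling the config is with the standard Admin API. A rough sketch follows; the bootstrap address and the planned cap of 250 are placeholders, not values mandated by the KIP.

import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;
import org.apache.kafka.clients.admin.ConsumerGroupListing;

public class FindOversizedGroups {
    public static void main(String[] args) throws Exception {
        int plannedMaxSize = 250; // the cap the operator intends to configure

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // List every consumer group known to the cluster.
            Collection<ConsumerGroupListing> listings = admin.listConsumerGroups().all().get();
            Collection<String> groupIds = new ArrayList<>();
            for (ConsumerGroupListing listing : listings) {
                groupIds.add(listing.groupId());
            }

            // Describe them and flag any group whose member count exceeds the planned cap.
            Map<String, ConsumerGroupDescription> groups =
                admin.describeConsumerGroups(groupIds).all().get();
            for (ConsumerGroupDescription group : groups.values()) {
                if (group.members().size() > plannedMaxSize) {
                    System.out.printf("Group %s has %d members, above the planned cap of %d%n",
                            group.groupId(), group.members().size(), plannedMaxSize);
                }
            }
        }
    }
}

Any group flagged this way would be the kind subject to the forced rebalance described above once the coordinator reloads its state with the new config in place.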
> > > > > > > > > > Thanks, > > > > > Stanislav > > > > > > > > > > On Tue, Nov 27, 2018 at 5:25 PM Jason Gustafson < > ja...@confluent.io> > > > > > wrote: > > > > > > > > > > > Hey Stanislav, > > > > > > > > > > > > Clients will then find that coordinator > > > > > > > and send `joinGroup` on it, effectively rebuilding the group, > > since > > > > the > > > > > > > cache of active consumers is not stored outside the > Coordinator's > > > > > memory. > > > > > > > (please do say if that is incorrect) > > > > > > > > > > > > > > > > > > Groups do not typically rebalance after a coordinator change. You > > > could > > > > > > potentially force a rebalance if the group is too big and kick > out > > > the > > > > > > slowest members or something. A more graceful solution is > probably > > to > > > > > just > > > > > > accept the current size and prevent it from getting bigger. We > > could > > > > log > > > > > a > > > > > > warning potentially. > > > > > > > > > > > > My thinking is that we should abstract away from conserving > > resources > > > > and > > > > > > > focus on giving control to the broker. The issue that spawned > > this > > > > KIP > > > > > > was > > > > > > > a memory problem but I feel this change is useful in a more > > general > > > > > way. > > > > > > > > > > > > > > > > > > So you probably already know why I'm asking about this. For > > consumer > > > > > groups > > > > > > anyway, resource usage would typically be proportional to the > > number > > > of > > > > > > partitions that a group is reading from and not the number of > > > members. > > > > > For > > > > > > example, consider the memory use in the offsets cache. The > benefit > > of > > > > > this > > > > > > KIP is probably limited to preventing "runaway" consumer groups > due > > > to > > > > > > leaks or some other application bug. That still seems useful > > though. > > > > > > > > > > > > I completely agree with this and I *ask everybody to chime in > with > > > > > opinions > > > > > > > on a sensible default value*. > > > > > > > > > > > > > > > > > > I think we would have to be very conservative. The group protocol > > is > > > > > > generic in some sense, so there may be use cases we don't know of > > > where > > > > > > larger groups are reasonable. Probably we should make this an > > opt-in > > > > > > feature so that we do not risk breaking anyone's application > after > > an > > > > > > upgrade. Either that, or use a very high default like 5,000. > > > > > > > > > > > > Thanks, > > > > > > Jason > > > > > > > > > > > > On Tue, Nov 27, 2018 at 3:27 AM Stanislav Kozlovski < > > > > > > stanis...@confluent.io> > > > > > > wrote: > > > > > > > > > > > > > Hey Jason and Boyang, those were important comments > > > > > > > > > > > > > > > One suggestion I have is that it would be helpful to put your > > > > > reasoning > > > > > > > on deciding the current default value. For example, in certain > > use > > > > > cases > > > > > > at > > > > > > > Pinterest we are very likely to have more consumers than 250 > when > > > we > > > > > > > configure 8 stream instances with 32 threads. > > > > > > > > For the effectiveness of this KIP, we should encourage people > > to > > > > > > discuss > > > > > > > their opinions on the default setting and ideally reach a > > > consensus. > > > > > > > > > > > > > > I completely agree with this and I *ask everybody to chime in > > with > > > > > > opinions > > > > > > > on a sensible default value*. 
> > > > > > > My thought process was that in the current model rebalances in > > > large > > > > > > groups > > > > > > > are more costly. I imagine most use cases in most Kafka users > do > > > not > > > > > > > require more than 250 consumers. > > > > > > > Boyang, you say that you are "likely to have... when we..." - > do > > > you > > > > > have > > > > > > > systems running with so many consumers in a group or are you > > > planning > > > > > > to? I > > > > > > > guess what I'm asking is whether this has been tested in > > production > > > > > with > > > > > > > the current rebalance model (ignoring KIP-345) > > > > > > > > > > > > > > > Can you clarify the compatibility impact here? What > > > > > > > > will happen to groups that are already larger than the max > > size? > > > > > > > This is a very important question. > > > > > > > From my current understanding, when a coordinator broker gets > > shut > > > > > > > down during a cluster rolling upgrade, a replica will take > > > leadership > > > > > of > > > > > > > the `__offset_commits` partition. Clients will then find that > > > > > coordinator > > > > > > > and send `joinGroup` on it, effectively rebuilding the group, > > since > > > > the > > > > > > > cache of active consumers is not stored outside the > Coordinator's > > > > > memory. > > > > > > > (please do say if that is incorrect) > > > > > > > Then, I believe that working as if this is a new group is a > > > > reasonable > > > > > > > approach. Namely, fail joinGroups when the max.size is > exceeded. > > > > > > > What do you guys think about this? (I'll update the KIP after > we > > > > settle > > > > > > on > > > > > > > a solution) > > > > > > > > > > > > > > > Also, just to be clear, the resource we are trying to > conserve > > > > here > > > > > is > > > > > > > what? Memory? > > > > > > > My thinking is that we should abstract away from conserving > > > resources > > > > > and > > > > > > > focus on giving control to the broker. The issue that spawned > > this > > > > KIP > > > > > > was > > > > > > > a memory problem but I feel this change is useful in a more > > general > > > > > way. > > > > > > It > > > > > > > limits the control clients have on the cluster and helps Kafka > > > > become a > > > > > > > more self-serving system. Admin/Ops teams can better control > the > > > > impact > > > > > > > application developers can have on a Kafka cluster with this > > change > > > > > > > > > > > > > > Best, > > > > > > > Stanislav > > > > > > > > > > > > > > > > > > > > > On Mon, Nov 26, 2018 at 8:00 PM Jason Gustafson < > > > ja...@confluent.io> > > > > > > > wrote: > > > > > > > > > > > > > > > Hi Stanislav, > > > > > > > > > > > > > > > > Thanks for the KIP. Can you clarify the compatibility impact > > > here? > > > > > What > > > > > > > > will happen to groups that are already larger than the max > > size? > > > > > Also, > > > > > > > just > > > > > > > > to be clear, the resource we are trying to conserve here is > > what? > > > > > > Memory? > > > > > > > > > > > > > > > > -Jason > > > > > > > > > > > > > > > > On Mon, Nov 26, 2018 at 2:44 AM Boyang Chen < > > bche...@outlook.com > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Thanks Stanislav for the update! One suggestion I have is > > that > > > it > > > > > > would > > > > > > > > be > > > > > > > > > helpful to put your > > > > > > > > > > > > > > > > > > reasoning on deciding the current default value. 
For > example, > > > in > > > > > > > certain > > > > > > > > > use cases at Pinterest we are very likely > > > > > > > > > > > > > > > > > > to have more consumers than 250 when we configure 8 stream > > > > > instances > > > > > > > with > > > > > > > > > 32 threads. > > > > > > > > > > > > > > > > > > > > > > > > > > > For the effectiveness of this KIP, we should encourage > people > > > to > > > > > > > discuss > > > > > > > > > their opinions on the default setting and ideally reach a > > > > > consensus. > > > > > > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > > > > > > > > Boyang > > > > > > > > > > > > > > > > > > ________________________________ > > > > > > > > > From: Stanislav Kozlovski <stanis...@confluent.io> > > > > > > > > > Sent: Monday, November 26, 2018 6:14 PM > > > > > > > > > To: dev@kafka.apache.org > > > > > > > > > Subject: Re: [Discuss] KIP-389: Enforce group.max.size to > cap > > > > > member > > > > > > > > > metadata growth > > > > > > > > > > > > > > > > > > Hey everybody, > > > > > > > > > > > > > > > > > > It's been a week since this KIP and not much discussion has > > > been > > > > > > made. > > > > > > > > > I assume that this is a straight forward change and I will > > > open a > > > > > > > voting > > > > > > > > > thread in the next couple of days if nobody has anything to > > > > > suggest. > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Stanislav > > > > > > > > > > > > > > > > > > On Thu, Nov 22, 2018 at 12:56 PM Stanislav Kozlovski < > > > > > > > > > stanis...@confluent.io> > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > Greetings everybody, > > > > > > > > > > > > > > > > > > > > I have enriched the KIP a bit with a bigger Motivation > > > section > > > > > and > > > > > > > also > > > > > > > > > > renamed it. > > > > > > > > > > KIP: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-389%253A%2BIntroduce%2Ba%2Bconfigurable%2Bconsumer%2Bgroup%2Bsize%2Blimit&data=02%7C01%7C%7C9b6fddd2f2be41ce39c308d65c42c821%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636797839221710310&sdata=1HOujECau6m8AoYt8OMUbpawSjwHg1Z3CxJQMSQYk6A%3D&reserved=0 > > > > > > > > > > > > > > > > > > > > I'm looking forward to discussions around it. > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > Stanislav > > > > > > > > > > > > > > > > > > > > On Tue, Nov 20, 2018 at 1:47 PM Stanislav Kozlovski < > > > > > > > > > > stanis...@confluent.io> wrote: > > > > > > > > > > > > > > > > > > > >> Hey there everybody, > > > > > > > > > >> > > > > > > > > > >> Thanks for the introduction Boyang. I appreciate the > > effort > > > > you > > > > > > are > > > > > > > > > >> putting into improving consumer behavior in Kafka. > > > > > > > > > >> > > > > > > > > > >> @Matt > > > > > > > > > >> I also believe the default value is high. In my opinion, > > we > > > > > should > > > > > > > aim > > > > > > > > > to > > > > > > > > > >> a default cap around 250. This is because in the current > > > model > > > > > any > > > > > > > > > consumer > > > > > > > > > >> rebalance is disrupting to every consumer. The bigger > the > > > > group, > > > > > > the > > > > > > > > > longer > > > > > > > > > >> this period of disruption. 
> > > > > > > > > >> > > > > > > > > > >> If you have such a large consumer group, chances are > that > > > your > > > > > > > > > >> client-side logic could be structured better and that > you > > > are > > > > > not > > > > > > > > using > > > > > > > > > the > > > > > > > > > >> high number of consumers to achieve high throughput. > > > > > > > > > >> 250 can still be considered of a high upper bound, I > > believe > > > > in > > > > > > > > practice > > > > > > > > > >> users should aim to not go over 100 consumers per > consumer > > > > > group. > > > > > > > > > >> > > > > > > > > > >> In regards to the cap being global/per-broker, I think > > that > > > we > > > > > > > should > > > > > > > > > >> consider whether we want it to be global or *per-topic*. > > For > > > > the > > > > > > > time > > > > > > > > > >> being, I believe that having it per-topic with a global > > > > default > > > > > > > might > > > > > > > > be > > > > > > > > > >> the best situation. Having it global only seems a bit > > > > > restricting > > > > > > to > > > > > > > > me > > > > > > > > > and > > > > > > > > > >> it never hurts to support more fine-grained > > configurability > > > > > (given > > > > > > > > it's > > > > > > > > > the > > > > > > > > > >> same config, not a new one being introduced). > > > > > > > > > >> > > > > > > > > > >> On Tue, Nov 20, 2018 at 11:32 AM Boyang Chen < > > > > > bche...@outlook.com > > > > > > > > > > > > > > > > wrote: > > > > > > > > > >> > > > > > > > > > >>> Thanks Matt for the suggestion! I'm still open to any > > > > > suggestion > > > > > > to > > > > > > > > > >>> change the default value. Meanwhile I just want to > point > > > out > > > > > that > > > > > > > > this > > > > > > > > > >>> value is a just last line of defense, not a real > scenario > > > we > > > > > > would > > > > > > > > > expect. > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> In the meanwhile, I discussed with Stanislav and he > would > > > be > > > > > > > driving > > > > > > > > > the > > > > > > > > > >>> 389 effort from now on. Stanislav proposed the idea in > > the > > > > > first > > > > > > > > place > > > > > > > > > and > > > > > > > > > >>> had already come up a draft design, while I will keep > > > > focusing > > > > > on > > > > > > > > > KIP-345 > > > > > > > > > >>> effort to ensure solving the edge case described in the > > > JIRA< > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fissues.apache.org%2Fjira%2Fbrowse%2FKAFKA-7610&data=02%7C01%7C%7C9b6fddd2f2be41ce39c308d65c42c821%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636797839221710310&sdata=QVt3jtDN2dPQ4xZOTAyJGszCXiKYwXHcmbsxmcpph2w%3D&reserved=0 > > > > > > > > > >. > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> Thank you Stanislav for making this happen! > > > > > > > > > >>> > > > > > > > > > >>> > > > > > > > > > >>> Boyang > > > > > > > > > >>> > > > > > > > > > >>> ________________________________ > > > > > > > > > >>> From: Matt Farmer <m...@frmr.me> > > > > > > > > > >>> Sent: Tuesday, November 20, 2018 10:24 AM > > > > > > > > > >>> To: dev@kafka.apache.org > > > > > > > > > >>> Subject: Re: [Discuss] KIP-389: Enforce group.max.size > to > > > cap > > > > > > > member > > > > > > > > > >>> metadata growth > > > > > > > > > >>> > > > > > > > > > >>> Thanks for the KIP. 
> > > > > > > > > >>> > > > > > > > > > >>> Will this cap be a global cap across the entire cluster > > or > > > > per > > > > > > > > broker? > > > > > > > > > >>> > > > > > > > > > >>> Either way the default value seems a bit high to me, > but > > > that > > > > > > could > > > > > > > > > just > > > > > > > > > >>> be > > > > > > > > > >>> from my own usage patterns. I'd have probably started > > with > > > > 500 > > > > > or > > > > > > > 1k > > > > > > > > > but > > > > > > > > > >>> could be easily convinced that's wrong. > > > > > > > > > >>> > > > > > > > > > >>> Thanks, > > > > > > > > > >>> Matt > > > > > > > > > >>> > > > > > > > > > >>> On Mon, Nov 19, 2018 at 8:51 PM Boyang Chen < > > > > > bche...@outlook.com > > > > > > > > > > > > > > > > wrote: > > > > > > > > > >>> > > > > > > > > > >>> > Hey folks, > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > > >>> > I would like to start a discussion on KIP-389: > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > https://nam03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FKAFKA%2FKIP-389%253A%2BEnforce%2Bgroup.max.size%2Bto%2Bcap%2Bmember%2Bmetadata%2Bgrowth&data=02%7C01%7C%7C9b6fddd2f2be41ce39c308d65c42c821%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C636797839221710310&sdata=FcuCA6ckiid0dsf41upRumrID8r7BGYS7lx1OItHT88%3D&reserved=0 > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > > >>> > This is a pretty simple change to cap the consumer > > group > > > > size > > > > > > for > > > > > > > > > >>> broker > > > > > > > > > >>> > stability. Give me your valuable feedback when you > got > > > > time. > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > > >>> > Thank you! > > > > > > > > > >>> > > > > > > > > > > >>> > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >> -- > > > > > > > > > >> Best, > > > > > > > > > >> Stanislav > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > > Best, > > > > > > > > > > Stanislav > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > > > Best, > > > > > > > > > Stanislav > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Best, > > > > > > > Stanislav > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > Best, > > > > > Stanislav > > > > > > > > > > > > > > > > > > -- > > > Best, > > > Stanislav > > > > > > > > > -- > > Best, > > Stanislav > > > > > -- > Best, > Stanislav >