Ismael,

Given that 3.0 is a breaking release, it seems like we have to rely on users
being aware of this and responsible enough to read the upgrade guide.
Otherwise we could never make any breaking changes beyond just removing
deprecated APIs or other compilation-breaking changes that would be
immediately visible, no?

That said, it's obviously better to have a circuit-breaker that fails fast
in case of a user misconfiguration than to silently corrupt the consumer
group state (e.g. two consumers ending up with overlapping ownership of the
same partition(s)). We could definitely implement this, and now that I
think about it, this might also solve a related problem in KAFKA-12477
<https://issues.apache.org/jira/browse/KAFKA-12477>. We would just add a new
field to the Assignment in which the group leader indicates whether it's on
a recent enough version to understand cooperative rebalancing. If an
upgraded member joins the group, it would only be allowed to start
following the new rebalancing protocol after receiving the go-ahead from
the group leader.
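A minimal Python sketch of the gating idea, under a deliberately simplified model. `Assignment`, `leader_supports_cooperative`, `UpgradedMember`, `prepare_to_join` and `on_assignment` are all hypothetical names for illustration, not the Kafka client API, and "revoke everything before rejoining" stands in for eager behavior. An assignment from an old leader simply lacks the new field, which the sketch models as `False`.

```python
from dataclasses import dataclass, field
from enum import Enum


class RebalanceProtocol(Enum):
    EAGER = "eager"
    COOPERATIVE = "cooperative"


@dataclass
class Assignment:
    """Simplified assignment sent by the group leader.

    leader_supports_cooperative is the HYPOTHETICAL new field: the leader sets
    it to True only if its own version understands cooperative rebalancing.
    Old leaders never set it, so it defaults to False.
    """
    partitions: set
    leader_supports_cooperative: bool = False


@dataclass
class UpgradedMember:
    """A member on the new version, configured for cooperative rebalancing."""
    owned: set = field(default_factory=set)
    # Start out eager: don't follow cooperative rules until the leader says so.
    protocol: RebalanceProtocol = RebalanceProtocol.EAGER

    def prepare_to_join(self) -> set:
        """Return the partitions revoked before (re)joining the group."""
        if self.protocol is RebalanceProtocol.EAGER:
            revoked, self.owned = self.owned, set()
            return revoked
        # Cooperative: keep owned partitions; the leader revokes them later if needed.
        return set()

    def on_assignment(self, assignment: Assignment) -> None:
        self.owned = set(assignment.partitions)
        # Re-check the go-ahead on every assignment (a design choice of this sketch).
        if assignment.leader_supports_cooperative:
            self.protocol = RebalanceProtocol.COOPERATIVE
        else:
            self.protocol = RebalanceProtocol.EAGER
```

Re-evaluating the flag on every assignment, rather than latching it once, means that if leadership moves back to a member on the old version mid-upgrade, the upgraded member falls back to eager revocation instead of assuming the leader understands cooperative semantics.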

If we do add this new field to the Assignment, then I'm pretty confident we
can reduce the number of required rolling bounces to just one with
KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>. In that
case we should be in much better shape to feel good about changing the
default to the CooperativeStickyAssignor. How does that sound?
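For comparison, a simplified Python model of why two bounces are needed today. It assumes the Java consumer uses the highest rebalance protocol supported by all of its configured assignors, where RangeAssignor supports only eager and CooperativeStickyAssignor supports both. The `SUPPORTED` table and `rebalance_protocol` helper are illustrative, not client code. During the first bounce the list still includes RangeAssignor, so upgraded members keep rebalancing eagerly and an old leader never meets a member that skipped revocation. Only the second bounce, with RangeAssignor removed, switches everyone to cooperative.

```python
from enum import IntEnum


class RebalanceProtocol(IntEnum):
    # Ordered so that max() picks the "more advanced" protocol.
    EAGER = 0
    COOPERATIVE = 1


# Illustrative table of the protocols each assignor supports.
SUPPORTED = {
    "RangeAssignor": {RebalanceProtocol.EAGER},
    "CooperativeStickyAssignor": {RebalanceProtocol.COOPERATIVE, RebalanceProtocol.EAGER},
}


def rebalance_protocol(assignors):
    """Highest protocol supported by every configured assignor."""
    common = set.intersection(*(SUPPORTED[a] for a in assignors))
    if not common:
        raise ValueError("configured assignors share no rebalance protocol")
    return max(common)


# The two-bounce upgrade away from the eager RangeAssignor:
bounce_1 = ["CooperativeStickyAssignor", "RangeAssignor"]  # new bytecode, still eager
bounce_2 = ["CooperativeStickyAssignor"]                   # everyone cooperative
print(rebalance_protocol(bounce_1).name)
print(rebalance_protocol(bounce_2).name)
```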

To be clear, I'm not proposing we do this as part of KIP-726. Here's my take:

Let's pause this KIP while I work on making these two improvements in
KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>. Once I can
confirm that the short-circuit and the single rolling bounce will be
available for 3.0, I'll report back on this thread. Then we can move
forward with this KIP again.

Thoughts?
Sophie

On Mon, Mar 29, 2021 at 12:01 AM Luke Chen <show...@gmail.com> wrote:

> Hi Ismael,
> Thanks for the good questions. My answers are below:
> *1. Are we saying that every consumer upgraded would have to follow the
> complex path described in the KIP?*
> --> We suggest that every consumer go through these 2 rolling upgrade steps.
> Once KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>
> is completed, this can be reduced to 1 rolling upgrade.
>
> *2. What happens if they don't read the instructions and upgrade as they
> have in the past?*
> --> The reason we want 2 rolling upgrade steps is to avoid the situation
> where the leader is still on the old bytecode and only recognizes "eager",
> but, due to compatibility, is still able to deserialize the new protocol
> data from newer-versioned members, and hence just goes ahead and does the
> assignment while the newer-versioned members have not revoked their
> partitions before joining the group.
>
> But I'd say the proposed default assignor, CooperativeStickyAssignor, was
> already introduced in 2.4.0, so users should have had long enough to
> upgrade to bytecode that recognizes the "cooperative" protocol.
>
> What do you think?
>
> Thank you.
> Luke
>
> On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma <ism...@juma.me.uk> wrote:
>
> > Thanks for the KIP. Are we saying that every consumer upgraded would have
> > to follow the complex path described in the KIP? Also, what happens if they
> > don't read the instructions and upgrade as they have in the past?
> >
> > Ismael
> >
> > On Fri, Mar 26, 2021, 1:53 AM Luke Chen <show...@gmail.com> wrote:
> >
> > > Hi everyone,
> > > <Update the subject>
> > >
> > > I'd like to discuss the following proposal to make the
> > > CooperativeStickyAssignor the default assignor.
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> > >
> > > Any comments are welcome.
> > >
> > > Thank you.
> > > Luke
> > >
> >
>
