OK, I'm still fleshing out all the details of KAFKA-12477, but I think we can
simplify some things a bit and avoid any kind of "fail-fast" that would require
user intervention. In fact, I think we can avoid requiring the user to make any
changes at all for KIP-726, so we don't have to worry about whether they
actually read our documentation:

Instead of making ["cooperative-sticky"] the default, we change the default
to ["cooperative-sticky", "range"].
Since "range" is the old default, this is equivalent to the first rolling
bounce of the safe upgrade path in KIP-429.
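
To make that concrete, here is a rough sketch of what the proposed default
would look like if a user spelled it out explicitly. The class and method
names are made up for illustration; only the config key and the two assignor
class names come from the consumer itself, and it uses plain
java.util.Properties so it runs without the client jar.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: spells out the proposed default.
class AssignorDefaultsSketch {
    // Consumer config key for the ordered list of assignors.
    static final String STRATEGY_KEY = "partition.assignment.strategy";

    static Properties proposedDefaults() {
        Properties props = new Properties();
        // Preferred assignor first, the old default ("range") second.
        props.setProperty(STRATEGY_KEY,
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor,"
          + "org.apache.kafka.clients.consumer.RangeAssignor");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(proposedDefaults().getProperty(STRATEGY_KEY));
    }
}
```

The order matters: the first entry is the preferred assignor, and "range" is
only there so that not-yet-upgraded members still have something in common.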

Of course this also means that under the current protocol selection
mechanism we won't actually upgrade to
cooperative rebalancing with the default assignor. But that's where
KAFKA-12477 will come in.
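
For anyone following along, this is roughly why the two-assignor default stays
on eager today. It is a simplified model of the selection rule as I understand
it, not the actual ConsumerCoordinator code: the class, enum and method names
below are invented for the sketch, and it assumes "range" supports only EAGER
while "cooperative-sticky" supports both EAGER and COOPERATIVE.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the consumer's rebalance protocol choice.
class ProtocolSelectionSketch {
    // Declared in ascending order of preference.
    enum Protocol { EAGER, COOPERATIVE }

    // Assumed protocol support for the two assignors named in this thread.
    static Set<Protocol> supportedBy(String assignor) {
        switch (assignor) {
            case "range":
                return EnumSet.of(Protocol.EAGER);
            case "cooperative-sticky":
                return EnumSet.of(Protocol.EAGER, Protocol.COOPERATIVE);
            default:
                throw new IllegalArgumentException("unknown assignor: " + assignor);
        }
    }

    // Keep only protocols that every configured assignor supports,
    // then pick the highest one that remains.
    static Protocol select(List<String> assignors) {
        Set<Protocol> common = EnumSet.allOf(Protocol.class);
        for (String assignor : assignors) {
            common.retainAll(supportedBy(assignor));
        }
        if (common.isEmpty()) {
            throw new IllegalStateException("no commonly supported protocol");
        }
        return Collections.max(common);
    }
}
```

Under these assumptions, select(["cooperative-sticky", "range"]) comes out as
EAGER, and only select(["cooperative-sticky"]) gives COOPERATIVE, which is the
gap KAFKA-12477 is meant to close.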

@Guozhang Wang <guozh...@confluent.io>  I'll get back to you with a concrete
proposal and answers to your questions. I just want to point out that it's
possible to side-step the risk of users shooting themselves in the foot (well,
at least in this one specific case; obviously they always find a way).

On Tue, Mar 30, 2021 at 10:37 AM Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Sophie,
>
> My question is more related to KAFKA-12477, but since your latest replies
> are on this thread I figured we can follow up in the same venue. Just so I
> understand your latest comments above about the approach:
>
> * I think we would need to persist this decision so that the group would
> never go back to the eager protocol; this bit would be written to the
> internal topic's assignment message. Is that correct?
> * Maybe you can describe the steps, after the group has decided to move
> forward with cooperative protocols, for what would happen when:
> 1) a new member joins the group with the old version, and hence only
> recognizes the eager protocol and executes the eager protocol on its first
> rebalance.
> 2) in addition to 1), the new member only recognizes the old subscription
> format and is selected as the leader.
>
> Guozhang
>
>
>
>
> On Mon, Mar 29, 2021 at 10:30 PM Luke Chen <show...@gmail.com> wrote:
>
> > Hi Sophie & Ismael,
> > Thank you for your feedback.
> > No problem, let's pause this KIP and wait for this improvement:
> > KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>.
> >
> > Stay tuned :)
> >
> > Thank you.
> > Luke
> >
> > On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma <ism...@juma.me.uk> wrote:
> >
> > > Hi Sophie,
> > >
> > > I didn't analyze the KIP in detail, but the two suggestions you mentioned
> > > sound like great improvements.
> > >
> > > A bit more context: breaking changes for a widely used product like Kafka
> > > are costly, which is why we try as hard as we can to avoid them. When it
> > > comes to the brokers, they are often managed by a central group (or they're
> > > in the Cloud), so they're a bit easier to manage. Even so, it's still
> > > possible to upgrade from 0.8.x directly to 2.7 since all protocol versions
> > > are still supported. When it comes to the basic clients (producer,
> > > consumer, admin client), they're often embedded in applications so we have
> > > to be even more conservative.
> > >
> > > Ismael
> > >
> > > On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
> > > <sop...@confluent.io.invalid> wrote:
> > >
> > > > Ismael,
> > > >
> > > > It seems like given 3.0 is a breaking release, we have to rely on users
> > > > being aware of this and responsible enough to read the upgrade guide.
> > > > Otherwise we could never ever make any breaking changes beyond just
> > > > removing deprecated APIs or other compilation-breaking errors that would
> > > > be immediately visible, no?
> > > >
> > > > That said, obviously it's better to have a circuit-breaker that will fail
> > > > fast in case of a user misconfiguration rather than silently corrupting
> > > > the consumer group state -- e.g. by letting two consumers overlap in their
> > > > ownership of the same partition(s). We could definitely implement this,
> > > > and now that I think about it this might solve a related problem in
> > > > KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>. We just
> > > > add a new field to the Assignment in which the group leader indicates
> > > > whether it's on a recent enough version to understand cooperative
> > > > rebalancing. If an upgraded member joins the group, it'll only be allowed
> > > > to start following the new rebalancing protocol after receiving the
> > > > go-ahead from the group leader.
> > > >
> > > > If we do go ahead and add this new field in the Assignment then I'm pretty
> > > > confident we can reduce the number of required rolling bounces to just one
> > > > with KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>. In
> > > > that case we should be in much better shape to feel good about changing
> > > > the default to the CooperativeStickyAssignor. How does that sound?
> > > >
> > > > To be clear, I'm not proposing we do this as part of KIP-726. Here's my
> > > > take:
> > > >
> > > > Let's pause this KIP while I work on making these two improvements in
> > > > KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>. Once I
> > > > can confirm the short-circuit and single rolling bounce will be available
> > > > for 3.0, I'll report back on this thread. Then we can move forward with
> > > > this KIP again.
> > > >
> > > > Thoughts?
> > > > Sophie
> > > >
> > > > On Mon, Mar 29, 2021 at 12:01 AM Luke Chen <show...@gmail.com> wrote:
> > > >
> > > > > Hi Ismael,
> > > > > Thanks for your good questions. Answers below:
> > > > > *1. Are we saying that every consumer upgraded would have to follow the
> > > > > complex path described in the KIP?*
> > > > > --> We suggest that every consumer follow these 2 steps of rolling
> > > > > upgrade. After KAFKA-12477
> > > > > <https://issues.apache.org/jira/browse/KAFKA-12477> is completed, this
> > > > > can be reduced to 1 rolling upgrade.
> > > > >
> > > > > *2. what happens if they don't read the instructions and upgrade as they
> > > > > have in the past?*
> > > > > --> The reason we want 2 steps of rolling upgrade is that we want to avoid
> > > > > the situation where the leader is on old byte-code and only recognizes
> > > > > "eager", but due to compatibility is still able to deserialize the new
> > > > > protocol data from newer versioned members, and hence just goes ahead and
> > > > > does the assignment while the new versioned members did not revoke their
> > > > > partitions before joining the group.
> > > > >
> > > > > But I'd say the new default assignor "CooperativeStickyAssignor" was
> > > > > already introduced in V2.4.0, which should be long enough for users to
> > > > > upgrade to the new byte-code that recognizes the "cooperative" protocol.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma <ism...@juma.me.uk> wrote:
> > > > >
> > > > > > Thanks for the KIP. Are we saying that every consumer upgraded would have
> > > > > > to follow the complex path described in the KIP? Also, what happens if
> > > > > > they don't read the instructions and upgrade as they have in the past?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Fri, Mar 26, 2021, 1:53 AM Luke Chen <show...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > > <Update the subject>
> > > > > > >
> > > > > > > I'd like to discuss the following proposal to make the
> > > > > > > CooperativeStickyAssignor the default assignor.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> > > > > > >
> > > > > > > Any comments are welcome.
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> -- Guozhang
>
