Ok I'm still fleshing out all the details of KAFKA-12477 but I think we can simplify some things a bit, and avoid any kind of "fail-fast" which will require user intervention. In fact I think we can avoid requiring the user to make any changes at all for KIP-726, so we don't have to worry about whether they actually read our documentation:
Instead of making ["cooperative-sticky"] the default, we change the default to ["cooperative-sticky", "range"]. Since "range" is the old default, this is equivalent to the first rolling bounce of the safe upgrade path in KIP-429. Of course this also means that under the current protocol selection mechanism we won't actually upgrade to cooperative rebalancing with the default assignor. But that's where KAFKA-12477 will come in.

@Guozhang Wang <[email protected]> I'll get back to you with a concrete proposal and answer your questions, I just want to point out that it's possible to side-step the risk of users shooting themselves in the foot (well, at least in this one specific case, obviously they always find a way)

On Tue, Mar 30, 2021 at 10:37 AM Guozhang Wang <[email protected]> wrote:

> Hi Sophie,
>
> My question is more related to KAFKA-12477, but since your latest replies
> are on this thread I figured we can follow-up on the same venue. Just so I
> understand your latest comments above about the approach:
>
> * I think, we would need to persist this decision so that the group would
> never go back to the eager protocol, this bit would be written to the
> internal topic's assignment message. Is that correct?
> * Maybe you can describe the steps, after the group has decided to move
> forward with cooperative protocols, when:
> 1) a new member joined the group with the old version, and hence only
> recognized eager protocol and executing the eager protocol with its first
> rebalance, what would happen.
> 2) in addition to 1), the new member joined the group with the old version
> and only recognized the old subscription format, and was selected as the
> leader, what would happen.
>
> Guozhang
>
> On Mon, Mar 29, 2021 at 10:30 PM Luke Chen <[email protected]> wrote:
>
> > Hi Sophie & Ismael,
> > Thank you for your feedback.
> > No problem, let's pause this KIP and wait for this improvement:
> > KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>.
> >
> > Stay tuned :)
> >
> > Thank you.
> > Luke
> >
> > On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma <[email protected]> wrote:
> >
> > > Hi Sophie,
> > >
> > > I didn't analyze the KIP in detail, but the two suggestions you
> > > mentioned sound like great improvements.
> > >
> > > A bit more context: breaking changes for a widely used product like
> > > Kafka are costly and hence why we try as hard as we can to avoid them.
> > > When it comes to the brokers, they are often managed by a central group
> > > (or they're in the Cloud), so they're a bit easier to manage. Even so,
> > > it's still possible to upgrade from 0.8.x directly to 2.7 since all
> > > protocol versions are still supported. When it comes to the basic
> > > clients (producer, consumer, admin client), they're often embedded in
> > > applications so we have to be even more conservative.
> > >
> > > Ismael
> > >
> > > On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
> > > <[email protected]> wrote:
> > >
> > > > Ismael,
> > > >
> > > > It seems like given 3.0 is a breaking release, we have to rely on
> > > > users being aware of this and responsible enough to read the upgrade
> > > > guide. Otherwise we could never ever make any breaking changes beyond
> > > > just removing deprecated APIs or other compilation-breaking errors
> > > > that would be immediately visible, no?
> > > >
> > > > That said, obviously it's better to have a circuit-breaker that will
> > > > fail fast in case of a user misconfiguration rather than silently
> > > > corrupting the consumer group state -- eg for two consumers to overlap
> > > > in their ownership of the same partition(s). We could definitely
> > > > implement this, and now that I think about it this might solve a
> > > > related problem in KAFKA-12477
> > > > <https://issues.apache.org/jira/browse/KAFKA-12477>. We just add a new
> > > > field to the Assignment in which the group leader indicates whether
> > > > it's on a recent enough version to understand cooperative rebalancing.
> > > > If an upgraded member joins the group, it'll only be allowed to start
> > > > following the new rebalancing protocol after receiving the go-ahead
> > > > from the group leader.
> > > >
> > > > If we do go ahead and add this new field in the Assignment then I'm
> > > > pretty confident we can reduce the number of required rolling bounces
> > > > to just one with KAFKA-12477
> > > > <https://issues.apache.org/jira/browse/KAFKA-12477>. In that case we
> > > > should be in much better shape to feel good about changing the default
> > > > to the CooperativeStickyAssignor. How does that sound?
> > > >
> > > > To be clear, I'm not proposing we do this as part of KIP-726. Here's
> > > > my take:
> > > >
> > > > Let's pause this KIP while I work on making these two improvements in
> > > > KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>. Once
> > > > I can confirm the short-circuit and single rolling bounce will be
> > > > available for 3.0, I'll report back on this thread. Then we can move
> > > > forward with this KIP again.
> > > >
> > > > Thoughts?
> > > > Sophie
> > > >
> > > > On Mon, Mar 29, 2021 at 12:01 AM Luke Chen <[email protected]> wrote:
> > > >
> > > > > Hi Ismael,
> > > > > Thanks for your good question. Answer them below:
> > > > > *1. Are we saying that every consumer upgraded would have to follow
> > > > > the complex path described in the KIP? *
> > > > > --> We suggest that every consumer did these 2 steps of rolling
> > > > > upgrade. And after KAFKA-12477
> > > > > <https://issues.apache.org/jira/browse/KAFKA-12477> is completed, it
> > > > > can be reduced to 1 rolling upgrade.
> > > > >
> > > > > *2. what happens if they don't read the instructions and upgrade as
> > > > > they have in the past?*
> > > > > --> The reason we want 2 steps of rolling upgrade is that we want to
> > > > > avoid the situation where leader is on old byte-code and only
> > > > > recognize "eager", but due to compatibility would still be able to
> > > > > deserialize the new protocol data from newer versioned members, and
> > > > > hence just go ahead and do the assignment while new versioned
> > > > > members did not revoke their partitions before joining the group.
> > > > >
> > > > > But I'd say, the new default assignor "CooperativeStickyAssignor"
> > > > > was already introduced in V2.4.0, and it should be long enough for
> > > > > user to upgrade to the new byte-code to recognize the "cooperative"
> > > > > protocol.
> > > > >
> > > > > What do you think?
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma <[email protected]> wrote:
> > > > >
> > > > > > Thanks for the KIP. Are we saying that every consumer upgraded
> > > > > > would have to follow the complex path described in the KIP? Also,
> > > > > > what happens if they don't read the instructions and upgrade as
> > > > > > they have in the past?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Fri, Mar 26, 2021, 1:53 AM Luke Chen <[email protected]> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > > <Update the subject>
> > > > > > >
> > > > > > > I'd like to discuss the following proposal to make the
> > > > > > > CooperativeStickyAssignor as the default assignor.
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> > > > > > >
> > > > > > > Any comments are welcomed.
> > > > > > >
> > > > > > > Thank you.
> > > > > > > Luke
>
> --
> -- Guozhang
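
P.S. To illustrate the point at the top of this message about the current protocol selection mechanism, here is a minimal Java sketch. It is not the actual consumer code: the class name ProtocolSelectionSketch, the SUPPORTED table and the select() helper are made up for illustration, and the supported-protocol sets reflect my understanding of what the two assignors advertise. The assumed rule is that a consumer can only use a rebalance protocol that every assignor in its configured list supports.

```java
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ProtocolSelectionSketch {
    // Simplified stand-in for the two rebalance protocols discussed in this thread.
    enum RebalanceProtocol { EAGER, COOPERATIVE }

    // Hypothetical lookup table: which protocols each assignor name supports (assumption).
    static final Map<String, Set<RebalanceProtocol>> SUPPORTED = Map.of(
        "range", EnumSet.of(RebalanceProtocol.EAGER),
        "cooperative-sticky", EnumSet.of(RebalanceProtocol.COOPERATIVE, RebalanceProtocol.EAGER));

    // Keep only the protocols supported by every configured assignor,
    // then prefer COOPERATIVE over EAGER if it survived.
    static RebalanceProtocol select(List<String> assignors) {
        Set<RebalanceProtocol> common = EnumSet.allOf(RebalanceProtocol.class);
        for (String name : assignors) {
            Set<RebalanceProtocol> supported = SUPPORTED.get(name);
            if (supported == null) {
                throw new IllegalArgumentException("unknown assignor: " + name);
            }
            common.retainAll(supported);
        }
        if (common.isEmpty()) {
            throw new IllegalStateException("no protocol supported by all assignors");
        }
        return common.contains(RebalanceProtocol.COOPERATIVE)
            ? RebalanceProtocol.COOPERATIVE
            : RebalanceProtocol.EAGER;
    }

    public static void main(String[] args) {
        // Proposed new default: stays on EAGER because "range" is still in the list.
        System.out.println(select(List.of("cooperative-sticky", "range"))); // EAGER
        // After "range" is dropped, COOPERATIVE becomes available.
        System.out.println(select(List.of("cooperative-sticky")));          // COOPERATIVE
    }
}
```

Under this simplified model, the ["cooperative-sticky", "range"] default is safe but does not by itself move the group to cooperative rebalancing, which is the gap KAFKA-12477 is meant to close.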
