[ https://issues.apache.org/jira/browse/KAFKA-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306465#comment-17306465 ]
Guozhang Wang commented on KAFKA-12477:
---------------------------------------

Thanks! After reviewing the PR I think I get it: we are not auto-removing the assignor from the protocol configs (that's what I was a bit worried about), but just changing the internally selected protocol. That's reasonable and the extra safety guard makes sense. Let's continue the discussion on the PR then.

> Smart rebalancing with dynamic protocol selection
> -------------------------------------------------
>
>                 Key: KAFKA-12477
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12477
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: A. Sophie Blee-Goldman
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Users who want to upgrade their applications and enable the COOPERATIVE
> rebalancing protocol in their consumer apps are required to follow a double
> rolling bounce upgrade path. The reason for this is laid out in the [Consumer
> Upgrades|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP429:KafkaConsumerIncrementalRebalanceProtocol-Consumer]
> section of KIP-429. Basically, the ConsumerCoordinator picks a rebalancing
> protocol in its constructor based on the list of supported partition
> assignors. The protocol is selected as the highest protocol that is commonly
> supported by all assignors in the list, and never changes after that.
> This is a bit unfortunate because it may end up using an older protocol even
> after every member in the group has been updated to support the newer
> protocol. After the first rolling bounce of the upgrade, all members will
> have two assignors: "cooperative-sticky" and "range" (or
> sticky/round-robin/etc).
> At this point the EAGER protocol will still be
> selected due to the presence of the "range" assignor, but it's the
> "cooperative-sticky" assignor that will ultimately be selected for use in
> rebalances if that assignor is preferred (i.e. positioned first in the list).
> The only reason for the second rolling bounce is to strip off the "range"
> assignor and allow the upgraded members to switch over to COOPERATIVE. We
> can't allow them to use cooperative rebalancing until everyone has been
> upgraded, but once they have it's safe to do so.
> And there is already a way for the client to detect that everyone is on the
> new bytecode: if the CooperativeStickyAssignor is selected by the group
> coordinator, then that means it is supported by all consumers in the group
> and therefore everyone must be upgraded.
> We may be able to save the second rolling bounce by dynamically updating the
> rebalancing protocol inside the ConsumerCoordinator as "the highest protocol
> supported by the assignor chosen by the group coordinator". This means we'll
> still be using EAGER at the first rebalance, since we of course need to wait
> for this initial rebalance to get the response from the group coordinator.
> But we should take the hint from the chosen assignor rather than dropping
> this information on the floor and sticking with the original protocol.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
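
For readers following the issue description, the difference between the two selection rules can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the ConsumerCoordinator code or the change made in the PR: the class `ProtocolSelectionSketch`, the `Assignor` stand-in, and the method names `selectAtStartup` and `selectAfterRebalance` are all hypothetical. The one thing taken from the description is the two rules themselves: "highest protocol commonly supported by all assignors in the list" versus "highest protocol supported by the assignor chosen by the group coordinator".

```java
import java.util.Arrays;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

public class ProtocolSelectionSketch {

    // Declared oldest to newest, so a later constant counts as "higher".
    enum Protocol { EAGER, COOPERATIVE }

    // Hypothetical stand-in for a partition assignor: a name plus the
    // rebalancing protocols it supports (at least one is required).
    static final class Assignor {
        final String name;
        final Set<Protocol> supported;

        Assignor(String name, Protocol... supported) {
            this.name = name;
            this.supported = EnumSet.copyOf(Arrays.asList(supported));
        }
    }

    // Highest protocol in a non-empty set, by declaration order.
    static Protocol highest(Set<Protocol> protocols) {
        Protocol best = null;
        for (Protocol p : protocols) {
            if (best == null || p.compareTo(best) > 0) {
                best = p;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no protocols given");
        }
        return best;
    }

    // Static rule from the description: the highest protocol that every
    // configured assignor supports, fixed once at construction time.
    static Protocol selectAtStartup(List<Assignor> configured) {
        Set<Protocol> common = EnumSet.allOf(Protocol.class);
        for (Assignor a : configured) {
            common.retainAll(a.supported);
        }
        if (common.isEmpty()) {
            throw new IllegalArgumentException("assignors share no protocol");
        }
        return highest(common);
    }

    // Proposed rule from the description: once the group coordinator has
    // chosen an assignor, use the highest protocol that assignor supports.
    static Protocol selectAfterRebalance(List<Assignor> configured, String chosenName) {
        for (Assignor a : configured) {
            if (a.name.equals(chosenName)) {
                return highest(a.supported);
            }
        }
        throw new IllegalArgumentException("unknown assignor: " + chosenName);
    }
}
```

With the assignor list left in place after the first rolling bounce ("cooperative-sticky" supporting both protocols, then "range" supporting only EAGER), `selectAtStartup` yields EAGER, while `selectAfterRebalance` yields COOPERATIVE once "cooperative-sticky" is the chosen assignor and stays on EAGER if "range" is chosen, which is the case where some member has not been upgraded yet.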