[ 
https://issues.apache.org/jira/browse/KAFKA-12477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17306465#comment-17306465
 ] 

Guozhang Wang commented on KAFKA-12477:
---------------------------------------

Thanks! After reviewing the PR, I think I understand now: we are not auto-removing 
the assignor from the protocol configs (that's what I was a bit worried about), but 
just changing the internally selected protocol. That's reasonable, and the extra 
safety guard makes sense. Let's continue the discussion on the PR then.

> Smart rebalancing with dynamic protocol selection
> -------------------------------------------------
>
>                 Key: KAFKA-12477
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12477
>             Project: Kafka
>          Issue Type: Improvement
>          Components: consumer
>            Reporter: A. Sophie Blee-Goldman
>            Assignee: A. Sophie Blee-Goldman
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Users who want to upgrade their applications and enable the COOPERATIVE 
> rebalancing protocol in their consumer apps are required to follow a double 
> rolling bounce upgrade path. The reason for this is laid out in the [Consumer 
> Upgrades|https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP429:KafkaConsumerIncrementalRebalanceProtocol-Consumer]
>  section of KIP-429. Basically, the ConsumerCoordinator picks a rebalancing 
> protocol in its constructor based on the list of supported partition 
> assignors. The protocol is selected as the highest protocol that is commonly 
> supported by all assignors in the list, and never changes after that.
> This is a bit unfortunate because it may end up using an older protocol even 
> after every member in the group has been updated to support the newer 
> protocol. After the first rolling bounce of the upgrade, all members will 
> have two assignors: "cooperative-sticky" and "range" (or 
> sticky/round-robin/etc). At this point the EAGER protocol will still be 
> selected due to the presence of the "range" assignor, but it's the 
> "cooperative-sticky" assignor that will ultimately be selected for use in 
> rebalances if that assignor is preferred (i.e., positioned first in the list). 
> The only reason for the second rolling bounce is to strip off the "range" 
> assignor and allow the upgraded members to switch over to COOPERATIVE. We 
> can't allow them to use cooperative rebalancing until everyone has been 
> upgraded, but once they have it's safe to do so.
> And there is already a way for the client to detect that everyone is on the 
> new byte code: if the CooperativeStickyAssignor is selected by the group 
> coordinator, then that means it is supported by all consumers in the group 
> and therefore everyone must be upgraded. 
> We may be able to save the second rolling bounce by dynamically updating the 
> rebalancing protocol inside the ConsumerCoordinator as "the highest protocol 
> supported by the assignor chosen by the group coordinator". This means we'll 
> still be using EAGER at the first rebalance, since we of course need to wait 
> for this initial rebalance to get the response from the group coordinator. 
> But we should take the hint from the chosen assignor rather than dropping 
> this information on the floor and sticking with the original protocol.
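
To make the selection rule in the description concrete, here is a minimal sketch. It is not the actual ConsumerCoordinator code. The class, the Assignor type, and the method names initialProtocol and protocolAfterJoin are hypothetical, and the sketch assumes that "cooperative-sticky" supports both COOPERATIVE and EAGER while "range" supports only EAGER.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ProtocolSelectionSketch {
    // Declared in increasing order, so the enum's natural order means "higher is newer".
    enum RebalanceProtocol { EAGER, COOPERATIVE }

    // Hypothetical stand-in for a partition assignor: a name plus the protocols it supports.
    static final class Assignor {
        final String name;
        final List<RebalanceProtocol> supportedProtocols;

        Assignor(String name, RebalanceProtocol... supported) {
            this.name = name;
            this.supportedProtocols = Arrays.asList(supported);
        }
    }

    // Current behavior as described: the highest protocol supported by ALL configured assignors.
    static RebalanceProtocol initialProtocol(List<Assignor> configured) {
        if (configured.isEmpty()) {
            throw new IllegalArgumentException("no assignors configured");
        }
        RebalanceProtocol selected = null;
        for (RebalanceProtocol candidate : RebalanceProtocol.values()) {
            boolean supportedByAll = true;
            for (Assignor assignor : configured) {
                supportedByAll &= assignor.supportedProtocols.contains(candidate);
            }
            if (supportedByAll) {
                selected = candidate; // later candidates are higher, so keep overwriting
            }
        }
        if (selected == null) {
            throw new IllegalArgumentException("no protocol is supported by all assignors");
        }
        return selected;
    }

    // Proposed behavior: the highest protocol supported by the assignor the group coordinator chose.
    static RebalanceProtocol protocolAfterJoin(List<Assignor> configured, String chosenAssignorName) {
        for (Assignor assignor : configured) {
            if (assignor.name.equals(chosenAssignorName)) {
                return Collections.max(assignor.supportedProtocols);
            }
        }
        throw new IllegalArgumentException("chosen assignor is not configured: " + chosenAssignorName);
    }

    public static void main(String[] args) {
        // Member configuration after the first rolling bounce (assumed protocol support).
        List<Assignor> configured = Arrays.asList(
            new Assignor("cooperative-sticky", RebalanceProtocol.COOPERATIVE, RebalanceProtocol.EAGER),
            new Assignor("range", RebalanceProtocol.EAGER));

        System.out.println(initialProtocol(configured));                         // EAGER
        System.out.println(protocolAfterJoin(configured, "cooperative-sticky")); // COOPERATIVE
        System.out.println(protocolAfterJoin(configured, "range"));              // EAGER
    }
}
```

Under these assumptions, the member starts on EAGER and moves to COOPERATIVE only once the group coordinator reports "cooperative-sticky" as the chosen assignor. That choice is the signal that every member in the group supports it.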



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
