[ 
https://issues.apache.org/jira/browse/KAFKA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stanislav Kozlovski updated KAFKA-15823:
----------------------------------------
    Fix Version/s: 3.8.0
                       (was: 3.7.0)

> NodeToControllerChannelManager: authentication error prevents controller 
> update
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-15823
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15823
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.6.0, 3.5.1
>            Reporter: Gaurav Narula
>            Priority: Major
>             Fix For: 3.8.0
>
>
> NodeToControllerChannelManager caches the activeController address in an 
> AtomicReference which is updated when:
>  # activeController [has not been 
> set|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L422]
>  # networkClient [disconnects from the 
> controller|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L395C7-L395C7]
>  # a node replies with 
> `[Errors.NOT_CONTROLLER|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L408]`,
>  and
>  # the controller changes from [ZK mode to KRaft 
> mode|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L325]
>  
> When running multiple Kafka clusters in a dynamic environment, there is a 
> chance that a controller's IP may get reassigned to another cluster's broker 
> when the controller is bounced. In this scenario, requests from the node to 
> the controller may fail with an AuthenticationException and are then retried 
> indefinitely. The node gets stuck because the cached address is never 
> cleared, so the new controller's information is never set.
>  
> A potential fix would be to disconnect the network client and invoke 
> `updateControllerAddress(null)` as we do in the `Errors.NOT_CONTROLLER` case.
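
The proposed fix can be illustrated with a minimal Java sketch, assuming a much-simplified model of the cache. The names `ControllerAddressCache`, `Outcome`, `resolve`, `handle`, `cached` and the `resetOnAuthFailure` flag are hypothetical and are not part of Kafka; the network client itself and the ZK-to-KRaft migration case are left out.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Simplified, hypothetical model of the cached controller address; not Kafka's actual class. */
class ControllerAddressCache {

    /** Hypothetical outcomes of one request sent to the cached controller. */
    public enum Outcome { SUCCESS, DISCONNECTED, NOT_CONTROLLER, AUTHENTICATION_FAILED }

    // Cached address of the active controller; null means "not set".
    private final AtomicReference<String> activeController = new AtomicReference<>(null);
    // Stands in for whatever the broker uses to discover the current controller.
    private final Supplier<String> controllerLookup;
    // true = apply the fix proposed in the ticket, false = behaviour described in the report.
    private final boolean resetOnAuthFailure;

    ControllerAddressCache(Supplier<String> controllerLookup, boolean resetOnAuthFailure) {
        this.controllerLookup = controllerLookup;
        this.resetOnAuthFailure = resetOnAuthFailure;
    }

    /** Returns the address to send to, consulting the lookup only when nothing is cached. */
    public String resolve() {
        String cached = activeController.get();
        if (cached == null) {
            cached = controllerLookup.get();
            activeController.set(cached);
        }
        return cached;
    }

    /** Applies the cache-invalidation rules to the outcome of a request. */
    public void handle(Outcome outcome) {
        switch (outcome) {
            case DISCONNECTED:
            case NOT_CONTROLLER:
                // Existing behaviour: forget the address so the next request rediscovers it.
                activeController.set(null);
                break;
            case AUTHENTICATION_FAILED:
                // Proposed fix: treat an authentication error like NOT_CONTROLLER.
                // A real implementation would also disconnect the network client here.
                if (resetOnAuthFailure) {
                    activeController.set(null);
                }
                break;
            default:
                break;
        }
    }

    /** Exposes the cached value for inspection. */
    public Optional<String> cached() {
        return Optional.ofNullable(activeController.get());
    }
}
```

With `resetOnAuthFailure` set to `false`, an `AUTHENTICATION_FAILED` outcome leaves the stale address cached, so `resolve()` keeps returning it on every retry. With `true`, the cache is cleared and the next `resolve()` consults the lookup again.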



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
