[ https://issues.apache.org/jira/browse/KAFKA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stanislav Kozlovski updated KAFKA-15823:
----------------------------------------
    Fix Version/s: 3.8.0
                       (was: 3.7.0)

> NodeToControllerChannelManager: authentication error prevents controller update
> -------------------------------------------------------------------------------
>
>                 Key: KAFKA-15823
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15823
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 3.6.0, 3.5.1
>            Reporter: Gaurav Narula
>            Priority: Major
>             Fix For: 3.8.0
>
>
> NodeToControllerChannelManager caches the activeController address in an
> AtomicReference, which is updated when:
> # activeController [has not been
> set|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L422]
> # networkClient [disconnects from the
> controller|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L395C7-L395C7]
> # a node replies with
> `[Errors.NOT_CONTROLLER|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L408]`,
> and
> # a controller changes from [ZK mode to KRaft
> mode|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L325]
>
> When running multiple Kafka clusters in a dynamic environment, a controller's
> IP may be reassigned to another cluster's broker when the controller is
> bounced. In this scenario, requests from the node to the controller may fail
> with an AuthenticationException and are then retried indefinitely. None of
> the four conditions above is met, so the new controller's address is never
> set and the node gets stuck.
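
To make the failure mode concrete, here is a minimal Java sketch of the caching behaviour described above. It is a toy model under stated assumptions, not Kafka's code: `ControllerCacheSketch`, `Outcome`, `controllerMovedTo` and the addresses are all made-up names, and the ZK-to-KRaft trigger is left out for brevity. It only illustrates that a disconnect or a NOT_CONTROLLER reply clears the cached address, while an authentication failure leaves the stale address in place.

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model only: the names below are illustrative, not Kafka's actual classes.
class ControllerCacheSketch {
    // Hypothetical outcomes of one request sent to the cached controller.
    enum Outcome { OK, DISCONNECTED, NOT_CONTROLLER, AUTHENTICATION_FAILED }

    // Stand-in for the cached activeController address.
    private final AtomicReference<String> activeController = new AtomicReference<>();

    // What a fresh controller lookup would return right now (hypothetical).
    private String discoverable;

    ControllerCacheSketch(String initialController) {
        this.discoverable = initialController;
    }

    // Simulates the controller being bounced and coming back at a new address.
    void controllerMovedTo(String address) {
        discoverable = address;
    }

    // Trigger 1: the cache is populated only when nothing is cached yet.
    String controllerForNextRequest() {
        activeController.compareAndSet(null, discoverable);
        return activeController.get();
    }

    // Triggers 2 and 3 clear the cache; an authentication failure does not.
    void onOutcome(Outcome outcome) {
        switch (outcome) {
            case DISCONNECTED:
            case NOT_CONTROLLER:
                activeController.set(null);
                break;
            case AUTHENTICATION_FAILED:
            case OK:
                // Reported behaviour: the request is retried, the cache is untouched.
                break;
        }
    }

    public static void main(String[] args) {
        ControllerCacheSketch cache = new ControllerCacheSketch("10.0.0.5:9093");
        cache.controllerForNextRequest();          // caches 10.0.0.5:9093

        // Controller is bounced; its old IP now belongs to another cluster's broker.
        cache.controllerMovedTo("10.0.0.9:9093");

        cache.onOutcome(Outcome.AUTHENTICATION_FAILED);
        System.out.println(cache.controllerForNextRequest()); // 10.0.0.5:9093 (stale)

        cache.onOutcome(Outcome.NOT_CONTROLLER);
        System.out.println(cache.controllerForNextRequest()); // 10.0.0.9:9093
    }
}
```

In this model, every retry after the authentication failure keeps targeting the stale address, which matches the stuck state reported in the issue.
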
>
> A potential fix would be to disconnect the network client and invoke
> `updateControllerAddress(null)` when an AuthenticationException is received,
> as we already do in the `Errors.NOT_CONTROLLER` case.
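
A rough Java sketch of what that handling could look like in a retry loop follows. Everything here is hypothetical scaffolding: the `Client` interface, the `lookup` supplier, the `resetOnAuthFailure` flag and the addresses are invented for illustration, and only the method name `updateControllerAddress` is taken from the issue. The actual fix in Kafka may well be structured differently.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Illustrative sketch only; none of these types are Kafka's real classes.
class AuthFailureFixSketch {
    // Hypothetical stand-in for the network client.
    interface Client {
        boolean send(String address);     // false models an authentication failure
        void disconnect(String address);
    }

    private final AtomicReference<String> activeController = new AtomicReference<>();
    private final Supplier<String> lookup;     // hypothetical controller discovery
    private final Client client;
    private final boolean resetOnAuthFailure;  // true = handling proposed in the issue

    AuthFailureFixSketch(Supplier<String> lookup, Client client, boolean resetOnAuthFailure) {
        this.lookup = lookup;
        this.client = client;
        this.resetOnAuthFailure = resetOnAuthFailure;
    }

    // Same name as in the issue; passing null clears the cached address.
    void updateControllerAddress(String address) {
        activeController.set(address);
    }

    // Returns the attempt number that succeeded, or -1 once maxAttempts is used up.
    int sendWithRetry(int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            String target = activeController.get();
            if (target == null) {
                // Nothing cached: discover the current controller and cache it.
                target = lookup.get();
                updateControllerAddress(target);
            }
            if (client.send(target)) {
                return attempt;
            }
            if (resetOnAuthFailure) {
                // Proposed handling: drop the connection and forget the address,
                // so the next attempt performs a fresh lookup.
                client.disconnect(target);
                updateControllerAddress(null);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Only the new controller address accepts our credentials.
        Client client = new Client() {
            public boolean send(String address) { return address.equals("fresh:9093"); }
            public void disconnect(String address) { }
        };

        AuthFailureFixSketch stuck = new AuthFailureFixSketch(() -> "fresh:9093", client, false);
        stuck.updateControllerAddress("stale:9093");
        System.out.println(stuck.sendWithRetry(5));   // -1: every retry hits the stale address

        AuthFailureFixSketch fixed = new AuthFailureFixSketch(() -> "fresh:9093", client, true);
        fixed.updateControllerAddress("stale:9093");
        System.out.println(fixed.sendWithRetry(5));   // 2: the retry after the reset succeeds
    }
}
```

The sketch bounds the retries with `maxAttempts` only so that the example terminates; the issue describes the real retries as indefinite.
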