Gaurav Narula created KAFKA-15823:
-------------------------------------

             Summary: NodeToControllerChannelManager: authentication error prevents controller update
                 Key: KAFKA-15823
                 URL: https://issues.apache.org/jira/browse/KAFKA-15823
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 3.5.1, 3.6.0
            Reporter: Gaurav Narula

NodeToControllerChannelManager caches the activeController address in an AtomicReference, which is updated when:

# activeController [has not been set|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L422]
# networkClient [disconnects from the controller|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L395C7-L395C7]
# a node replies with `[Errors.NOT_CONTROLLER|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L408]`, and
# a controller changes from [ZK mode to KRaft mode|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L325]

When running multiple Kafka clusters in a dynamic environment, a controller's IP address may be reassigned to a broker in another cluster when the controller is bounced. In this scenario, requests from the node to the controller may fail with an AuthenticationException and are then retried indefinitely. Because an authentication failure is not one of the update conditions listed above, the cached address is never refreshed, and the node gets stuck as the new controller's information is never set.

A potential fix would be to disconnect the network client and invoke `updateControllerAddress(null)`, as is done in the `Errors.NOT_CONTROLLER` case.
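
To illustrate the behaviour and the proposed fix, here is a minimal Java sketch. It is a simplified model, not the Kafka implementation: `ControllerAddressCache`, `Outcome`, `controllerLookup` and `resetOnAuthFailure` are hypothetical names introduced only for this example, and the real class is written in Scala and also has to disconnect the network client.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical, simplified model of the caching described in this issue.
// None of these names are Kafka's actual API.
final class ControllerAddressCache {

    /** Possible outcomes of a node-to-controller request in this sketch. */
    enum Outcome { SUCCESS, NOT_CONTROLLER, DISCONNECTED, AUTHENTICATION_FAILED }

    // Cached controller address; null means "not set, look it up again".
    private final AtomicReference<String> activeController = new AtomicReference<>(null);
    // Assumed source of truth for the current controller (e.g. metadata).
    private final Supplier<Optional<String>> controllerLookup;
    // false models the reported behaviour, true models the proposed fix.
    private final boolean resetOnAuthFailure;

    ControllerAddressCache(Supplier<Optional<String>> controllerLookup,
                           boolean resetOnAuthFailure) {
        this.controllerLookup = controllerLookup;
        this.resetOnAuthFailure = resetOnAuthFailure;
    }

    /** Returns the cached address, performing a lookup only when nothing is cached. */
    Optional<String> controllerAddress() {
        String cached = activeController.get();
        if (cached != null) {
            return Optional.of(cached);
        }
        Optional<String> found = controllerLookup.get();
        found.ifPresent(activeController::set);
        return found;
    }

    /** Updates the cache according to the outcome of the last request. */
    void handle(Outcome outcome) {
        switch (outcome) {
            case NOT_CONTROLLER:
            case DISCONNECTED:
                // Existing conditions: forget the address so it is looked up again.
                activeController.set(null);
                break;
            case AUTHENTICATION_FAILED:
                // Proposed fix: treat this like NOT_CONTROLLER.
                // Without the reset, the stale address is retried indefinitely.
                if (resetOnAuthFailure) {
                    activeController.set(null);
                }
                break;
            case SUCCESS:
                break;
        }
    }
}
```

With `resetOnAuthFailure` set to `false`, the cache keeps returning the old address after an authentication failure even if the lookup would now return a different controller; with `true`, the next call to `controllerAddress()` picks up the new one.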