Gaurav Narula created KAFKA-15823:
-------------------------------------
Summary: NodeToControllerChannelManager: authentication error
prevents controller update
Key: KAFKA-15823
URL: https://issues.apache.org/jira/browse/KAFKA-15823
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 3.5.1, 3.6.0
Reporter: Gaurav Narula
NodeToControllerChannelManager caches the activeController address in an
AtomicReference which is updated when:
# activeController [has not been
set|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L422]
# networkClient [disconnects from the
controller|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L395C7-L395C7]
# A node replies with
`[Errors.NOT_CONTROLLER|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L408]`,
and
# A controller changes from [ZK mode to KRaft
mode|https://github.com/apache/kafka/blob/832627fc78484fdc7c8d6da8a2d20e7691dbf882/core/src/main/scala/kafka/server/NodeToControllerChannelManager.scala#L325]
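The four update conditions above can be sketched with a simplified model. This is not the actual NodeToControllerChannelManager code: `Node`, `Event`, `ControllerCache`, and the `discover` callback are hypothetical names introduced for illustration, and the model assumes each of the last three conditions simply clears the cached address so that the next lookup rediscovers the controller.

```scala
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, simplified model; not the real Kafka classes.
final case class Node(id: Int, host: String, port: Int)

sealed trait Event
case object Disconnected extends Event        // networkClient lost its connection
case object NotControllerReply extends Event  // a node replied with Errors.NOT_CONTROLLER
case object ModeChanged extends Event         // controller moved from ZK mode to KRaft mode
case object OtherFailure extends Event        // anything else, e.g. an authentication error

final class ControllerCache(discover: () => Option[Node]) {
  // Cached controller address; null means "not set".
  private val activeController = new AtomicReference[Node](null)

  def current: Option[Node] = Option(activeController.get)

  // Condition 1: nothing is cached yet, so ask the discovery callback.
  def resolve(): Option[Node] = {
    if (activeController.get == null) discover().foreach(n => activeController.set(n))
    current
  }

  // Conditions 2 to 4 clear the cache; any other failure leaves it untouched.
  def onEvent(event: Event): Unit = event match {
    case Disconnected | NotControllerReply | ModeChanged => activeController.set(null)
    case OtherFailure => ()
  }
}
```

In this model, an `OtherFailure` event never changes the cached address, so every later `resolve()` keeps returning the same node.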
When running multiple Kafka clusters in a dynamic environment, a controller's
IP address may be reassigned to another cluster's broker when the controller is
bounced. In this scenario, requests from the node to the cached controller
address may fail with an AuthenticationException and are then retried
indefinitely. This causes the node to get stuck, because the cached address is
never cleared and the new controller's information is never set.
A potential fix would be to disconnect the network client and invoke
`updateControllerAddress(null)` as we do in the `Errors.NOT_CONTROLLER` case.
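A minimal sketch of how the proposed fix would change the retry behaviour, under stated assumptions: `ProposedFixSketch`, `Node`, `attemptsUntilSuccess`, and the `resetOnAuthFailure` flag are hypothetical and exist only for this illustration. Only `updateControllerAddress(null)` mirrors the call named above, and the network client disconnect is represented by a comment rather than real networking.

```scala
import java.util.concurrent.atomic.AtomicReference

object ProposedFixSketch {
  // Hypothetical stand-in for a broker address; `cluster` marks which cluster owns it.
  final case class Node(host: String, cluster: String)

  // Retries a request until it succeeds or maxAttempts is reached.
  // Returns the attempt number that succeeded, or None if every attempt failed.
  def attemptsUntilSuccess(resetOnAuthFailure: Boolean, maxAttempts: Int): Option[Int] = {
    val stale = Node("10.0.0.5", cluster = "B") // old controller IP, now another cluster's broker
    val fresh = Node("10.0.0.9", cluster = "A") // the new controller of our own cluster
    val activeController = new AtomicReference[Node](stale)

    def updateControllerAddress(node: Node): Unit = activeController.set(node)

    var attempt = 0
    var succeededAt: Option[Int] = None
    while (succeededAt.isEmpty && attempt < maxAttempts) {
      attempt += 1
      // An unset address triggers rediscovery of the current controller.
      if (activeController.get == null) updateControllerAddress(fresh)
      if (activeController.get.cluster == "A") {
        succeededAt = Some(attempt)
      } else if (resetOnAuthFailure) {
        // Models the AuthenticationException path with the proposed fix:
        // the real code would also disconnect the network client here.
        updateControllerAddress(null)
      }
      // Without the fix, the stale address stays cached and the loop retries it.
    }
    succeededAt
  }
}
```

With `resetOnAuthFailure = false` the sketch never reaches the new controller within the attempt budget, which corresponds to the indefinite retry described in this report. With `resetOnAuthFailure = true` the first failure clears the cache and the next attempt goes to the rediscovered controller.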