Jason Gustafson created KAFKA-15828:
---------------------------------------
Summary: Protect clients from broker hostname reuse
Key: KAFKA-15828
URL: https://issues.apache.org/jira/browse/KAFKA-15828
Project: Kafka
Issue Type: Bug
Reporter: Jason Gustafson
In some environments, such as k8s, brokers may be assigned to nodes dynamically
from an available pool. During a rolling restart, the client can see the same
node advertised for different broker IDs within a short period of time. For
example, kafka-1 might initially be assigned to node1.
Before the client is able to establish a connection, it could be that kafka-3
is now on node1 instead. Currently there is no protection in the client or in
the protocol for this scenario. If the connection succeeds, the client will
assume it has a good connection to kafka-1. Until something disrupts the
connection, it will continue under this assumption even if the hostname for
kafka-1 changes.
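
To make the race concrete, here is a minimal Python model of the scenario described above. It is only an illustrative sketch, not Kafka client code: `Connection`, `connect`, and the two dictionaries are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Connection:
    intended_broker_id: int  # the broker the client believes it reached
    actual_broker_id: int    # the broker that actually answered on that host


def connect(intended_broker_id: int, host: str,
            host_to_broker: Dict[str, int]) -> Connection:
    """Hypothetical connect: whichever broker currently occupies `host` answers."""
    return Connection(intended_broker_id, host_to_broker[host])


# Metadata as last seen by the client: kafka-1 is advertised on node1.
client_metadata = {1: "node1"}

# By the time the client connects, kafka-3 has been scheduled onto node1.
host_to_broker = {"node1": 3}

conn = connect(1, client_metadata[1], host_to_broker)

# The connection succeeds, but nothing tells the client it reached kafka-3.
mismatched = conn.intended_broker_id != conn.actual_broker_id
print(f"intended={conn.intended_broker_id} "
      f"actual={conn.actual_broker_id} mismatched={mismatched}")
```

In this model the client has no way to observe `actual_broker_id`, which is the gap the issue describes.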
We have observed this scenario in practice. The client connected to the wrong
broker because of stale hostname information. It was unable to produce data
because of persistent NOT_LEADER errors. In the end, the only way to recover
was to restart the client to force a reconnection.
We have discussed a couple of potential solutions to this problem:
# Make the client smarter about managing the connection/hostname mapping. When it
detects that a hostname has changed, it should force a disconnect to ensure it
connects to the right node.
# Modify the protocol to verify that the client has connected to the
intended broker. For example, we can add a field to ApiVersions to indicate the
intended broker ID. The broker receiving the request can return an error if its
ID does not match that in the request.
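
The two options above could be sketched roughly as follows. This is a hedged illustration in Python, not the Kafka implementation: `ConnectionTracker`, `handle_api_versions`, and the `WRONG_BROKER` error name are all hypothetical, and treating a missing broker ID as "accept" for older clients is an assumption made only for this example.

```python
from typing import Dict, Optional, Set


class ConnectionTracker:
    """Option 1 (client side): remember which host each connection was opened to."""

    def __init__(self) -> None:
        self.connected_host: Dict[int, str] = {}

    def on_connect(self, broker_id: int, host: str) -> None:
        self.connected_host[broker_id] = host

    def stale_connections(self, metadata: Dict[int, str]) -> Set[int]:
        """Return broker IDs whose connection no longer matches the metadata.

        A broker that moved to another host, or that disappeared from the
        metadata entirely, is reported so the caller can force a disconnect.
        """
        return {
            broker_id
            for broker_id, host in self.connected_host.items()
            if metadata.get(broker_id) != host
        }


# Option 2 (protocol side): hypothetical error codes for the handshake check.
NONE = "NONE"
WRONG_BROKER = "WRONG_BROKER"  # assumed name, not an existing Kafka error


def handle_api_versions(local_broker_id: int,
                        intended_broker_id: Optional[int]) -> str:
    """Reject the handshake if the client was aiming for a different broker."""
    if intended_broker_id is not None and intended_broker_id != local_broker_id:
        return WRONG_BROKER
    return NONE  # field absent (assumed: older client) or IDs match


# Option 1: kafka-1 moves from node1 to node2, and kafka-3 takes over node1.
tracker = ConnectionTracker()
tracker.on_connect(1, "node1")
stale = tracker.stale_connections({1: "node2", 3: "node1"})

# Option 2: the client meant to reach kafka-1 but kafka-3 answered.
result = handle_api_versions(local_broker_id=3, intended_broker_id=1)
print(stale, result)
```

Option 1 only helps once the client receives updated metadata, whereas option 2 catches the mismatch at connection time, so the two are not mutually exclusive.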
Are there alternatives?