Luke Chen created KAFKA-15702:
---------------------------------
Summary: apiVersion request doesn't send when DNS is not ready at
first
Key: KAFKA-15702
URL: https://issues.apache.org/jira/browse/KAFKA-15702
Project: Kafka
Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Luke Chen
When migrating from ZK to KRaft, we [rely on
apiVersions|https://github.com/apache/kafka/blob/3055cd7c180cac15016169c52383ddc204ca5f16/metadata/src/main/java/org/apache/kafka/controller/QuorumFeatures.java#L140]
to check whether all controllers have enabled the ZkMigration flag. But if the
DNS is not yet updated with the latest info (which happens frequently in k8s
environments), the active controller won't send out the apiVersion request.
The impact is that the ZK to KRaft migration will be blocked, since the active
controller will consider the follower not ready.
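To illustrate why a missing apiVersion response blocks the migration, here is a minimal sketch of such a readiness check. It is a hypothetical, simplified model and not the actual QuorumFeatures code: the class name, the method name, and the per-node flag map are assumptions made for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified model; NOT the actual QuorumFeatures implementation.
final class MigrationReadinessSketch {
    /**
     * Returns an empty Optional when every controller is known to be ready,
     * otherwise a reason naming the first controller that blocks the migration.
     *
     * @param controllerIds      all controller node ids in the quorum
     * @param zkMigrationEnabled per-node flag learned from an apiVersion response;
     *                           a node without an entry has not answered yet
     */
    static Optional<String> reasonNotReady(List<Integer> controllerIds,
                                           Map<Integer, Boolean> zkMigrationEnabled) {
        for (int id : controllerIds) {
            Boolean enabled = zkMigrationEnabled.get(id);
            if (enabled == null) {
                // No apiVersion response was ever recorded for this node.
                return Optional.of("No apiVersion response known for controller " + id);
            }
            if (!enabled) {
                return Optional.of("Controller " + id + " has not enabled ZkMigration");
            }
        }
        return Optional.empty();
    }
}
```

In this model, a controller that never receives an apiVersion request never gets an entry in the map, so the check keeps returning a reason for that node even though the node itself may be healthy.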
*The flow of the issue is like this:*
1. Start up 3 controllers, e.g. c1, c2, c3.
2. The DNS does not yet have the host name entry for c3.
3. c1 becomes the leader and sends an apiVersion request to c2.
4. c1 tries to connect to c3 and gets an unknownHost error.
5. The DNS is updated with the c3 entry.
6. c1 successfully connects to c3, but no apiVersion request is sent.
7. The KRaftMigrationDriver keeps waiting for c3 to be ready for ZK migration.
I had a look, and it looks like the channel cannot successfully finishConnect
[here|https://github.com/apache/kafka/blob/3.6/clients/src/main/java/org/apache/kafka/common/network/Selector.java#L527],
so the channel is never considered connected and no apiVersion request is
initiated.
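The gating described above can be sketched as follows. This is a hypothetical model rather than Kafka's Selector or NetworkClient code: the class, its methods, and the use of a BooleanSupplier as a stand-in for a channel's finishConnect() result are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical model of the gating; NOT Kafka's Selector or NetworkClient.
final class ConnectGatingSketch {
    // Node id -> stand-in for that channel's finishConnect() result on each poll.
    private final Map<String, BooleanSupplier> connecting = new LinkedHashMap<>();
    // Nodes that were sent an apiVersion request, in send order.
    final List<String> apiVersionRequestsSent = new ArrayList<>();

    void beginConnect(String nodeId, BooleanSupplier finishConnect) {
        connecting.put(nodeId, finishConnect);
    }

    // One poll: only channels whose finishConnect() returns true are promoted
    // to connected, and only connected channels get an apiVersion request.
    void poll() {
        List<String> connected = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> e : connecting.entrySet()) {
            if (e.getValue().getAsBoolean()) {
                connected.add(e.getKey());
            }
        }
        for (String nodeId : connected) {
            connecting.remove(nodeId);
            apiVersionRequestsSent.add(nodeId);
        }
    }
}
```

With c2 registered as `() -> true` and c3 as `() -> false`, repeated calls to poll() only ever record a request for c2, which mirrors steps 3 and 6 of the flow: c3 stays in the connecting state no matter how often it is polled.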