Luke Chen created KAFKA-16563:
---------------------------------
Summary: migration to KRaft hanging after KeeperException
Key: KAFKA-16563
URL: https://issues.apache.org/jira/browse/KAFKA-16563
Project: Kafka
Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen
Assignee: Luke Chen
While running the ZooKeeper-to-KRaft migration, we encountered an issue where the
migration hangs and `ZkMigrationState` never moves to the `MIGRATION` state.
After investigation, we found that the root cause is that `PollEvent` does not
retry after a retriable `KeeperException`, even though it should (and even though
the log below says "Will retry.").
{code:java}
2024-04-11 21:27:55,393 INFO [KRaftMigrationDriver id=5] Encountered ZooKeeper error during event PollEvent. Will retry. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) [controller-5-migration-driver-event-handler]
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /migration
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
	at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:570)
	at kafka.zk.KafkaZkClient.createInitialMigrationState(KafkaZkClient.scala:1701)
	at kafka.zk.KafkaZkClient.getOrCreateMigrationState(KafkaZkClient.scala:1689)
	at kafka.zk.ZkMigrationClient.$anonfun$getOrCreateMigrationRecoveryState$1(ZkMigrationClient.scala:109)
	at kafka.zk.ZkMigrationClient.getOrCreateMigrationRecoveryState(ZkMigrationClient.scala:69)
	at org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248)
	at org.apache.kafka.metadata.migration.KRaftMigrationDriver.recoverMigrationStateFromZK(KRaftMigrationDriver.java:169)
	at org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$1900(KRaftMigrationDriver.java:62)
	at org.apache.kafka.metadata.migration.KRaftMigrationDriver$PollEvent.run(KRaftMigrationDriver.java:794)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
	at java.base/java.lang.Thread.run(Thread.java:840)
{code}
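To illustrate the expected behavior described above, here is a minimal, hypothetical sketch; it is not Kafka's actual `KRaftMigrationDriver` code. It models a single-threaded event queue in which a poll event that hits a retriable ZooKeeper error schedules another poll instead of giving up. All names (`PollRetrySketch`, `RetriableZkException`, `simulateZkOperation`, and the two-value `MigrationState` enum) are illustrative assumptions, and the simulated ZooKeeper call simply fails a configurable number of times before succeeding.

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, NOT Kafka's KRaftMigrationDriver: a single-threaded
// event queue whose poll event re-enqueues itself after a retriable error.
public class PollRetrySketch {

    /** Hypothetical stand-in for a ZooKeeper error the driver should treat as retriable. */
    static class RetriableZkException extends RuntimeException {
        RetriableZkException(String message) {
            super(message);
        }
    }

    /** Simplified, hypothetical state enum (the real ZkMigrationState has more values). */
    enum MigrationState { PRE_MIGRATION, MIGRATION }

    private final Deque<Runnable> queue = new ArrayDeque<>();
    private MigrationState state = MigrationState.PRE_MIGRATION;
    private int failuresLeft;
    private int attempts;

    PollRetrySketch(int failuresBeforeSuccess) {
        this.failuresLeft = failuresBeforeSuccess;
    }

    /** Simulated ZooKeeper call: throws a retriable error the first N times, then succeeds. */
    private void simulateZkOperation() {
        attempts++;
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new RetriableZkException("NodeExists for /migration");
        }
        state = MigrationState.MIGRATION;
    }

    /** Poll event: on a retriable error, enqueue another poll instead of stopping. */
    private void pollEvent() {
        try {
            simulateZkOperation();
        } catch (RetriableZkException e) {
            System.out.println("Encountered ZooKeeper error during PollEvent. Will retry: " + e.getMessage());
            // Without this re-enqueue the queue drains and the state never advances (the hang).
            queue.addLast(this::pollEvent);
        }
    }

    /** Drains the queue; maxEvents bounds the loop so a persistent error cannot spin forever. */
    MigrationState run(int maxEvents) {
        queue.addLast(this::pollEvent);
        int handled = 0;
        while (!queue.isEmpty() && handled < maxEvents) {
            queue.pollFirst().run();
            handled++;
        }
        return state;
    }

    int attempts() {
        return attempts;
    }

    public static void main(String[] args) {
        PollRetrySketch driver = new PollRetrySketch(2);
        MigrationState result = driver.run(10);
        System.out.println(result + " after " + driver.attempts() + " attempts");
    }
}
{code}

Running `main` prints two "Will retry" lines followed by "MIGRATION after 3 attempts". If the `queue.addLast(this::pollEvent)` line in the catch block is removed, the queue drains after the first failure and the state stays `PRE_MIGRATION`, which mirrors the hang reported here. A real driver would typically wait before retrying; the sketch omits any delay for brevity.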