[ https://issues.apache.org/jira/browse/KAFKA-15330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823067#comment-17823067 ]

Roland Sommer commented on KAFKA-15330:
---------------------------------------

After the release of kafka 3.7 I reran the migration tests in our staging environment and I am still seeing the same problems. The quorum is assembled:
{code:java}
[KRaftMigrationDriver id=87] Controller Quorum is ready for Zk to KRaft migration. Now waiting for ZK brokers. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
[KRaftMigrationDriver id=87] 87 transitioning from WAIT_FOR_CONTROLLER_QUORUM to WAIT_FOR_BROKERS state (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
{code}
but still on the controller side:
{code:java}
[KRaftMigrationDriver id=87] No brokers are known to KRaft, waiting for brokers to register. (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
{code}
and on the broker side:
{code:java}
[BrokerLifecycleManager id=1 isZkBroker=true] Unable to register the broker because the RPC got timed out before it could be sent. (kafka.server.BrokerLifecycleManager)
{code}

> Migration from ZK to KRaft works with 3.4 but fails from 3.5 upwards
> --------------------------------------------------------------------
>
> Key: KAFKA-15330
> URL: https://issues.apache.org/jira/browse/KAFKA-15330
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 3.5.0, 3.5.1
> Environment: Debian Bookworm/12.1
> kafka 3.4 and 3.5 / scala 2.13
> OpenJDK Runtime Environment (build 17.0.8+7-Debian-1deb12u1)
> Reporter: Roland Sommer
> Priority: Major
> Attachments: broker.properties, controller.properties
>
>
> We recently did some migration testing from our old ZK-based kafka clusters
> to KRaft while still being on kafka 3.4. The migration tests succeeded at
> first try. In the meantime we updated to kafka 3.5/3.5.1 and now we wanted to
> continue our migration work, which ran into unexpected problems.
> On the controller we get messages like:
> {code:java}
> Aug 10 06:49:33 kafkactl01 kafka-server-start.sh[48572]: [2023-08-10 06:49:33,072] INFO [KRaftMigrationDriver id=495] Still waiting for all controller nodes ready to begin the migration. due to: Missing apiVersion from nodes: [514, 760] (org.apache.kafka.metadata.migration.KRaftMigrationDriver)
> {code}
> On the broker side, we see:
> {code:java}
> 06:52:56,109] INFO [BrokerLifecycleManager id=6 isZkBroker=true] Unable to register the broker because the RPC got timed out before it could be sent. (kafka.server.BrokerLifecycleManager)
> {code}
> If we reinstall the same development cluster with kafka 3.4, using the exact
> same steps provided by your migration documentation (only difference is using
> {{inter.broker.protocol.version=3.4}} instead of
> {{inter.broker.protocol.version=3.5}}), everything works as expected.
> Updating to kafka 3.5/3.5.1 yields the same problems.
> Testing is done on a three-node kafka cluster with a three-node zookeeper
> ensemble and a three-node controller setup.
> Besides our default configuration containing the active zookeeper hosts etc.,
> this is what was added on the brokers:
> {code:java}
> # Migration
> advertised.listeners=PLAINTEXT://kafka03:9092
> listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
> zookeeper.metadata.migration.enable=true
> controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
> controller.listener.names=CONTROLLER
> {code}
> The main controller config looks like this:
> {code:java}
> process.roles=controller
> node.id=495
> controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
> listeners=CONTROLLER://:9093
> inter.broker.listener.name=PLAINTEXT
> controller.listener.names=CONTROLLER
> listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
> zookeeper.metadata.migration.enable=true
> {code}
> Both configs contain identical {{zookeeper.connect}} settings. Everything
> is set up automatically, so it should be identical on every run, and we can
> reliably reproduce migration success on kafka 3.4 and migration failure using
> the same setup with kafka 3.5.
> There are other issues mentioning problems with ApiVersions like KAFKA-15230
> - not quite sure if this is a duplicate of the underlying problem there.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
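
Since the report depends on the broker and controller settings agreeing, the two fragments quoted above can be sanity-checked with a short script. This is only a sketch and not part of the original report: it assumes the fragments are plain `key=value` properties text, and the helper names `parse_properties` and `parse_voters` are made up for illustration. It does not diagnose the registration timeout; it can only rule out mismatched shared settings.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def parse_voters(value):
    """Turn 'id@host:port,...' into a {node_id: (host, port)} mapping."""
    voters = {}
    for entry in value.split(","):
        node_id, _, endpoint = entry.strip().partition("@")
        host, _, port = endpoint.rpartition(":")
        voters[int(node_id)] = (host, int(port))
    return voters


# Fragments copied from the report above.
BROKER = """\
# Migration
advertised.listeners=PLAINTEXT://kafka03:9092
listener.security.protocol.map=PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT
zookeeper.metadata.migration.enable=true
controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
controller.listener.names=CONTROLLER
"""

CONTROLLER = """\
process.roles=controller
node.id=495
controller.quorum.voters=495@kafkactl01:9093,760@kafkactl02:9093,514@kafkactl03:9093
listeners=CONTROLLER://:9093
inter.broker.listener.name=PLAINTEXT
controller.listener.names=CONTROLLER
listener.security.protocol.map=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
zookeeper.metadata.migration.enable=true
"""

broker = parse_properties(BROKER)
controller = parse_properties(CONTROLLER)

# Settings that both sides carry in the quoted fragments.
SHARED_KEYS = (
    "controller.quorum.voters",
    "controller.listener.names",
    "zookeeper.metadata.migration.enable",
)
mismatches = [k for k in SHARED_KEYS if broker.get(k) != controller.get(k)]
voters = parse_voters(controller["controller.quorum.voters"])

print("mismatched keys:", mismatches)
print("controller node.id is a voter:", int(controller["node.id"]) in voters)
```

For the fragments as quoted, the shared keys agree and the controller's `node.id` appears in the voter list, which is consistent with the statement that the setup is generated identically on every run.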