mumrah commented on code in PR #15744: URL: https://github.com/apache/kafka/pull/15744#discussion_r1583263129
########## core/src/test/scala/integration/kafka/zk/ZkMigrationIntegrationTest.scala: ########## @@ -950,16 +980,47 @@ class ZkMigrationIntegrationTest { dataOpt.map(ProducerIdBlockZNode.parseProducerIdBlockData).get } - def alterTopicConfig(admin: Admin): AlterConfigsResult = { + def alterBrokerConfigs(admin: Admin): Unit = { + val defaultBrokerResource = new ConfigResource(ConfigResource.Type.BROKER, "") + val defaultBrokerConfigs = Seq( + new AlterConfigOp(new ConfigEntry(KafkaConfig.LogRetentionTimeMillisProp, "86400000"), AlterConfigOp.OpType.SET), + ).asJavaCollection + val broker0Resource = new ConfigResource(ConfigResource.Type.BROKER, "0") + val broker1Resource = new ConfigResource(ConfigResource.Type.BROKER, "1") + val specificBrokerConfigs = Seq( + new AlterConfigOp(new ConfigEntry(KafkaConfig.LogRetentionTimeMillisProp, "43200000"), AlterConfigOp.OpType.SET), + ).asJavaCollection + + TestUtils.retry(60000) { + val result = admin.incrementalAlterConfigs(Map( + defaultBrokerResource -> defaultBrokerConfigs, + broker0Resource -> specificBrokerConfigs, + broker1Resource -> specificBrokerConfigs + ).asJava) + try { + result.all().get(10, TimeUnit.SECONDS) + } catch { + case t: Throwable => fail("Alter Broker Configs had an error", t) + } + } Review Comment: The reason for the retries in both tests is to deal with a few race conditions without making the test code overly complex. `testDualWrite` blocks until the `/controller` ZNode is updated to the KRaft controller. However, the ZK brokers won't learn about this new controller until the migration finishes and they receive a UMR. So that's the first race. The second race is that brokers receive UMR independently, so we would have to wait for all the brokers to be in the same state regarding the controller. We'd have to get down to the MetadataCache to see this. Since the MetadataResponse never exposes any information about KRaft or the migration, so we can't use that as a way to poll the brokers for a ready state. So basically, retrying the alter config requests until they succeed (up to a limit) will smooth over these timing issues. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org