[ https://issues.apache.org/jira/browse/KAFKA-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763086#comment-17763086 ]

David Arthur commented on KAFKA-15411:
--------------------------------------

The error
{code:java}
[2023-09-08 11:11:28,426] ERROR [StandardAuthorizer 1000] Failed to complete initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData:96)
java.util.concurrent.TimeoutException
	at kafka.server.metadata.AclPublisher.close(AclPublisher.scala:98)
	at org.apache.kafka.image.loader.MetadataLoader.closePublisher(MetadataLoader.java:568)
	at org.apache.kafka.image.loader.MetadataLoader.lambda$removeAndClosePublisher$7(MetadataLoader.java:528)
	at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
	at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
	at java.lang.Thread.run(Thread.java:750)
{code}
just means that the authorizer never received an initial MetadataImage from AclPublisher.

I think [~pprovenzano] is on the right track with the keystore error. Taking one example from a failed test:
{code:java}
Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /tmp/truststore5892229881277678824.jks of type JKS
	at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:374)
	at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.<init>(DefaultSslEngineFactory.java:346)
	at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.createTruststore(DefaultSslEngineFactory.java:319)
	at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.configure(DefaultSslEngineFactory.java:168)
	at org.apache.kafka.common.security.ssl.SslFactory.instantiateSslEngineFactory(SslFactory.java:140)
	at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:97)
	at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:180)
	... 25 more
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:392)
	at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:661)
	at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:57)
	at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224)
	at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:71)
	at java.security.KeyStore.load(KeyStore.java:1445)
	at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:371)
	... 31 more
{code}
The EOFException is thrown from DataInputStream.readInt while the JKS loader is still reading the file, so it looks like the keystore file exists but is empty or truncated. Perhaps there is a test setup race condition that we haven't hit before, and we are only seeing it now because so many new tests use this code?

Unfortunately, I don't know enough about the authz tests to really suggest where to look. I would suggest creating a branch that modifies the Jenkinsfile to run a single test class N times without parallelism and raises the relevant log4j loggers to DEBUG.

> DelegationTokenEndToEndAuthorizationWithOwnerTest is Flaky
> -----------------------------------------------------------
>
>                 Key: KAFKA-15411
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15411
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>            Reporter: Proven Provenzano
>            Assignee: Proven Provenzano
>            Priority: Major
>              Labels: flaky-test
>             Fix For: 3.7.0
>
>
> DelegationTokenEndToEndAuthorizationWithOwnerTest has become flaky since the
> merge of delegation token support for KRaft (PR -
> https://github.com/apache/kafka/pull/14083).

--
This message was sent by Atlassian Jira
(v8.20.10#820010)