[ https://issues.apache.org/jira/browse/KAFKA-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763086#comment-17763086 ]
David Arthur commented on KAFKA-15411:
--------------------------------------
The error
{code:java}
[2023-09-08 11:11:28,426] ERROR [StandardAuthorizer 1000] Failed to complete initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData:96)
java.util.concurrent.TimeoutException
    at kafka.server.metadata.AclPublisher.close(AclPublisher.scala:98)
    at org.apache.kafka.image.loader.MetadataLoader.closePublisher(MetadataLoader.java:568)
    at org.apache.kafka.image.loader.MetadataLoader.lambda$removeAndClosePublisher$7(MetadataLoader.java:528)
    at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.lang.Thread.run(Thread.java:750)
{code}
just means that the authorizer never received an initial MetadataImage from
AclPublisher. I think [~pprovenzano] is on the right track with the keystore
error.
Taking one example from a failed test:
{code:java}
Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /tmp/truststore5892229881277678824.jks of type JKS
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:374)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.<init>(DefaultSslEngineFactory.java:346)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.createTruststore(DefaultSslEngineFactory.java:319)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.configure(DefaultSslEngineFactory.java:168)
    at org.apache.kafka.common.security.ssl.SslFactory.instantiateSslEngineFactory(SslFactory.java:140)
    at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:97)
    at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:180)
    ... 25 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:661)
    at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:57)
    at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224)
    at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:71)
    at java.security.KeyStore.load(KeyStore.java:1445)
    at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:371)
    ... 31 more
{code}
The EOFException thrown from DataInputStream.readInt suggests that the keystore
file exists but is empty or truncated. Perhaps there is a test setup race
condition that we haven't hit before, and the many new tests that use this
code are now exposing it. Unfortunately, I don't know enough about the authz
tests to suggest where to look.
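One way to check the "file exists but has no data" theory outside of Kafka is a
small standalone probe. The following is only a sketch, not code from the Kafka
tests: the class name KeystoreCheck, the method describeLoadFailure and the
password "changeit" are made up for illustration. It uses only the JDK's
java.security.KeyStore API and reports the exception type together with the
file size, so an empty or truncated file is easy to tell apart from, say, a
wrong password.
{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical helper, not part of Kafka: probes whether a JKS file can be loaded.
final class KeystoreCheck {

    // Returns null if the file loads as a JKS store, otherwise a short
    // description of the failure that includes the file size.
    public static String describeLoadFailure(Path file, char[] password) {
        try (InputStream in = Files.newInputStream(file)) {
            KeyStore store = KeyStore.getInstance("JKS");
            store.load(in, password);
            return null;
        } catch (IOException | GeneralSecurityException e) {
            return e.getClass().getName() + " (file size: " + sizeOf(file) + " bytes)";
        }
    }

    // Best-effort size lookup; -1 means the size could not be read.
    private static long sizeOf(Path file) {
        try {
            return Files.size(file);
        } catch (IOException e) {
            return -1L;
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate the suspected state: the file exists but nothing was written to it.
        Path empty = Files.createTempFile("truststore", ".jks");
        try {
            System.out.println(describeLoadFailure(empty, "changeit".toCharArray()));
        } finally {
            Files.deleteIfExists(empty);
        }
    }
}
{code}
If the theory holds, loading a zero-length file should fail with an
IOException, and on the JDK 8 build shown in the trace above I would expect it
to be the same java.io.EOFException. Calling describeLoadFailure on the
truststore path right before the broker starts would show whether the file is
still empty at that point.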
I would suggest creating a branch that modifies the Jenkinsfile to run a single
test class N times without parallelism and raises the relevant log4j loggers
to DEBUG.
> DelegationTokenEndToEndAuthorizationWithOwnerTest is Flaky
> -----------------------------------------------------------
>
> Key: KAFKA-15411
> URL: https://issues.apache.org/jira/browse/KAFKA-15411
> Project: Kafka
> Issue Type: Bug
> Components: kraft
> Reporter: Proven Provenzano
> Assignee: Proven Provenzano
> Priority: Major
> Labels: flaky-test
> Fix For: 3.7.0
>
>
> DelegationTokenEndToEndAuthorizationWithOwnerTest has become flaky since the
> merge of delegation token support for KRaft (PR -
> https://github.com/apache/kafka/pull/14083).