[ 
https://issues.apache.org/jira/browse/KAFKA-15411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763086#comment-17763086
 ] 

David Arthur commented on KAFKA-15411:
--------------------------------------

The error
{code:java}
[2023-09-08 11:11:28,426] ERROR [StandardAuthorizer 1000] Failed to complete initial ACL load process. (org.apache.kafka.metadata.authorizer.StandardAuthorizerData:96)
java.util.concurrent.TimeoutException
        at kafka.server.metadata.AclPublisher.close(AclPublisher.scala:98)
        at org.apache.kafka.image.loader.MetadataLoader.closePublisher(MetadataLoader.java:568)
        at org.apache.kafka.image.loader.MetadataLoader.lambda$removeAndClosePublisher$7(MetadataLoader.java:528)
        at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
        at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
        at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
        at java.lang.Thread.run(Thread.java:750)
{code}
just means that the authorizer never received an initial MetadataImage from 
AclPublisher. I think [~pprovenzano] is on the right track with the keystore 
error.

Taking one example from a failed test:
{code:java}
Caused by: org.apache.kafka.common.KafkaException: Failed to load SSL keystore /tmp/truststore5892229881277678824.jks of type JKS
        at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:374)
        at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.<init>(DefaultSslEngineFactory.java:346)
        at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.createTruststore(DefaultSslEngineFactory.java:319)
        at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory.configure(DefaultSslEngineFactory.java:168)
        at org.apache.kafka.common.security.ssl.SslFactory.instantiateSslEngineFactory(SslFactory.java:140)
        at org.apache.kafka.common.security.ssl.SslFactory.configure(SslFactory.java:97)
        at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:180)
        ... 25 more
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at sun.security.provider.JavaKeyStore.engineLoad(JavaKeyStore.java:661)
        at sun.security.provider.JavaKeyStore$JKS.engineLoad(JavaKeyStore.java:57)
        at sun.security.provider.KeyStoreDelegator.engineLoad(KeyStoreDelegator.java:224)
        at sun.security.provider.JavaKeyStore$DualFormatJKS.engineLoad(JavaKeyStore.java:71)
        at java.security.KeyStore.load(KeyStore.java:1445)
        at org.apache.kafka.common.security.ssl.DefaultSslEngineFactory$FileBasedStore.load(DefaultSslEngineFactory.java:371)
        ... 31 more
{code}
It looks like the keystore file exists but may be empty or truncated: the 
EOFException is thrown from DataInputStream.readInt inside 
JavaKeyStore.engineLoad. Perhaps there is a test setup race condition that we 
haven't hit before, and the addition of so many new tests that use this code is 
now exposing it. Unfortunately, I don't know enough about the authz tests to 
suggest where to look.
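
To help confirm the empty-or-truncated theory, something like the following 
could be dropped into a scratch class. It is only a sketch using the plain JDK 
KeyStore API; the class name KeystoreProbe, the method probe, and the password 
"test1234" are made up for illustration and are not part of Kafka or its test 
utilities.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.KeyStore;

// Hypothetical helper (not part of Kafka): classifies a JKS file as
// EMPTY, OK, or UNREADABLE so a zero-length store is easy to spot.
final class KeystoreProbe {

    static String probe(Path path, char[] password) throws IOException {
        // A zero-length file cannot be a valid keystore; report it directly.
        if (Files.size(path) == 0) {
            return "EMPTY";
        }
        try (InputStream in = Files.newInputStream(path)) {
            KeyStore ks = KeyStore.getInstance("JKS");
            ks.load(in, password);
            return "OK";
        } catch (IOException | GeneralSecurityException e) {
            // A truncated file usually ends up here with an IOException.
            return "UNREADABLE: " + e;
        }
    }

    public static void main(String[] args) throws Exception {
        char[] password = "test1234".toCharArray(); // illustrative password

        // Case 1: the file exists but nothing has been written to it yet.
        Path store = Files.createTempFile("truststore", ".jks");
        System.out.println(probe(store, password));

        // Case 2: a valid keystore with no entries, fully written.
        KeyStore ks = KeyStore.getInstance("JKS");
        ks.load(null, null);
        try (OutputStream out = Files.newOutputStream(store)) {
            ks.store(out, password);
        }
        System.out.println(probe(store, password));

        // Case 3: only the first two bytes made it to disk.
        Files.write(store, new byte[] {(byte) 0xFE, (byte) 0xED});
        System.out.println(probe(store, password));

        Files.deleteIfExists(store);
    }
}
```

If the failing test logs EMPTY or UNREADABLE for the path in the exception, 
that would point at the file being read before the test setup has finished 
writing it.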

I would suggest creating a branch that modifies the Jenkinsfile to run a single 
test class N times without parallelism, with the relevant log4j loggers raised 
to DEBUG.

> DelegationTokenEndToEndAuthorizationWithOwnerTest is Flaky 
> -----------------------------------------------------------
>
>                 Key: KAFKA-15411
>                 URL: https://issues.apache.org/jira/browse/KAFKA-15411
>             Project: Kafka
>          Issue Type: Bug
>          Components: kraft
>            Reporter: Proven Provenzano
>            Assignee: Proven Provenzano
>            Priority: Major
>              Labels: flaky-test
>             Fix For: 3.7.0
>
>
> DelegationTokenEndToEndAuthorizationWithOwnerTest has become flaky since the 
> merge of delegation token support for KRaft (PR - 
> https://github.com/apache/kafka/pull/14083).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
