[ https://issues.apache.org/jira/browse/KAFKA-7987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967790#comment-16967790 ]

Graham Campbell commented on KAFKA-7987:
----------------------------------------

[~junrao] We're starting to see these errors more frequently (I guess our network
is getting less reliable), so I'm looking at a fix for this. Does scheduling a
reinitialize() in the ZooKeeperClient after notifying handlers seem like a
reasonable solution? I don't see a way to get more details from the ZK client
to tell whether the auth failure was caused by a retriable error.
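A minimal sketch of the proposed approach, under stated assumptions: `ClientSketch`, `State`, `StateChangeHandler`, `onStateChange()` and the retry backoff are hypothetical names for illustration only, not the API of Kafka's actual `ZooKeeperClient` (which is written in Scala). It shows the ordering being proposed: on an auth failure, notify the registered handlers first, then schedule a `reinitialize()` after a short backoff, treating every auth failure as potentially retriable because the ZK client does not say whether it was transient. `reinitialize()` is a stub here; in a real client it would close the old ZooKeeper handle and open a new one, creating a fresh session so the broker can re-register.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AuthFailureRetrySketch {

    // Hypothetical subset of ZK client connection states.
    enum State { CONNECTED, AUTH_FAILED }

    // Hypothetical callback for components that react to auth failures.
    interface StateChangeHandler {
        void onAuthFailure();
    }

    // Hypothetical client wrapper; NOT Kafka's actual ZooKeeperClient.
    static class ClientSketch {
        private final List<StateChangeHandler> handlers = new CopyOnWriteArrayList<>();
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        private final long retryBackoffMs;
        // Counts how many times a new session was (notionally) created.
        final AtomicInteger sessionsCreated = new AtomicInteger();

        ClientSketch(long retryBackoffMs) {
            this.retryBackoffMs = retryBackoffMs;
        }

        void register(StateChangeHandler handler) {
            handlers.add(handler);
        }

        // Invoked when the underlying ZK client reports a state change.
        void onStateChange(State state) {
            if (state != State.AUTH_FAILED) {
                return;
            }
            // 1. Let interested components react first.
            for (StateChangeHandler h : handlers) {
                h.onAuthFailure();
            }
            // 2. The ZK client gives no hint whether the failure was transient
            //    (e.g. a KDC hiccup), so treat it as retriable and retry later.
            scheduler.schedule(this::reinitialize, retryBackoffMs, TimeUnit.MILLISECONDS);
        }

        // Stub: a real client would close the old handle and open a new one,
        // establishing a fresh session so the broker can re-register.
        void reinitialize() {
            sessionsCreated.incrementAndGet();
        }

        // Stops accepting new tasks; delayed tasks that were already scheduled
        // still run by default, and we wait briefly for them to finish.
        void shutdown() {
            scheduler.shutdown();
            try {
                scheduler.awaitTermination(1, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        ClientSketch client = new ClientSketch(10);
        AtomicInteger notified = new AtomicInteger();
        client.register(notified::incrementAndGet);
        client.onStateChange(State.AUTH_FAILED);
        client.shutdown();
        System.out.println("handlers notified: " + notified.get());
        System.out.println("sessions recreated: " + client.sessionsCreated.get());
    }
}
```

With a real `reinitialize()`, a second auth failure on the new session would re-enter `AUTH_FAILED` and schedule another attempt, so retries would continue until authentication succeeds; a production version would probably want a capped or exponential backoff rather than the fixed delay assumed here.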

> a broker's ZK session may die on transient auth failure
> -------------------------------------------------------
>
>                 Key: KAFKA-7987
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7987
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jun Rao
>            Priority: Major
>
> After a transient network issue, we saw the following log in a broker.
> {code:java}
> [23:37:02,102] ERROR SASL authentication with Zookeeper Quorum member failed: 
> javax.security.sasl.SaslException: An error: 
> (java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
> GSS initiate failed [Caused by GSSException: No valid credentials provided 
> (Mechanism level: Server not found in Kerberos database (7))]) occurred when 
> evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client 
> will go to AUTH_FAILED state. (org.apache.zookeeper.ClientCnxn)
> [23:37:02,102] ERROR [ZooKeeperClient] Auth failed. 
> (kafka.zookeeper.ZooKeeperClient)
> {code}
> The network issue prevented the broker from communicating with ZK. The broker's
> ZK session then expired, but the broker didn't know that yet since it
> couldn't establish a connection to ZK. When the network came back, the broker
> tried to establish a connection to ZK but failed due to an auth failure (likely
> caused by a transient KDC issue). The current logic just ignores the auth
> failure without trying to create a new ZK session, so the broker stays
> permanently in a state where it's alive but not registered in ZK.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
