Jun Rao created KAFKA-7987:
------------------------------

             Summary: a broker's ZK session may die on transient auth failure
                 Key: KAFKA-7987
                 URL: https://issues.apache.org/jira/browse/KAFKA-7987
             Project: Kafka
          Issue Type: Improvement
            Reporter: Jun Rao


After a transient network issue, we saw the following log in a broker.
{code:java}
[23:37:02,102] ERROR SASL authentication with Zookeeper Quorum member failed: 
javax.security.sasl.SaslException: An error: 
(java.security.PrivilegedActionException: javax.security.sasl.SaslException: 
GSS initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Server not found in Kerberos database (7))]) occurred when 
evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will 
go to AUTH_FAILED state. (org.apache.zookeeper.ClientCnxn)
[23:37:02,102] ERROR [ZooKeeperClient] Auth failed. 
(kafka.zookeeper.ZooKeeperClient)
{code}
The network issue prevented the broker from communicating with ZK. The broker's 
ZK session then expired, but the broker didn't know that yet because it couldn't 
establish a connection to ZK. When the network came back, the broker tried to 
reconnect to ZK, but the attempt failed with an auth failure (likely due to a 
transient KDC issue). The current logic just ignores the auth failure and does 
not try to create a new ZK session. The broker then remains permanently in a 
state where it is alive but not registered in ZK.
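
One possible direction, shown only as a rough sketch and not as the actual Kafka code or the eventual fix: treat an auth failure like a session expiration, i.e. as a state the existing handle cannot recover from, and keep creating new sessions until one succeeds or a retry budget runs out. The State enum, needsNewSession, recover, and the createSession supplier are all hypothetical names made up for this illustration; the states are a simplification of what the ZooKeeper client exposes.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

public class AuthFailureRecoverySketch {

    // Simplified, hypothetical session states (not the real ZooKeeper enum).
    enum State { CONNECTED, DISCONNECTED, AUTH_FAILED, EXPIRED }

    // Assumption: both states are terminal for the existing handle, so
    // waiting for a reconnect will never help and a new session is needed.
    static boolean needsNewSession(State state) {
        return state == State.AUTH_FAILED || state == State.EXPIRED;
    }

    // Creates new sessions until the state is no longer terminal.
    // Returns the number of sessions created, or -1 if maxAttempts
    // sessions were created and the last one is still terminal.
    static int recover(State initial, Supplier<State> createSession, int maxAttempts) {
        State state = initial;
        int created = 0;
        while (needsNewSession(state)) {
            if (created == maxAttempts) {
                return -1;
            }
            // A real implementation would back off between attempts.
            state = createSession.get();
            created++;
        }
        return created;
    }

    public static void main(String[] args) {
        // Simulated outcomes: the KDC fails once more, then recovers.
        Iterator<State> outcomes =
            List.of(State.AUTH_FAILED, State.CONNECTED).iterator();
        int created = recover(State.AUTH_FAILED, outcomes::next, 5);
        System.out.println("sessions created: " + created); // prints "sessions created: 2"
    }
}
```

In the scenario above, the broker would start in AUTH_FAILED, and a loop of this kind would keep retrying instead of leaving the broker alive but unregistered. Once a new session is established, the broker would still need to re-register itself in ZK, which this sketch does not cover.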

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
