Neha Narkhede created ZOOKEEPER-1457:
----------------------------------------

             Summary: Ephemeral node deleted for unexpired sessions
                 Key: ZOOKEEPER-1457
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1457
             Project: ZooKeeper
          Issue Type: Bug
    Affects Versions: 3.3.4
            Reporter: Neha Narkhede


This week, we saw a potential bug with zookeeper 3.3.4. In an attempt to add 
a separate disk for zookeeper transaction logs, our SysOps team threw new disks 
at all the zookeeper servers in our production cluster at around the same time. 
Right after this, we saw degraded performance on our zookeeper cluster. And 
yes, I agree that this degraded behavior is expected and we could've done a 
better job and upgraded one server at a time. However, the observed impact 
was that ephemeral nodes got deleted without session expiration on the 
zookeeper clients. 

Let me try to describe what I observed in the Kafka and ZK server logs. The 
Kafka client has a session established with ZK, say Session A, that it has been 
using successfully. At the time of the degraded ZK performance issue, Session A 
expires. Kafka's ZkClient tries to establish another session with ZK. After 9 
seconds, it establishes a session, say Session B and tries to use it for 
creating a znode. This operation fails with a NodeExists error since another 
session, say session C, has created that znode. This is considered OK since 
ZkClient retries an operation transparently if it gets disconnected and 
sometimes you can get NodeExists. But then later, session C expires and hence 
the ephemeral node is deleted from ZK. This leads to unexpected errors in Kafka 
since its session, Session B, is still valid and hence it expects the znode to 
be there. The issue is that session C was established, created the znode and 
expired, without the zookeeper client on Kafka ever knowing about it. 
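
To make the sequence above easier to follow, here is a minimal sketch that 
replays it against a toy in-memory model. It is not ZooKeeper code and it does 
not reproduce the bug: ToyEnsemble, replay_report and the znode path are 
hypothetical names made up for illustration, and the exact moment at which 
session C was created is an assumption, since the logs only show that it 
existed. The point it illustrates is that after a NodeExists error, the holder 
of Session B can compare the znode's owner with its own session id (in the real 
Java client, Stat.getEphemeralOwner() and ZooKeeper.getSessionId() expose these 
two values) instead of assuming the znode belongs to it.

```python
class NodeExistsError(Exception):
    """Raised when an ephemeral node already exists at the requested path."""


class ToyEnsemble:
    """Hypothetical in-memory stand-in for a ZK ensemble (ephemeral nodes only)."""

    def __init__(self):
        self.live_sessions = set()
        self.owner_by_path = {}  # path -> id of the session owning the node

    def open_session(self, session_id):
        self.live_sessions.add(session_id)

    def create_ephemeral(self, session_id, path):
        if session_id not in self.live_sessions:
            raise RuntimeError("session %s is not live" % session_id)
        if path in self.owner_by_path:
            raise NodeExistsError(path)
        self.owner_by_path[path] = session_id

    def expire_session(self, session_id):
        self.live_sessions.discard(session_id)
        # Expiry deletes every ephemeral node owned by the expired session.
        owned = [p for p, o in self.owner_by_path.items() if o == session_id]
        for path in owned:
            del self.owner_by_path[path]


def replay_report(path="/example/ephemeral"):  # path is a made-up placeholder
    zk = ToyEnsemble()

    # Session A works normally, then expires during the degraded period.
    zk.open_session("A")
    zk.create_ephemeral("A", path)
    zk.expire_session("A")

    # Assumption: session C is established and creates the znode, but the
    # client application never learns that C exists.
    zk.open_session("C")
    zk.create_ephemeral("C", path)

    # Session B is the session the client knows about; its create fails.
    zk.open_session("B")
    try:
        zk.create_ephemeral("B", path)
        b_create_failed = False
    except NodeExistsError:
        b_create_failed = True

    # Ownership check: the znode belongs to C, not to B.
    owner_seen_by_b = zk.owner_by_path.get(path)

    # Later, session C expires and takes the znode with it.
    zk.expire_session("C")

    return {
        "b_create_failed": b_create_failed,
        "owner_seen_by_b": owner_seen_by_b,
        "node_exists_after_c_expiry": path in zk.owner_by_path,
        "b_still_live": "B" in zk.live_sessions,
    }


if __name__ == "__main__":
    for key, value in replay_report().items():
        print(key, "=", value)
```

In this toy run, Session B is still live at the end while the znode is gone, 
which matches the unexpected state Kafka ended up in.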


