Neha Narkhede created ZOOKEEPER-1457:
----------------------------------------

Summary: Ephemeral node deleted for unexpired sessions
Key: ZOOKEEPER-1457
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1457
Project: ZooKeeper
Issue Type: Bug
Affects Versions: 3.3.4
Reporter: Neha Narkhede

This week, we saw a potential bug with zookeeper 3.3.4. In an attempt to add a separate disk for zookeeper transaction logs, our SysOps team added new disks to all the zookeeper servers in our production cluster at around the same time. Right after this, we saw degraded performance on our zookeeper cluster. And yes, I agree that this degraded behavior is expected, and we could've done a better job and upgraded one server at a time. However, the observed impact was that ephemeral nodes got deleted without session expiration on the zookeeper clients.

Let me try to describe what I've observed from the Kafka and ZK server logs:

The Kafka client has a session established with ZK, say Session A, that it has been using successfully. At the time of the degraded ZK performance, Session A expires. Kafka's ZkClient tries to establish another session with ZK. After 9 seconds, it establishes a session, say Session B, and tries to use it to create a znode. This operation fails with a NodeExists error, since another session, say Session C, has already created that znode. This is considered OK, since ZkClient transparently retries an operation if it gets disconnected, and a retry can sometimes produce NodeExists.

But later, Session C expires, and hence the ephemeral node is deleted from ZK. This leads to unexpected errors in Kafka, since its session, Session B, is still valid and it therefore expects the znode to be there. The issue is that Session C was established, created the znode, and expired without the zookeeper client on Kafka ever knowing about it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
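
The session sequence described in the report can be replayed with a small in-memory model. This is only a sketch of the ordering of events, not ZooKeeper's implementation and not the real client API: `ToyEnsemble`, `NodeExistsError`, `replay`, and the path `/example/ephemeral` are all hypothetical names introduced for illustration. It shows that when the create from Session B fails with NodeExists, the znode is owned by a different session (Session C), so it disappears when Session C expires even though Session B is still alive.

```python
class NodeExistsError(Exception):
    """Raised when an ephemeral node already exists at the requested path."""


class ToyEnsemble:
    """Toy stand-in for a ZooKeeper ensemble (hypothetical, in-memory only)."""

    def __init__(self):
        self._next_id = 0
        self.live_sessions = set()
        self.ephemeral_owner = {}  # path -> id of the session that created it

    def open_session(self):
        self._next_id += 1
        self.live_sessions.add(self._next_id)
        return self._next_id

    def create_ephemeral(self, session_id, path):
        if session_id not in self.live_sessions:
            raise RuntimeError("session expired")
        if path in self.ephemeral_owner:
            raise NodeExistsError(path)
        self.ephemeral_owner[path] = session_id

    def expire(self, session_id):
        # Expiring a session removes every ephemeral node it owns.
        self.live_sessions.discard(session_id)
        owned = [p for p, o in self.ephemeral_owner.items() if o == session_id]
        for path in owned:
            del self.ephemeral_owner[path]


def replay():
    """Replay the reported sequence and return what the client could observe."""
    zk = ToyEnsemble()
    path = "/example/ephemeral"  # hypothetical path, not from the report
    observed = []

    a = zk.open_session()
    zk.expire(a)                  # Session A expires during the degraded period

    c = zk.open_session()         # Session C: the client never learns about it
    zk.create_ephemeral(c, path)  # the create succeeds on the server side

    b = zk.open_session()         # Session B: the session the client knows
    try:
        zk.create_ephemeral(b, path)
    except NodeExistsError:
        # The node exists, but its owner is Session C, not Session B.
        observed.append(("node_owned_by_b", zk.ephemeral_owner[path] == b))

    zk.expire(c)                  # Session C expires later
    observed.append(("b_alive", b in zk.live_sessions))
    observed.append(("node_present", path in zk.ephemeral_owner))
    return observed


if __name__ == "__main__":
    for name, value in replay():
        print(name, value)
```

In the real Java client, the comparable ownership check would compare `Stat.getEphemeralOwner()` of the existing znode with `ZooKeeper.getSessionId()` of the current handle. Whether such a check is an appropriate response to this bug is an assumption of this sketch and is not something the report itself proposes.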