Guozhang Wang created KAFKA-992:
-----------------------------------
Summary: Double Check on Broker Registration to Avoid False
NodeExist Exception
Key: KAFKA-992
URL: https://issues.apache.org/jira/browse/KAFKA-992
Project: Kafka
Issue Type: Bug
Reporter: Guozhang Wang
Assignee: Guozhang Wang
There is a potential bug in ZooKeeper: when the ZK leader is processing many
session expiration events (for example, after a long GC pause or a slow fsync
operation), it marks a session as expired but does not delete that session's
ephemeral znode at the same time.
Meanwhile, a new-session event fires on the Kafka server, and while handling it
the server requests creation of the same ephemeral znode. When that request
reaches the ZooKeeper processing queue, it fails with a NodeExists error,
because the ZooKeeper leader has not finished deleting the ephemeral znode and
still thinks the previous session holds it. Kafka assumes that a NodeExists
error on ephemeral node creation is harmless, since it is a legitimate
condition during session disconnects from ZooKeeper.
However, a NodeExists error is only valid if the znode's owner session id also
matches the Kafka server's current ZooKeeper session id. The bug is that before
sending a NodeExists error, ZooKeeper should check whether the ephemeral node in
question is held by a session that has already been marked as expired.
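
The ownership check described above can be sketched as a small in-memory
model. This is only an illustration under stated assumptions, not Kafka's
registration code or the ZooKeeper client API: FakeZk, register_broker, the
Action values, the path and the session ids are all hypothetical names chosen
for the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Action(Enum):
    CREATED = "created"            # znode was absent, so the create succeeded
    ALREADY_OURS = "already_ours"  # NodeExists, and our current session owns it
    RETRY = "retry"                # NodeExists, but another session owns it


@dataclass
class Znode:
    data: str
    ephemeral_owner: int  # id of the session that created this ephemeral znode


class FakeZk:
    """Hypothetical in-memory stand-in for ZooKeeper, for illustration only."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Znode] = {}

    def create_ephemeral(self, path: str, data: str, session_id: int) -> bool:
        # Returning False models a NodeExists error.
        if path in self.nodes:
            return False
        self.nodes[path] = Znode(data, session_id)
        return True

    def owner_of(self, path: str) -> Optional[int]:
        node = self.nodes.get(path)
        return node.ephemeral_owner if node else None

    def expire_cleanup(self, session_id: int) -> None:
        # Models the delayed deletion of an expired session's ephemeral znodes.
        self.nodes = {p: n for p, n in self.nodes.items()
                      if n.ephemeral_owner != session_id}


def register_broker(zk: FakeZk, path: str, data: str, session_id: int) -> Action:
    """Create the znode; on NodeExists, double-check who owns it."""
    if zk.create_ephemeral(path, data, session_id):
        return Action.CREATED
    if zk.owner_of(path) == session_id:
        return Action.ALREADY_OURS
    # Owned by another (possibly expired) session, or deleted in between:
    # the caller should back off and try again instead of assuming success.
    return Action.RETRY


zk = FakeZk()
path, data = "/brokers/ids/0", "host:9092"   # illustrative values
old_session, new_session = 1, 2

first = register_broker(zk, path, data, old_session)    # Action.CREATED
# The old session expires, but its znode has not been deleted yet.
stale = register_broker(zk, path, data, new_session)    # Action.RETRY
zk.expire_cleanup(old_session)                          # deletion catches up
second = register_broker(zk, path, data, new_session)   # Action.CREATED
```

The point of the sketch is the middle branch: a NodeExists result counts as
success only when the owner matches the current session id. In every other
case the registration is retried rather than silently accepted.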