[ https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guozhang Wang updated KAFKA-992: -------------------------------- Attachment: KAFKA-992.v11.patch Forgot to pull before rebase, the new version apply to latest 0.8 now. > Double Check on Broker Registration to Avoid False NodeExist Exception > ---------------------------------------------------------------------- > > Key: KAFKA-992 > URL: https://issues.apache.org/jira/browse/KAFKA-992 > Project: Kafka > Issue Type: Bug > Reporter: Neha Narkhede > Assignee: Guozhang Wang > Attachments: KAFKA-992.v10.patch, KAFKA-992.v11.patch, > KAFKA-992.v1.patch, KAFKA-992.v2.patch, KAFKA-992.v3.patch, > KAFKA-992.v4.patch, KAFKA-992.v5.patch, KAFKA-992.v6.patch, > KAFKA-992.v7.patch, KAFKA-992.v8.patch, KAFKA-992.v9.patch > > > The current behavior of zookeeper for ephemeral nodes is that session > expiration and ephemeral node deletion is not an atomic operation. > The side-effect of the above zookeeper behavior in Kafka, for certain corner > cases, is that ephemeral nodes can be lost even if the session is not > expired. The sequence of events that can lead to lossy ephemeral nodes is as > follows - > 1. The session expires on the client, it assumes the ephemeral nodes are > deleted, so it establishes a new session with zookeeper and tries to > re-create the ephemeral nodes. > 2. However, when it tries to re-create the ephemeral node,zookeeper throws > back a NodeExists error code. Now this is legitimate during a session > disconnect event (since zkclient automatically retries the > operation and raises a NodeExists error). Also by design, Kafka server > doesn't have multiple zookeeper clients create the same ephemeral node, so > Kafka server assumes the NodeExists is normal. > 3. However, after a few seconds zookeeper deletes that ephemeral node. So > from the client's perspective, even though the client has a new valid > session, its ephemeral node is gone. > This behavior is triggered due to very long fsync operations on the zookeeper > leader. When the leader wakes up from such a long fsync operation, it has > several sessions to expire. And the time between the session expiration and > the ephemeral node deletion is magnified. Between these 2 operations, a > zookeeper client can issue a ephemeral node creation operation, that could've > appeared to have succeeded, but the leader later deletes the ephemeral node > leading to permanent ephemeral node loss from the client's perspective. > Thread from zookeeper mailing list: > http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira