[ https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neha Narkhede updated KAFKA-992:
--------------------------------

    Description: 
In ZooKeeper, session expiration and the deletion of that session's ephemeral 
nodes are not performed as a single atomic operation. 

In certain corner cases, the side effect of this ZooKeeper behavior in Kafka 
is that an ephemeral node can be lost even though the client's session has not 
expired. The sequence of events that leads to a lost ephemeral node is as 
follows:

1. The session expires on the client. The client assumes its ephemeral nodes 
have been deleted, so it establishes a new session with ZooKeeper and tries to 
re-create them. 
2. However, when it tries to re-create an ephemeral node, ZooKeeper returns a 
NodeExists error code. This is legitimate during a session disconnect event 
(zkclient automatically retries the operation, and if the first attempt 
already succeeded on the server, the retry raises a NodeExists error). Also, 
by design, Kafka server doesn't have multiple ZooKeeper clients create the 
same ephemeral node, so Kafka server assumes the NodeExists error is normal. 
3. However, after a few seconds ZooKeeper deletes that ephemeral node. So from 
the client's perspective, even though it has a new valid session, its 
ephemeral node is gone.
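
The three events above can be reproduced with a small in-memory model. This is 
only a sketch: ToyZooKeeper, register_broker and the path /brokers/ids/0 are 
hypothetical names chosen for illustration, not the ZooKeeper or Kafka API. 
The model assumes that expiring a session and deleting its ephemeral nodes are 
two separate steps, as described above.

```python
class NodeExistsError(Exception):
    """Raised when create() targets a path that is already present."""


class ToyZooKeeper:
    """In-memory stand-in for a ZooKeeper server; not the real API."""

    def __init__(self):
        self.nodes = {}            # path -> owning session id
        self.expired = set()       # sessions already marked expired
        self.pending_cleanup = []  # expired sessions whose nodes still exist

    def create_ephemeral(self, path, session_id):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = session_id

    def expire_session(self, session_id):
        # First step: mark the session expired; its nodes stay for now.
        self.expired.add(session_id)
        self.pending_cleanup.append(session_id)

    def run_cleanup(self):
        # Second step, some time later: delete nodes of expired sessions.
        for sid in self.pending_cleanup:
            self.nodes = {p: o for p, o in self.nodes.items() if o != sid}
        self.pending_cleanup.clear()


def register_broker(zk, path, session_id):
    """Mimics the behavior described above: NodeExists counts as success."""
    try:
        zk.create_ephemeral(path, session_id)
    except NodeExistsError:
        pass  # assumed to be our own node from a retried create
    return True


zk = ToyZooKeeper()
path = "/brokers/ids/0"              # illustrative registration path
register_broker(zk, path, session_id=1)

zk.expire_session(1)                 # event 1: the old session expires
ok = register_broker(zk, path, 2)    # event 2: NodeExists, treated as normal
exists_before = path in zk.nodes
zk.run_cleanup()                     # event 3: the delayed deletion
exists_after = path in zk.nodes

print(ok, exists_before, exists_after)  # True True False
```

The client believes it registered under session 2, yet the node it relied on 
belonged to session 1 and disappears once the cleanup runs.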

This behavior is triggered by very long fsync operations on the ZooKeeper 
leader. When the leader wakes up from such a long fsync operation, it has 
several sessions to expire, which magnifies the time between session 
expiration and ephemeral node deletion. Between these two operations, a 
ZooKeeper client can issue an ephemeral node creation that appears to have 
succeeded, but the leader later deletes the ephemeral node, leading to 
permanent ephemeral node loss from the client's perspective. 

Thread from zookeeper mailing list: 
http://zookeeper.markmail.org/search/?q=Zookeeper+3.3.4#query:Zookeeper%203.3.4%20date%3A201307%20+page:1+mid:zma242a2qgp6gxvx+state:results

  was:
There is a potential bug in ZooKeeper: when the ZK leader processes a lot 
of session expiration events (this could be due to a long GC pause, an fsync 
operation, etc.), it marks the session as expired but does not delete the 
corresponding ephemeral znode at the same time. 

Meanwhile, a new session event will be fired on the Kafka server and the server 
will request the same ephemeral node to be created on handling the new session. 
When it enters the ZooKeeper processing queue, this operation receives a 
NodeExists error since the ZooKeeper leader has not finished deleting that 
ephemeral znode and still thinks the previous session holds it. Kafka assumes 
that the NodeExists error on ephemeral node creation is ok since that is a 
legitimate condition that happens during session disconnects on ZooKeeper. 
However, a NodeExists error is only valid if the owner session id also matches 
Kafka server's current ZooKeeper session id. The bug is that before sending a 
NodeExists error, ZooKeeper should check if the ephemeral node in question is 
held by a session that has been marked as expired.
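
The owner check described above can be sketched on the client side as 
follows. This is a minimal illustration under stated assumptions, not the 
attached patch: ToyZooKeeper, owner_of, delete_nodes_of and 
register_with_owner_check are hypothetical names, and the wait hook stands in 
for whatever backoff a real client would use. In the real ZooKeeper client the 
owning session of an ephemeral node is reported in the znode's Stat 
(ephemeralOwner).

```python
class NodeExistsError(Exception):
    """Raised when create() targets a path that is already present."""


class ToyZooKeeper:
    """In-memory stand-in for a ZooKeeper server; not the real client API."""

    def __init__(self):
        self.nodes = {}  # path -> session id that owns the ephemeral node

    def create_ephemeral(self, path, session_id):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = session_id

    def owner_of(self, path):
        # Returns None when the node does not exist.
        return self.nodes.get(path)

    def delete_nodes_of(self, session_id):
        self.nodes = {p: o for p, o in self.nodes.items() if o != session_id}


def register_with_owner_check(zk, path, session_id, max_attempts=5, wait=None):
    """Create the node; accept NodeExists only if this session owns it."""
    for _ in range(max_attempts):
        try:
            zk.create_ephemeral(path, session_id)
            return True
        except NodeExistsError:
            if zk.owner_of(path) == session_id:
                return True  # genuinely our node, e.g. from a retried create
            # The node belongs to another (possibly expired) session: wait
            # for the server to delete it, then try again.
            if wait is not None:
                wait()
    return False


zk = ToyZooKeeper()
path = "/brokers/ids/0"               # illustrative registration path
zk.create_ephemeral(path, 1)          # left over from expired session 1

# The wait hook simulates the leader finally deleting session 1's nodes.
registered = register_with_owner_check(
    zk, path, session_id=2, wait=lambda: zk.delete_nodes_of(1))

print(registered, zk.owner_of(path))  # True 2
```

With this check, a NodeExists error caused by a stale node is retried until 
the stale node is gone, and the function returns False rather than reporting 
success if the node never becomes available.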

       Reporter: Neha Narkhede  (was: Guozhang Wang)
    
> Double Check on Broker Registration to Avoid False NodeExist Exception
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-992
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Neha Narkhede
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-992.v1.patch, KAFKA-992.v2.patch
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
