[ 
https://issues.apache.org/jira/browse/KAFKA-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13727438#comment-13727438
 ] 

Swapnil Ghike commented on KAFKA-992:
-------------------------------------

- I am not completely clear on why a timestamp needs to be stored in 
zookeeper along with the other broker info. If I am not wrong, ephemeralOwner = 
0x13ff5a4758c4a05 is the session id. Is there a way to get it from zookeeper 
when we read the broker znode info? 
- Perhaps we should have a fixed number of retries. If zookeeper has not deleted 
the znode a sufficient amount of time after session expiration, we would 
probably like to know that we are dealing with a buggy zookeeper setup.

Then this should suffice:

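catch ZkNodeExistsException =>
for (numRetries) {
 if (broker.host == host && broker.port == port && sessionId == lastSessionId) {
   // znode still belongs to our own expired session: wait for zookeeper to delete it
   Thread.sleep(..)
 } else {
   // znode is held by a different broker or session: a genuine conflict
   throw new RuntimeException(...)
 }
}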
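
For illustration, the check-and-retry idea above might be fleshed out roughly as follows. This is only a sketch in Java, not the actual patch. BrokerRegistry, ExistingNode, FakeRegistry, register, maxRetries and backoffMs are all hypothetical names, and the zookeeper client is replaced by an in-memory stand-in so that the example runs on its own. In the real ZooKeeper Java client, the owner session id of an ephemeral znode can be read with Stat.getEphemeralOwner() on the Stat filled in by getData(), which is what ownerSessionId is meant to mirror.

```java
// Sketch only: every type here is a hypothetical stand-in, not a Kafka or ZooKeeper class.
class BrokerRegistrationSketch {

    /** Contents of the existing broker znode plus the session that owns it. */
    static final class ExistingNode {
        final String host;
        final int port;
        final long ownerSessionId; // assumed to come from Stat.getEphemeralOwner()

        ExistingNode(String host, int port, long ownerSessionId) {
            this.host = host;
            this.port = port;
            this.ownerSessionId = ownerSessionId;
        }
    }

    /** Hypothetical wrapper around the two zookeeper operations the sketch needs. */
    interface BrokerRegistry {
        boolean tryCreate(String host, int port); // false means NodeExists
        ExistingNode read();                      // null if the znode is gone
    }

    /**
     * Tries to register the broker, retrying while the znode is still held by
     * this broker's previous (expired) session. Returns the number of retries used.
     */
    static int register(BrokerRegistry zk, String host, int port,
                        long lastSessionId, int maxRetries, long backoffMs) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (zk.tryCreate(host, port)) {
                return attempt;
            }
            ExistingNode node = zk.read();
            if (node == null) {
                continue; // deleted between create and read, so just try again
            }
            boolean ownedByOurOldSession = node.host.equals(host)
                    && node.port == port
                    && node.ownerSessionId == lastSessionId;
            if (!ownedByOurOldSession) {
                throw new IllegalStateException("broker znode is held by another broker or session");
            }
            try {
                Thread.sleep(backoffMs); // give zookeeper time to delete the stale znode
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for znode deletion", e);
            }
        }
        // Retries exhausted: likely the buggy zookeeper setup we want to hear about.
        throw new IllegalStateException("znode still present after " + maxRetries + " retries");
    }

    /** In-memory fake: the stale znode survives a fixed number of create attempts. */
    static final class FakeRegistry implements BrokerRegistry {
        private final ExistingNode stale;
        private int failuresLeft;

        FakeRegistry(ExistingNode stale, int failuresLeft) {
            this.stale = stale;
            this.failuresLeft = failuresLeft;
        }

        public boolean tryCreate(String host, int port) {
            if (failuresLeft > 0) {
                failuresLeft--;
                return false;
            }
            return true;
        }

        public ExistingNode read() {
            return stale; // only called after a failed create in this sketch
        }
    }

    public static void main(String[] args) {
        ExistingNode stale = new ExistingNode("host1", 9092, 42L);
        int retries = register(new FakeRegistry(stale, 2), "host1", 9092, 42L, 5, 10L);
        System.out.println("registered after " + retries + " retries");
    }
}
```

Unlike the pseudocode, the sketch re-attempts the create on every pass, and it fails loudly once the retries run out, so a znode that never goes away is reported instead of being waited on forever.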
                
> Double Check on Broker Registration to Avoid False NodeExist Exception
> ----------------------------------------------------------------------
>
>                 Key: KAFKA-992
>                 URL: https://issues.apache.org/jira/browse/KAFKA-992
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Guozhang Wang
>            Assignee: Guozhang Wang
>         Attachments: KAFKA-992.v1.patch, KAFKA-992.v2.patch
>
>
> There is a potential bug in Zookeeper: when the ZK leader processes a lot 
> of session expiration events (this could be due to a long GC, an fsync 
> operation, etc.), it marks a session as expired but does not delete the 
> corresponding ephemeral znode at the same time. 
> Meanwhile, a new session event will be fired on the kafka server and the 
> server will request the same ephemeral node to be created on handling the new 
> session. When it enters the zookeeper processing queue, this operation 
> receives a NodeExists error since zookeeper leader has not finished deleting 
> that ephemeral znode and still thinks the previous session holds it. Kafka 
> assumes that the NodeExists error on ephemeral node creation is ok since that 
> is a legitimate condition that happens during session disconnects on 
> zookeeper. However, a NodeExists error is only valid if the owner session id 
> also matches the Kafka server's current zookeeper session id. The bug is that 
> Zookeeper does not check, before sending a NodeExists error, whether the 
> ephemeral node in question is held by a session that has been marked as expired.
