[jira] [Commented] (KAFKA-2437) Controller does not handle zk node deletion correctly.

2015-09-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728351#comment-14728351
 ] 

ASF GitHub Bot commented on KAFKA-2437:
---

GitHub user asfgit closed the pull request at:

https://github.com/apache/kafka/pull/189


> Controller does not handle zk node deletion correctly.
> --
>
> Key: KAFKA-2437
> URL: https://issues.apache.org/jira/browse/KAFKA-2437
> Project: Kafka
> Issue Type: Bug
> Reporter: Jiangjie Qin
> Assignee: Jiangjie Qin
>
> We see this issue occasionally. The symptom is that when the /controller path 
> is deleted, the old controller does not resign, so we end up with more than 
> one controller in the cluster (although requests from the controller with the 
> old epoch will not be accepted). After checking the ZooKeeper watchers with 
> wchp, it looks like the ZooKeeper session that created the /controller path 
> does not have a watcher on /controller. That is why the old controller does 
> not resign. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (KAFKA-2437) Controller does not handle zk node deletion correctly.

2015-09-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728309#comment-14728309
 ] 

ASF GitHub Bot commented on KAFKA-2437:
---

GitHub user becketqin opened a pull request:

https://github.com/apache/kafka/pull/189

KAFKA-2437: Fix ZookeeperLeaderElector to handle node deletion correctly.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/becketqin/kafka KAFKA-2437

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/kafka/pull/189.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #189


commit 11d9fd6595932553e138a3c3094322ebd9170d6c
Author: Jiangjie Qin 
Date:   2015-09-03T00:41:26Z

KAFKA-2437: Fix ZookeeperLeaderElector to handle node deletion correctly.






[jira] [Commented] (KAFKA-2437) Controller does not handle zk node deletion correctly.

2015-09-02 Thread Jiangjie Qin (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-2437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728248#comment-14728248
 ] 

Jiangjie Qin commented on KAFKA-2437:
-

Debugged with [~jjkoshy] and found the following root cause.

zkClient determines whether to fire handleDataChange() or handleDataDeleted() 
in the following way. When it receives an event from ZooKeeper, it tries to 
read the data from the watched path. If the path no longer exists, 
handleDataDeleted() is fired. Otherwise, handleDataChange() is fired.

When we delete the /controller path, the zkClient watcher receives a zk event. 
If another broker recreates the path before zkClient reads the data from the 
watched path, only handleDataChange() fires, i.e. the broker misses the node 
deletion event. If the broker that missed the node deletion event happens to be 
the old controller, it will not resign, and the cluster ends up with more than 
one controller.
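
The race above can be reproduced with a small in-memory simulation. This is 
only a sketch: FakeZk, dispatch(), Broker and run_race() are hypothetical names 
invented for illustration, not zkClient or Kafka code. The 
resign_on_foreign_leader flag models one possible mitigation (resign when a 
data-change event shows a different controller id). That mitigation is an 
assumption here; this thread does not describe the actual patch.

```python
class FakeZk:
    """In-memory stand-in for a ZooKeeper namespace (hypothetical)."""

    def __init__(self):
        self.nodes = {}


def dispatch(zk, path, listener):
    """Mimic the dispatch rule described above: read the path only after
    the event arrives, then pick the callback from what exists now."""
    data = zk.nodes.get(path)
    if data is None:
        listener.handle_data_deleted(path)
    else:
        listener.handle_data_change(path, data)


class Broker:
    def __init__(self, broker_id, resign_on_foreign_leader):
        self.broker_id = broker_id
        self.is_controller = False
        # Assumed mitigation switch, not taken from the real patch.
        self.resign_on_foreign_leader = resign_on_foreign_leader
        self.events = []

    def handle_data_deleted(self, path):
        self.events.append("deleted")
        self.is_controller = False  # normal resignation path

    def handle_data_change(self, path, data):
        self.events.append("changed")
        # Mitigation: a controller that sees someone else's id resigns.
        if (self.resign_on_foreign_leader and self.is_controller
                and data != self.broker_id):
            self.is_controller = False


def run_race(resign_on_foreign_leader):
    zk = FakeZk()
    old = Broker(1, resign_on_foreign_leader)
    zk.nodes["/controller"] = old.broker_id
    old.is_controller = True

    del zk.nodes["/controller"]   # node deleted, event queued for broker 1
    zk.nodes["/controller"] = 2   # broker 2 recreates it before broker 1 reads
    dispatch(zk, "/controller", old)  # broker 1 finally handles the event
    return old
```

Calling run_race(False) leaves broker 1 with is_controller still set after a 
single "changed" event, which matches the symptom reported in this issue. 
Calling run_race(True) shows the same single event, but the broker steps down 
because the node data names another broker.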
