[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-28 Thread Manikumar Reddy (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manikumar Reddy updated KAFKA-1451:
---

Attachment: KAFKA-1451_2014-07-28_20:17:21.patch

 Broker stuck due to leader election race 
 -

 Key: KAFKA-1451
 URL: https://issues.apache.org/jira/browse/KAFKA-1451
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 0.8.1.1
Reporter: Maciek Makowski
Assignee: Manikumar Reddy
Priority: Minor
  Labels: newbie
 Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:17:21.patch


 h3. Symptoms
 The broker does not become available because it is stuck in an infinite loop 
 while electing a leader. This can be recognised by the following line being 
 repeatedly written to server.log:
 {code}
 [2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node 
 [{"version":1,"brokerid":1,"timestamp":"1400060079108"}] at /controller a 
 while back in a different session, hence I will backoff for this node to be 
 deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
 {code}
 h3. Steps to Reproduce
 In a setup with a single Kafka 0.8.1.1 node and a single ZooKeeper 3.4.6 node 
 (the ZK version included in the Kafka distribution will likely behave the 
 same):
 # start both zookeeper and kafka (in any order)
 # stop zookeeper
 # stop kafka
 # start kafka
 # start zookeeper
 h3. Likely Cause
 {{ZookeeperLeaderElector}} subscribes to data changes on startup and then 
 triggers an election. If the deletion of the ephemeral {{/controller}} node 
 associated with the broker's previous ZooKeeper session happens after the 
 subscription to changes in the new session, the election will be invoked 
 twice, once from {{startup}} and once from {{handleDataDeleted}}:
 * {{startup}}: acquire {{controllerLock}}
 * {{startup}}: subscribe to data changes
 * zookeeper: delete {{/controller}} since the session that created it timed 
 out
 * {{handleDataDeleted}}: {{/controller}} was deleted
 * {{handleDataDeleted}}: wait on {{controllerLock}}
 * {{startup}}: elect -- writes {{/controller}}
 * {{startup}}: release {{controllerLock}}
 * {{handleDataDeleted}}: acquire {{controllerLock}}
 * {{handleDataDeleted}}: elect -- attempts to write {{/controller}} and then 
 gets into an infinite loop as a result of the conflict
 {{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing 
 znode was written from a different session, which is not true in this case; 
 it was written from the same session. That adds to the confusion.
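 The following toy model reproduces the interleaving. It is not Kafka source; 
 the lock, znode, and method names mirror the report, but the bodies are 
 illustrative stand-ins for the real ZkClient calls.
 {code}
 // Toy model of the race: the buggy order in startup (subscribe, then elect)
 // lets the watcher callback run a second election against our own znode.
 import java.util.concurrent.locks.ReentrantLock

 object ElectionRaceSketch {
   private val controllerLock = new ReentrantLock()
   @volatile private var controllerZnode: Option[Int] = None // simulated /controller
   private val brokerId = 1

   // Mirrors ZookeeperLeaderElector.startup: watcher first, election second.
   def startup(): Unit = {
     controllerLock.lock()
     try {
       subscribeToDataChanges() // watcher registered *before* electing
       elect()                  // writes /controller
     } finally controllerLock.unlock()
   }

   // Watcher callback, fired when the stale ephemeral /controller is deleted.
   def handleDataDeleted(): Unit = {
     controllerLock.lock()
     try elect() // second election: conflicts with the znode we just wrote
     finally controllerLock.unlock()
   }

   private def elect(): Unit = controllerZnode match {
     case None => controllerZnode = Some(brokerId) // create the ephemeral node
     case Some(id) if id == brokerId =>
       // createEphemeralPathExpectConflictHandleZKBug assumes the conflicting
       // znode came from a *different* session and waits for ZooKeeper to
       // delete it; here we wrote it ourselves, so the wait never ends.
       println(s"conflict with our own znode (broker $id): backoff-and-retry loops forever")
     case Some(other) => println(s"broker $other is already controller")
   }

   private def subscribeToDataChanges(): Unit = () // stand-in for zkClient.subscribeDataChanges
 }
 {code}
 Calling {{ElectionRaceSketch.startup()}} and then {{handleDataDeleted()}} (the 
 watcher firing for the old session's deletion) prints the conflict that, in 
 the real broker, turns into the infinite backoff loop above.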
 h3. Suggested Fix
 In {{ZookeeperLeaderElector.startup}}, first run {{elect}} and then subscribe 
 to data changes.
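 Applied to the toy model above, the reordering looks as follows. This is an 
 assumed shape of the change, not the actual patch:
 {code}
 // Elect before subscribing: the watcher can then only observe deletions that
 // happen after this broker already holds the controllership, so the second
 // election against our own freshly written znode cannot occur.
 def fixedStartup(): Unit = {
   controllerLock.lock()
   try {
     elect()                  // write /controller first
     subscribeToDataChanges() // only then register the deletion watcher
   } finally controllerLock.unlock()
 }
 {code}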



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-28 Thread Manikumar Reddy (JIRA)

Manikumar Reddy updated KAFKA-1451:
---

Attachment: (was: KAFKA-1451_2014-07-28_20:17:21.patch)


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-28 Thread Manikumar Reddy (JIRA)

Manikumar Reddy updated KAFKA-1451:
---

Attachment: KAFKA-1451.patch


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-28 Thread Manikumar Reddy (JIRA)

Manikumar Reddy updated KAFKA-1451:
---

Attachment: (was: KAFKA-1451.patch)


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-28 Thread Manikumar Reddy (JIRA)

Manikumar Reddy updated KAFKA-1451:
---

Attachment: KAFKA-1451_2014-07-28_20:27:32.patch


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-28 Thread Manikumar Reddy (JIRA)

Manikumar Reddy updated KAFKA-1451:
---

Attachment: KAFKA-1451_2014-07-29_10:13:23.patch


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-26 Thread Manikumar Reddy (JIRA)

Manikumar Reddy updated KAFKA-1451:
---

Attachment: KAFKA-1451.patch


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-07-17 Thread Neha Narkhede (JIRA)

Neha Narkhede updated KAFKA-1451:
-

Labels: newbie  (was: )


[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-05-14 Thread Maciek Makowski (JIRA)

Maciek Makowski updated KAFKA-1451:
---

Description: 

[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-05-14 Thread Maciek Makowski (JIRA)

Maciek Makowski updated KAFKA-1451:
---

Description: 

[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race

2014-05-14 Thread Maciek Makowski (JIRA)

Maciek Makowski updated KAFKA-1451:
---

Description: 