[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
[ https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manikumar Reddy updated KAFKA-1451:
---
Attachment: KAFKA-1451_2014-07-28_20:17:21.patch

Broker stuck due to leader election race
-
Key: KAFKA-1451
URL: https://issues.apache.org/jira/browse/KAFKA-1451
Project: Kafka
Issue Type: Bug
Components: core
Affects Versions: 0.8.1.1
Reporter: Maciek Makowski
Assignee: Manikumar Reddy
Priority: Minor
Labels: newbie
Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:17:21.patch

h3. Symptoms

The broker does not become available because it is stuck in an infinite loop while electing a leader. This can be recognised by the following line being repeatedly written to server.log:

{code}
[2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node [{version:1,brokerid:1,timestamp:1400060079108}] at /controller a while back in a different session, hence I will backoff for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
{code}

h3. Steps to Reproduce

In a setup with a single kafka 0.8.1.1 node and a single zookeeper 3.4.6 node (it will likely behave the same with the ZK version included in the Kafka distribution):

# start both zookeeper and kafka (in any order)
# stop zookeeper
# stop kafka
# start kafka
# start zookeeper

h3. Likely Cause

{{ZookeeperLeaderElector}} subscribes to data changes on startup, and then triggers an election. If the deletion of the ephemeral {{/controller}} node associated with the broker's previous zookeeper session happens after the subscription to changes in the new session, the election will be invoked twice, once from {{startup}} and once from {{handleDataDeleted}}:

* {{startup}}: acquire {{controllerLock}}
* {{startup}}: subscribe to data changes
* zookeeper: delete {{/controller}} since the session that created it timed out
* {{handleDataDeleted}}: {{/controller}} was deleted
* {{handleDataDeleted}}: wait on {{controllerLock}}
* {{startup}}: elect -- writes {{/controller}}
* {{startup}}: release {{controllerLock}}
* {{handleDataDeleted}}: acquire {{controllerLock}}
* {{handleDataDeleted}}: elect -- attempts to write {{/controller}} and then gets into an infinite loop as a result of the conflict

{{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing znode was written from a different session, which is not true in this case; it was written from the same session. That adds to the confusion.

h3. Suggested Fix

In {{ZookeeperLeaderElector.startup}} first run {{elect}} and then subscribe to data changes.

--
This message was sent by Atlassian JIRA (v6.2#6252)
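The interleaving described under "Likely Cause" and the ordering proposed under "Suggested Fix" can be replayed deterministically with a toy model. The sketch below is not Kafka or ZooKeeper code: {{ToyZk}}, {{ToyElector}} and {{startup_trace}} are hypothetical names invented for illustration, the lock is modelled only by queuing the deletion callback until startup has finished, and the real backoff-and-retry loop is replaced by returning the string "stuck". It also only covers the case where the stale node has already been deleted by the time {{elect}} runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

OLD_SESSION, NEW_SESSION = 1, 2  # hypothetical session ids


@dataclass
class Znode:
    broker_id: int
    session_id: int


class ToyZk:
    """In-memory stand-in for the /controller znode (illustration only)."""

    def __init__(self) -> None:
        self.controller: Optional[Znode] = None
        self._listeners: List[Callable[[], str]] = []
        self._pending: List[Callable[[], str]] = []

    def subscribe_data_deleted(self, callback: Callable[[], str]) -> None:
        self._listeners.append(callback)

    def create_ephemeral(self, broker_id: int, session_id: int) -> bool:
        if self.controller is not None:
            return False  # conflict: a node already exists
        self.controller = Znode(broker_id, session_id)
        return True

    def expire_session(self, session_id: int) -> None:
        # Deleting the ephemeral node queues a callback for every listener
        # that is subscribed at that moment.
        if self.controller is not None and self.controller.session_id == session_id:
            self.controller = None
            self._pending.extend(self._listeners)

    def deliver_pending(self) -> List[str]:
        # Models handleDataDeleted acquiring the lock after startup releases it.
        callbacks, self._pending = self._pending, []
        return [callback() for callback in callbacks]


class ToyElector:
    def __init__(self, zk: ToyZk, broker_id: int, session_id: int) -> None:
        self.zk, self.broker_id, self.session_id = zk, broker_id, session_id

    def elect(self) -> str:
        if self.zk.create_ephemeral(self.broker_id, self.session_id):
            return "elected"
        existing = self.zk.controller
        if existing is not None and existing.broker_id == self.broker_id:
            # The real helper would back off and retry here, waiting for a
            # deletion that never comes because the node is from this session.
            return "stuck"
        return "follower"


def startup_trace(subscribe_first: bool) -> List[str]:
    zk = ToyZk()
    zk.create_ephemeral(broker_id=1, session_id=OLD_SESSION)  # stale node
    elector = ToyElector(zk, broker_id=1, session_id=NEW_SESSION)
    trace: List[str] = []
    if subscribe_first:  # ordering described in the report
        zk.subscribe_data_deleted(elector.elect)
        zk.expire_session(OLD_SESSION)
        trace.append(elector.elect())
    else:  # suggested ordering: elect, then subscribe
        zk.expire_session(OLD_SESSION)
        trace.append(elector.elect())
        zk.subscribe_data_deleted(elector.elect)
    trace.extend(zk.deliver_pending())
    return trace


if __name__ == "__main__":
    print(startup_trace(subscribe_first=True))   # ['elected', 'stuck']
    print(startup_trace(subscribe_first=False))  # ['elected']
```

In this model the second election only happens because the listener is registered before the stale node is deleted; with the order swapped, the deletion is never reported to the new session, so {{elect}} runs once.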
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Manikumar Reddy updated KAFKA-1451:
---
Attachment: (was: KAFKA-1451_2014-07-28_20:17:21.patch)

Attachments: KAFKA-1451.patch
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Manikumar Reddy updated KAFKA-1451:
---
Attachment: KAFKA-1451.patch

Attachments: KAFKA-1451.patch, KAFKA-1451.patch
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Manikumar Reddy updated KAFKA-1451:
---
Attachment: (was: KAFKA-1451.patch)

Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Manikumar Reddy updated KAFKA-1451:
---
Attachment: KAFKA-1451_2014-07-28_20:27:32.patch

Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Manikumar Reddy updated KAFKA-1451:
---
Attachment: KAFKA-1451_2014-07-29_10:13:23.patch

Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch, KAFKA-1451_2014-07-29_10:13:23.patch
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Manikumar Reddy updated KAFKA-1451:
---
Attachment: KAFKA-1451.patch

Attachments: KAFKA-1451.patch
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Neha Narkhede updated KAFKA-1451:
-
Labels: newbie (was: )
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Maciek Makowski updated KAFKA-1451:
---
Description: under "Likely Cause", the following step was added to the sequence, between "{{startup}}: subscribe to data changes" and "{{handleDataDeleted}}: {{/controller}} was deleted":

* zookeeper: delete {{/controller}} since the session that created it timed out
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Maciek Makowski updated KAFKA-1451:
---
Description: under "Symptoms", "The broker does not become available, due to being stuck" now reads "The broker does not become available due to being stuck" (comma removed).
[jira] [Updated] (KAFKA-1451) Broker stuck due to leader election race
Maciek Makowski updated KAFKA-1451:
---
Description: under "Steps to Reproduce", "In a sinle kafka 0.8.1.1 node" now reads "In a single kafka 0.8.1.1 node" (typo fixed).