[
https://issues.apache.org/jira/browse/KAFKA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14593515#comment-14593515
]
Raghav commented on KAFKA-1451:
-------------------------------
Hit this issue on version 0.8.2.1 when the Twitter stream generates a large
volume of data. I have one topic with two brokers and two partitions.
[2015-06-19 20:35:10,141] INFO I wrote this conflicted ephemeral node
[{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}]
at /brokers/ids/2 a while back in a different session, hence I will backoff
for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
[2015-06-19 20:35:16,246] INFO conflict in /brokers/ids/2 data:
{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}
stored data:
{"jmx_port":10000,"timestamp":"1434726044184","host":"localhost.localdomain","version":1,"port":9093}
(kafka.utils.ZkUtils$)
[2015-06-19 20:35:16,796] INFO I wrote this conflicted ephemeral node
[{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}]
at /brokers/ids/2 a while back in a different session, hence I will backoff
for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
[2015-06-19 20:35:22,965] INFO conflict in /brokers/ids/2 data:
{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}
stored data:
{"jmx_port":10000,"timestamp":"1434726044184","host":"localhost.localdomain","version":1,"port":9093}
(kafka.utils.ZkUtils$)
[2015-06-19 20:35:22,967] INFO I wrote this conflicted ephemeral node
[{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}]
at /brokers/ids/2 a while back in a different session, hence I will backoff
for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
[2015-06-19 20:35:29,159] INFO conflict in /brokers/ids/2 data:
{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}
stored data:
{"jmx_port":10000,"timestamp":"1434726044184","host":"localhost.localdomain","version":1,"port":9093}
(kafka.utils.ZkUtils$)
[2015-06-19 20:35:29,161] INFO I wrote this conflicted ephemeral node
[{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}]
at /brokers/ids/2 a while back in a different session, hence I will backoff
for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
[2015-06-19 20:35:35,219] INFO conflict in /brokers/ids/2 data:
{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}
stored data:
{"jmx_port":10000,"timestamp":"1434726044184","host":"localhost.localdomain","version":1,"port":9093}
(kafka.utils.ZkUtils$)
[2015-06-19 20:35:35,221] INFO I wrote this conflicted ephemeral node
[{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}]
at /brokers/ids/2 a while back in a different session, hence I will backoff
for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
[2015-06-19 20:35:41,338] INFO conflict in /brokers/ids/2 data:
{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}
stored data:
{"jmx_port":10000,"timestamp":"1434726044184","host":"localhost.localdomain","version":1,"port":9093}
(kafka.utils.ZkUtils$)
[2015-06-19 20:35:42,208] INFO I wrote this conflicted ephemeral node
[{"jmx_port":10000,"timestamp":"1434726183806","host":"localhost.localdomain","version":1,"port":9093}]
at /brokers/ids/2 a while back in a different session, hence I will backoff
for this node to be deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
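
To make the conflict easier to read, here is a minimal sketch that compares the two payloads from the "conflict in /brokers/ids/2" entries above. The JSON strings are copied from the log; the helper name diff_fields is only illustrative and is not part of Kafka.

```python
import json

# Payloads copied verbatim from the "data:" and "stored data:" log entries.
attempted = ('{"jmx_port":10000,"timestamp":"1434726183806",'
             '"host":"localhost.localdomain","version":1,"port":9093}')
stored = ('{"jmx_port":10000,"timestamp":"1434726044184",'
          '"host":"localhost.localdomain","version":1,"port":9093}')


def diff_fields(a_json, b_json):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    a, b = json.loads(a_json), json.loads(b_json)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


changed = diff_fields(attempted, stored)
# Gap in milliseconds between the stored registration and the new attempt.
age_ms = int(changed["timestamp"][0]) - int(changed["timestamp"][1])
print(changed, age_ms)
```

Only the timestamp field differs, and the stored znode is about 139.6 seconds older than the one the broker is trying to write, so the broker appears to be colliding with its own earlier registration.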
> Broker stuck due to leader election race
> -----------------------------------------
>
> Key: KAFKA-1451
> URL: https://issues.apache.org/jira/browse/KAFKA-1451
> Project: Kafka
> Issue Type: Bug
> Components: core
> Affects Versions: 0.8.1.1
> Reporter: Maciek Makowski
> Assignee: Manikumar Reddy
> Priority: Minor
> Labels: newbie
> Fix For: 0.8.2.0
>
> Attachments: KAFKA-1451.patch, KAFKA-1451_2014-07-28_20:27:32.patch,
> KAFKA-1451_2014-07-29_10:13:23.patch
>
>
> h3. Symptoms
> The broker does not become available due to being stuck in an infinite loop
> while electing leader. This can be recognised by the following line being
> repeatedly written to server.log:
> {code}
> [2014-05-14 04:35:09,187] INFO I wrote this conflicted ephemeral node
> [{"version":1,"brokerid":1,"timestamp":"1400060079108"}] at /controller a
> while back in a different session, hence I will backoff for this node to be
> deleted by Zookeeper and retry (kafka.utils.ZkUtils$)
> {code}
> h3. Steps to Reproduce
> In a single kafka 0.8.1.1 node, single zookeeper 3.4.6 (but will likely
> behave the same with the ZK version included in Kafka distribution) node
> setup:
> # start both zookeeper and kafka (in any order)
> # stop zookeeper
> # stop kafka
> # start kafka
> # start zookeeper
> h3. Likely Cause
> {{ZookeeperLeaderElector}} subscribes to data changes on startup, and then
> triggers an election. If the deletion of the ephemeral {{/controller}} node
> associated with the broker's previous zookeeper session happens after the
> subscription to changes in the new session, the election will be invoked
> twice, once from {{startup}} and once from {{handleDataDeleted}}:
> * {{startup}}: acquire {{controllerLock}}
> * {{startup}}: subscribe to data changes
> * zookeeper: delete {{/controller}} since the session that created it timed
> out
> * {{handleDataDeleted}}: {{/controller}} was deleted
> * {{handleDataDeleted}}: wait on {{controllerLock}}
> * {{startup}}: elect -- writes {{/controller}}
> * {{startup}}: release {{controllerLock}}
> * {{handleDataDeleted}}: acquire {{controllerLock}}
> * {{handleDataDeleted}}: elect -- attempts to write {{/controller}} and then
> gets into an infinite loop as a result of the conflict
> {{createEphemeralPathExpectConflictHandleZKBug}} assumes that the existing
> znode was written from a different session, which is not true in this case;
> it was written from the same session. That adds to the confusion.
> h3. Suggested Fix
> In {{ZookeeperLeaderElector.startup}} first run {{elect}} and then subscribe
> to data changes.
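
The interleaving described under "Likely Cause" and the ordering proposed under "Suggested Fix" can be replayed with a small deterministic model. This is only a sketch under stated assumptions: FakeZk, elect and both startup functions are hypothetical stand-ins, not Kafka or ZooKeeper APIs, the infinite retry is capped at max_retries so the script terminates, and the lock is modelled by queuing the deletion callback until startup has finished.

```python
class NodeExists(Exception):
    pass


class FakeZk:
    """Tiny in-memory stand-in for ZooKeeper (hypothetical, not a real client)."""

    def __init__(self):
        self.nodes = {}      # path -> (data, owning session id)
        self.listeners = []  # callbacks subscribed to deletions
        self.pending = []    # callbacks queued as if waiting on controllerLock

    def create_ephemeral(self, path, data, session):
        if path in self.nodes:
            raise NodeExists(path)
        self.nodes[path] = (data, session)

    def expire(self, session):
        """Delete the session's ephemeral nodes and queue deletion callbacks."""
        for path in [p for p, (_, s) in self.nodes.items() if s == session]:
            del self.nodes[path]
            self.pending.extend(self.listeners)


def elect(zk, session, broker_id, max_retries=3, on_backoff=None):
    """Try to write /controller; 'stuck' stands in for the infinite loop."""
    for _ in range(max_retries):
        try:
            zk.create_ephemeral("/controller", broker_id, session)
            return "elected"
        except NodeExists:
            stored, _ = zk.nodes["/controller"]
            if stored != broker_id:
                return "follower"
            # Same broker id: assume a stale session wrote it, back off, retry.
            if on_backoff:
                on_backoff()
    return "stuck"


def startup_subscribe_first(zk, old, new, broker_id):
    zk.listeners.append(lambda: elect(zk, new, broker_id))  # subscribe
    zk.expire(old)                      # old /controller deleted, callback queued
    first = elect(zk, new, broker_id)   # startup: elect, writes /controller
    later = [cb() for cb in zk.pending]  # handleDataDeleted: elect again
    return first, later


def startup_elect_first(zk, old, new, broker_id):
    # The old node disappears while elect is backing off.
    first = elect(zk, new, broker_id, on_backoff=lambda: zk.expire(old))
    zk.listeners.append(lambda: elect(zk, new, broker_id))  # subscribe afterwards
    later = [cb() for cb in zk.pending]
    return first, later


def fresh_zk():
    zk = FakeZk()
    zk.create_ephemeral("/controller", 1, "old-session")  # left by previous run
    return zk


print(startup_subscribe_first(fresh_zk(), "old-session", "new-session", 1))
print(startup_elect_first(fresh_zk(), "old-session", "new-session", 1))
```

In this model the subscribe-first ordering elects once and then gets stuck on the broker's own freshly written node, while the elect-first ordering elects once and leaves no queued callback, because the stale node is gone before the listener is registered.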