[jira] [Comment Edited] (KAFKA-1387) Kafka getting stuck creating ephemeral node it has already created when two zookeeper sessions are established in a very short period of time

Joe Stein (JIRA) Tue, 05 Aug 2014 18:16:32 -0700

    [ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087063#comment-14087063
 ]


Joe Stein edited comment on KAFKA-1387 at 8/6/14 1:15 AM:
----------------------------------------------------------

[~junrao] I tested on trunk and it is much worse now.

Instead of looping on the /controller node (as it did before), node 3 
actually overwrote/stole /brokers/ids/2 (a get beforehand showed it as 
192.168.30.1; afterwards it is 192.168.30.3).

So now I have two running broker servers, each with the same broker id (2). 
Server 3 is the "active" broker: all the topics are being created on it, and 
produce and consume requests fail (because all the data is on server 2, which 
is not advertised). Server 2 is still the controller, handling preferred 
leader election, etc.

What is weird is that broker.id=2 was already running. I started up 
broker.id=1 and another broker.id=2 at the same time.






was (Author: joestein):
[~junrao] I tested on trunk and it is much worse now.

Instead of looping on the /controller node (as it did before), node 3 
actually overwrote/stole /brokers/ids/2 (a get beforehand showed it as 
192.168.30.1; afterwards it is 192.168.30.3).

So now I have two running broker servers, each with the same broker id (2). 
Server 3 is the "active" broker: all the topics are being created on it, and 
produce and consume requests fail (because all the data is on server 1, which 
is not advertised). Server 1 is still the controller, handling preferred 
leader election, etc.

What is weird is that broker.id=2 was already running. I started up 
broker.id=1 and another broker.id=2 at the same time.





> Kafka getting stuck creating ephemeral node it has already created when two 
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Fedor Korotkiy
>
> Kafka broker re-registers itself in zookeeper every time the 
> handleNewSession() callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>  
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. The handleNewSession() callback is queued 
> by the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) The first callback is invoked, creating the /broker/[id] ephemeral path.
> 4) The second callback is invoked and tries to create the /broker/[id] path 
> using createEphemeralPathExpectConflictHandleZKBug(). But the path already 
> exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck in an 
> infinite loop.
> It seems the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from github using the 
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using 
> Ctrl-Z.
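
The four-step sequence in the description can be replayed with a small toy model. This is only a sketch under stated assumptions: FakeZk, register_broker and replay are hypothetical names, the in-memory store is not ZooKeeper or the zkClient API, and the retry logic is a simplification of what the report says about createEphemeralPathExpectConflictHandleZKBug(), not Kafka's actual code. The retry cap (max_attempts) is also an assumption, added so the sketch terminates; the loop described in the report has none.

```python
from collections import deque


class FakeZk:
    """Toy in-memory stand-in for ZooKeeper (hypothetical, not a real client)."""

    def __init__(self):
        self.session_id = 0
        self.nodes = {}  # path -> (data, id of the session that owns the node)

    def new_session(self):
        # Expiring the old session deletes the ephemeral nodes it owned.
        expired = self.session_id
        self.nodes = {p: v for p, v in self.nodes.items() if v[1] != expired}
        self.session_id += 1

    def create_ephemeral(self, path, data):
        if path in self.nodes:
            raise FileExistsError(path)
        self.nodes[path] = (data, self.session_id)


def register_broker(zk, path, broker_id, max_attempts=5):
    """Retry on conflict, assuming the existing node is a stale leftover.

    Returns the attempt number that succeeded, or None after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            zk.create_ephemeral(path, broker_id)
            return attempt
        except FileExistsError:
            data, _owner = zk.nodes[path]
            if data != broker_id:
                raise  # a different broker's node: a genuine conflict
            # Same broker id: back off and retry, expecting ZooKeeper to
            # delete the node soon (real code would sleep here).
    return None


def replay():
    zk = FakeZk()
    path = "/broker/1"
    pending = deque()

    zk.new_session()                 # 1) session re-established,
    pending.append(register_broker)  #    callback queued but not yet run
    zk.new_session()                 # 2) re-established again,
    pending.append(register_broker)  #    callback queued a second time

    first = pending.popleft()(zk, path, "1")   # 3) creates the node
    second = pending.popleft()(zk, path, "1")  # 4) conflicts on every retry
    owner_is_live = zk.nodes[path][1] == zk.session_id
    return first, second, owner_is_live


if __name__ == "__main__":
    print(replay())
```

Running it prints (1, None, True): the first callback registers on its first attempt, the second never succeeds, and the conflicting node is owned by the current live session, so in this model nothing will ever delete it while the second callback waits.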



--
This message was sent by Atlassian JIRA
(v6.2#6252)
