[ https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711838#comment-14711838 ]
Guozhang Wang commented on KAFKA-1387:
--------------------------------------

Thanks for the patch, [~fpj]. Here are some high-level comments:

1. Will mixing direct use of ZK with ZkClient violate ordering? AFAIK ZkClient orders all events fired by watchers and hands them to the user callbacks one by one. If we use ZK's Watcher directly, could its callback be called out of order with respect to other events?

2. If we get a Code.OK in CreateCallback, do we still need to trigger ZooKeeper.exists with ExistsCallback again?

3. For the consumer / server registration case in particular, we try to handle parent path creation in ZkUtils.makeSurePersistentPathExists, so I feel we should expose the problem that the parent path does not exist yet instead of trying to hide it in createRecursive.

> Kafka getting stuck creating ephemeral node it has already created when two
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Fedor Korotkiy
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>              Labels: newbie, patch, zkclient-problems
>         Attachments: KAFKA-1387.patch, kafka-1387.patch
>
>
> Kafka broker re-registers itself in zookeeper every time the
> handleNewSession() callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. handleNewSession() callback is queued by
> the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) First callback is invoked, creating the /broker/[id] ephemeral path.
> 4) Second callback is invoked and it tries to create the /broker/[id] path
> using the createEphemeralPathExpectConflictHandleZKBug() function. But the
> path already exists, so createEphemeralPathExpectConflictHandleZKBug() gets
> stuck in an infinite loop.
> It seems the controller election code has the same issue.
> I'm able to reproduce this issue on the 0.8.1 branch from github using the
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using
> Ctrl-Z.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
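
The sequence of events in the issue description can be sketched with a small in-memory model. This is only an illustration under stated assumptions: `FakeZk`, `registerNaive`, and `registerOwnerAware` are hypothetical names, not ZooKeeper, ZkClient, or Kafka APIs, and the owner check is one possible guard rather than a description of the attached patch. The retry loop is bounded here so that the example terminates; the behaviour reported in the issue corresponds to retrying without a bound.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for an ephemeral znode namespace.
// It is NOT the ZooKeeper client API; it only records which session owns a path.
final class FakeZk {
    // path -> id of the session that created the ephemeral node
    private final Map<String, Long> owners = new HashMap<>();

    // Returns true if the node was created, false if it already exists.
    boolean createEphemeral(String path, long sessionId) {
        return owners.putIfAbsent(path, sessionId) == null;
    }

    // Returns the owning session id, or null if the node does not exist.
    Long ownerOf(String path) {
        return owners.get(path);
    }
}

final class EphemeralRegistrationSketch {
    enum Result { CREATED, ALREADY_OWNED, GAVE_UP }

    // Retries on conflict, assuming the existing node belongs to an old
    // session and will eventually disappear. If the node was created by the
    // current session, it never disappears, so every attempt fails.
    static Result registerNaive(FakeZk zk, String path, long sessionId, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (zk.createEphemeral(path, sessionId)) {
                return Result.CREATED;
            }
            // A real client would back off here before retrying.
        }
        return Result.GAVE_UP;
    }

    // Same loop, but on conflict it checks whether the current session
    // already owns the node and stops retrying if so (an assumed guard).
    static Result registerOwnerAware(FakeZk zk, String path, long sessionId, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (zk.createEphemeral(path, sessionId)) {
                return Result.CREATED;
            }
            Long owner = zk.ownerOf(path);
            if (owner != null && owner.longValue() == sessionId) {
                return Result.ALREADY_OWNED;
            }
        }
        return Result.GAVE_UP;
    }

    public static void main(String[] args) {
        FakeZk zk = new FakeZk();
        long session = 42L;          // arbitrary session id for the sketch
        String path = "/broker/1";   // matches broker.id=1 in the repro config

        // First queued callback creates the node.
        System.out.println(registerNaive(zk, path, session, 3));
        // Second queued callback, same session: the naive loop never succeeds.
        System.out.println(registerNaive(zk, path, session, 3));
        // With the owner check, the second callback returns immediately.
        System.out.println(registerOwnerAware(zk, path, session, 3));
    }
}
```

In this model the second callback can only make progress if it recognises that the existing node belongs to its own session, which is the situation step 4 describes.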