[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14694052#comment-14694052
 ] 

James Lent commented on KAFKA-1387:
-----------------------------------

After refreshing my memory of this issue I was unable to come up with any new 
ideas for how to create an automated test case for the issue.  I was only able 
to reproduce this issue in my dev environment using the cumbersome manual 
process I outlined in my Sept 27 comment.

My question posted to the zookeeper-user mailing list regarding the validity of 
the key assumption of the patch logic generated no feedback.

We have been using the patch I provided with Kafka 0.8.1.1 for almost a year 
now.  We have not seen a re-occurrence of the hung ephemeral connection issue 
since then.  Since the problem was intermittent and only triggered when the 
system was unstable, this may or may not be due to the presence of the patch.

One NPE issue was found during testing in March, when our application code 
changed and in certain cases tried to close a Connector that had never been 
fully started.  That was fixed as follows:

{noformat}
Index: core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala
===================================================================
--- core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala	(revision 73668)
+++ core/src/main/scala/kafka/consumer/ZookeeperConsumerConnector.scala	(revision 73669)
@@ -162,7 +162,9 @@
       if (canShutdown) {
         info("ZKConsumerConnector shutting down")
 
-        consumerNodeMonitor.close()
+        if (consumerNodeMonitor != null) {
+          consumerNodeMonitor.close()
+        }
         
         if (wildcardTopicWatcher != null)
           wildcardTopicWatcher.shutdown()
{noformat}

I am not sure any of this was of much help, but I would be happy to try to 
answer any questions regarding the patch logic and/or update it based on your 
comments.

> Kafka getting stuck creating ephemeral node it has already created when two 
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1.1
>            Reporter: Fedor Korotkiy
>            Priority: Blocker
>              Labels: newbie, patch, zkclient-problems
>         Attachments: kafka-1387.patch
>
>
> Kafka broker re-registers itself in zookeeper every time handleNewSession() 
> callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>  
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. handleNewSession() callback is queued by 
> the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) First callback is invoked, creating /broker/[id] ephemeral path.
> 4) Second callback is invoked and it tries to create /broker/[id] path using 
> createEphemeralPathExpectConflictHandleZKBug() function. But the path 
> already exists, so createEphemeralPathExpectConflictHandleZKBug() gets 
> stuck in an infinite loop.
> It seems the controller election code has the same issue.
> I am able to reproduce this issue on the 0.8.1 branch from github using the 
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.session.timeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using 
> Ctrl-Z.
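
The sequence of events in the quoted description can be modelled with a small 
in-memory sketch. It is not Kafka or ZkClient code: FakeZk, Node, register and 
maxRetries are hypothetical names invented for illustration. The retry rule 
(treat a conflicting node with identical data as a leftover from an expired 
session and try again) is an assumption about how the helper behaves, and the 
retry bound exists only so the sketch terminates where the real loop would not.

```scala
import scala.collection.mutable

// Hypothetical in-memory stand-in for ZooKeeper; nothing here is a real
// Kafka or ZkClient API.
object DoubleSessionSketch {
  final case class Node(data: String, ownerSession: Long)

  class FakeZk {
    private val nodes = mutable.Map.empty[String, Node]
    var currentSession: Long = 1L

    // Establishing a new session drops ephemeral nodes owned by older sessions.
    def newSession(): Unit = {
      currentSession += 1
      val stale = nodes.collect { case (p, n) if n.ownerSession != currentSession => p }
      stale.foreach(p => nodes.remove(p))
    }

    // Returns false when the path already exists (stands in for a node-exists error).
    def createEphemeral(path: String, data: String): Boolean =
      if (nodes.contains(path)) false
      else { nodes(path) = Node(data, currentSession); true }

    def read(path: String): Option[Node] = nodes.get(path)
  }

  // Assumed retry rule: on a conflict with identical data, wait for the "old"
  // node to expire and retry. Bounded by maxRetries so the sketch terminates.
  def register(zk: FakeZk, path: String, data: String, maxRetries: Int): Boolean = {
    @annotation.tailrec
    def loop(retriesLeft: Int): Boolean =
      if (zk.createEphemeral(path, data)) true
      else if (retriesLeft == 0) false
      else zk.read(path) match {
        case Some(node) if node.data == data => loop(retriesLeft - 1)
        case _                               => false
      }
    loop(maxRetries)
  }

  def main(args: Array[String]): Unit = {
    val zk = new FakeZk
    val path = "/broker/1"
    val data = "broker-1-registration"
    val callbacks = mutable.Queue.empty[() => Boolean]

    // Steps 1 and 2: two sessions are established before any callback runs.
    zk.newSession(); callbacks.enqueue(() => register(zk, path, data, maxRetries = 5))
    zk.newSession(); callbacks.enqueue(() => register(zk, path, data, maxRetries = 5))

    // Step 3: the first callback creates the node in the current session.
    val first = callbacks.dequeue()()
    // Step 4: the second callback conflicts with a node that will never expire.
    val second = callbacks.dequeue()()

    println(s"first callback registered: $first")
    println(s"second callback registered: $second")
  }
}
```

In this model the first callback succeeds and the second exhausts its retries, 
because the node it is waiting on belongs to the live session and therefore 
never goes away.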



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
