[ 
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150883#comment-14150883
 ] 

James Lent edited comment on KAFKA-1387 at 9/28/14 3:05 AM:
------------------------------------------------------------

I have seen this issue in our QA environment (3 ZooKeeper, 3 Kafka, and several 
application-specific nodes) several times now.  The problem is triggered when 
the system is under stress (high I/O and CPU load) and the ZooKeeper 
connections become unstable.  When this happens, Kafka threads can get stuck 
trying to register Broker nodes and application threads get stuck trying to 
register Consumer nodes.  One way to recover is to restart the impacted nodes.  
As an experiment I also tried deleting the blocking ZooKeeper nodes (hours 
later when the system was under no stress).  When I did so, 
createEphemeralPathExpectConflictHandleZKBug would process one expire, break 
out of its loop, but then immediately re-enter the loop when it tried to 
process the next expire message.  The few times I tested this approach I had to 
delete the node dozens of times before the problem would clear itself - in 
other words, there were dozens of Expire messages waiting to be processed. 
Obviously I am looking into this issue from a configuration point of view 
(avoiding the unstable connection issue), but this Kafka error behavior concerns 
me.

I have reproduced it (somewhat artificially) in a dev environment as follows:

1) Start one ZooKeeper and one Kafka node.
2) Set a thread breakpoint in KafkaHealthcheck.scala.
{noformat}
    def handleNewSession() {
      info("re-registering broker info in ZK for broker " + brokerId)
-->   register()
      info("done re-registering broker")
      info("Subscribing to %s path to watch for new 
topics".format(ZkUtils.BrokerTopicsPath))
    }
{noformat}
3) Pause Kafka.
4) Wait for ZooKeeper to expire the first session and drop the ephemeral node.
5) Unpause Kafka.
6) Kafka reconnects with ZooKeeper, receives an Expire, and establishes a 
second session.
7) Breakpoint hit and event thread paused before handling the first Expire.
8) Pause Kafka again.
9) Wait for ZooKeeper to expire the second session and delete the ephemeral 
node (again).
10) Remove breakpoint, unpause Kafka, and finally release the event thread.
11) Kafka reconnects with ZooKeeper, receives a second Expire, and establishes 
a third session.
12) Kafka registers an ephemeral node triggered by the first expire (which 
triggered the second session), but ZooKeeper associates it with the third 
session. 
13) Kafka tries to register an ephemeral node triggered by the second expire, 
but ZooKeeper already has a stable node.
14) Kafka assumes this node will go away soon, sleeps, and then retries.
15) The node is associated with a valid session and therefore does not go away, 
so Kafka remains stuck in the retry loop (sketched below).
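
For reference, this is roughly what the retry loop looks like (a simplified 
paraphrase from memory, not the verbatim ZkUtils code), with comments marking 
where it gets stuck after step 12:
{noformat}
import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.exception.ZkNodeExistsException
import kafka.utils.ZkUtils

// Simplified paraphrase of the retry loop - not the verbatim 0.8.1 code.
def createEphemeralPathExpectConflictHandleZKBug(zkClient: ZkClient, path: String, data: String,
                                                 expectedCallerData: Any,
                                                 checker: (String, Any) => Boolean,
                                                 backoffTime: Int) {
  while (true) {
    try {
      ZkUtils.createEphemeralPathExpectConflict(zkClient, path, data)
      return
    } catch {
      case e: ZkNodeExistsException =>
        ZkUtils.readDataMaybeNull(zkClient, path)._1 match {
          case Some(writtenData) if checker(writtenData, expectedCallerData) =>
            // The checker only compares the node's contents (e.g. host/port), not the
            // owning session.  The loop assumes the conflicting node is a leftover from
            // our own expired session and will soon be deleted by ZooKeeper.  After
            // step 12 above the node is owned by the *current* (third) session, so it
            // is never deleted and we sleep and retry forever.
            Thread.sleep(backoffTime)
          case Some(_) => throw e  // somebody else owns the path
          case None =>             // node vanished between create and read; just retry
        }
    }
  }
}
{noformat}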

I have tested this with the latest code in trunk and noted the same behavior 
(the code looks pretty similar).

I have coded up a potential 0.8.1.1 patch for this issue based on the following 
principles:

# Ensure that when the node starts, stale nodes are removed in main
#* For Brokers this means removing nodes with the same host name and port, 
otherwise failing to start (the existing checker logic)
#* For Consumer nodes, don't worry about stale nodes - the way they are named 
should prevent this from ever happening.
# In main, add the initial node, which should now always work with no looping 
required - a direct call to createEphemeralPath
# Create an EphemeralNodeMonitor class (see the sketch after this list) that contains:
#* IZkDataListener
#* IZkStateListener
# The users of this class provide a path to monitor and a closure that defines 
what to do when the node is not found
# When the state listener is notified about a new session it checks to see if 
the node is already gone:
#* Yes, call the provided function
#* No, ignore the event
# When the data listener is notified of a deletion it does the same thing
# Both the Broker and Consumer registrations use this new class in the same way 
they currently use their individual state listeners.  The only change in 
behavior is to call createEphemeralPath directly (and avoid the looping code).
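
To make the shape concrete, here is a simplified sketch of what I have in mind 
for the monitor (the names and details here are illustrative only, not the 
actual patch):
{noformat}
import org.I0Itec.zkclient.{IZkDataListener, IZkStateListener, ZkClient}
import org.apache.zookeeper.Watcher.Event.KeeperState

// Illustrative sketch only - names and details differ from the real patch.
class EphemeralNodeMonitor(zkClient: ZkClient, path: String, createNode: () => Unit) {

  // Runs on the ZkClient event thread, so session and delete events are serialized.
  private def recreateIfMissing() {
    if (!zkClient.exists(path))
      createNode()   // direct createEphemeralPath - no retry loop needed
    // If the node still exists it belongs to a live (or about-to-expire) session;
    // a later delete notification will bring us back here.
  }

  private object stateListener extends IZkStateListener {
    def handleStateChanged(state: KeeperState) { /* nothing to do */ }
    def handleNewSession() { recreateIfMissing() }
  }

  private object dataListener extends IZkDataListener {
    def handleDataChange(dataPath: String, data: Object) { /* nothing to do */ }
    def handleDataDeleted(dataPath: String) { recreateIfMissing() }
  }

  def startup() {
    zkClient.subscribeStateChanges(stateListener)
    zkClient.subscribeDataChanges(path, dataListener)
  }
}
{noformat}
The broker registration would then just construct one of these with the broker 
path and a closure that calls createEphemeralPath, in place of the current 
state listener.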

Since all this work should be done in the event thread, I don't think there are 
any race conditions, and no other nodes should be adding these nodes (or we have 
a serious configuration issue that should have been detected at startup).

One assumption is that we will always receive at least one more event (expire 
and/or delete) after the node is really deleted by ZooKeeper.  I think that is 
a valid assumption (ZooKeeper can't send the delete notification until the node 
is gone).  I wonder if we could get away with just monitoring node deletions, 
but that seems risky.  The only change in behavior should be that if the expire 
is received before the node is actually deleted, then the event loop is not 
blocked and could process other messages while waiting for the delete event.

Note: I have not touched the leader election / controller node code (the third 
user of the createEphemeralPathExpectConflictHandleZKBug code).  That still 
uses the looping code.  I did apply the KAFKA-1451 patch to our 0.8.1.1 build.

If there is any interest in the code I can provide a patch of what I have so 
far.  I would very much like to get feedback.  I was not sure of the protocol 
for submitting patches for comment.






> Kafka getting stuck creating ephemeral node it has already created when two 
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1387
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1387
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Fedor Korotkiy
>
> Kafka broker re-registers itself in zookeeper every time the handleNewSession() callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>  
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. handleNewSession() callback is queued by 
> the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing callback second time.
> 3) First callback is invoked, creating /broker/[id] ephemeral path.
> 4) Second callback is invoked and it tries to create the /broker/[id] path using the createEphemeralPathExpectConflictHandleZKBug() function. But the path already exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck in an infinite loop.
> Seems like the controller election code has the same issue.
> I'm able to reproduce this issue on the 0.8.1 branch from github using the following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using 
> Ctrl-Z.


