[
https://issues.apache.org/jira/browse/KAFKA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14150883#comment-14150883
]
James Lent edited comment on KAFKA-1387 at 9/28/14 2:57 AM:
------------------------------------------------------------
I have seen this issue in our QA environment (3 ZooKeeper, 3 Kafka and several
application-specific nodes) several times now. The problem is triggered when
the system is under stress (high I/O and CPU load) and the ZooKeeper
connections become unstable. When this happens, Kafka threads can get stuck
trying to register Broker nodes and application threads get stuck trying to
register Consumer nodes. One way to recover is to restart the impacted nodes.
As an experiment I also tried deleting the blocking ZooKeeper nodes (hours
later, when the system was under no stress). When I did so,
createEphemeralPathExpectConflictHandleZKBug would process one Expire, break out
of its loop, but then immediately re-enter it when it tried to process the next
Expire message. The few times I tested this approach I had to delete the node
dozens of times before the problem would clear itself - in other words, there
were dozens of Expire messages waiting to be processed. Obviously I am looking
into this issue from a configuration point of view (avoiding the unstable
connection issue), but this Kafka error behavior concerns me.
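For context, the looping code in question behaves roughly as follows (a
condensed paraphrase of ZkUtils.createEphemeralPathExpectConflictHandleZKBug,
rewritten against raw zkclient calls so it stands alone - not the exact
source). The backoff-and-retry branch assumes the conflicting node was written
in a previous, now-expired session and will eventually be dropped by ZooKeeper:
{noformat}
// Condensed paraphrase of the retry loop (NOT the exact Kafka source); uses raw
// zkclient calls so the sketch is self-contained.
import org.I0Itec.zkclient.ZkClient
import org.I0Itec.zkclient.exception.{ZkNoNodeException, ZkNodeExistsException}

object RetryLoopSketch {
  def registerWithRetry(zkClient: ZkClient, path: String, data: String,
                        isMyData: String => Boolean, backoffMs: Long): Unit = {
    while (true) {
      try {
        zkClient.createEphemeral(path, data)  // throws if the path already exists
        return
      } catch {
        case e: ZkNodeExistsException =>
          val existing: Option[String] =
            try Some(zkClient.readData[String](path))
            catch { case _: ZkNoNodeException => None }  // vanished between create and read
          existing match {
            case Some(d) if isMyData(d) =>
              // Assumes the conflicting node was written in a *previous* session and will
              // soon be dropped by ZooKeeper. In the scenario described below, the node is
              // owned by the broker's *current* (valid) session, so it is never dropped
              // and this loop never terminates.
              Thread.sleep(backoffMs)
            case Some(_) => throw e   // genuinely someone else's node
            case None    =>           // gone already; retry immediately
          }
      }
    }
  }
}
{noformat}
When the node is in fact owned by the broker's current session - the scenario
below - that assumption never becomes true and the loop never exits.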
I have reproduced it (somewhat artificially) in a dev environment as follows:
1) Start one ZooKeeper and one Kafka node.
2) Set a thread breakpoint in KafkaHealthcheck.scala.
{noformat}
  def handleNewSession() {
    info("re-registering broker info in ZK for broker " + brokerId)
--> register()
    info("done re-registering broker")
    info("Subscribing to %s path to watch for new topics".format(ZkUtils.BrokerTopicsPath))
  }
{noformat}
3) Pause Kafka.
4) Wait for ZooKeeper to expire the first session and drop the ephemeral node.
5) Unpause Kafka.
6) Kafka reconnects with ZooKeeper, receives an Expire, and establishes a
second session.
7) Breakpoint hit and event thread paused before handling the first Expire.
8) Pause Kafka again.
9) Wait for ZooKeeper to expire the second session and delete the ephemeral
node (again).
10) Remove breakpoint, unpause Kafka, and finally release the event thread.
11) Kafka reconnects with ZooKeeper, receives a second Expire, and establishes
a third session.
12) Kafka registers an ephemeral node triggered by the first Expire (which
triggered the second session), but ZooKeeper associates it with the third
session.
13) Kafka tries to register an ephemeral node triggered by the second Expire,
but ZooKeeper already has a stable node.
14) Kafka assumes this node will go away soon, sleeps, and then retries.
15) The node is associated with a valid session and therefore does not go away,
so Kafka remains stuck in the retry loop (a diagnostic sketch for confirming
this follows the list).
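To confirm what is happening in step 15, a plain ZooKeeper client can be
pointed at the broker znode to see which session owns it; in the stuck state
the ephemeralOwner matches the broker's own live session. A minimal diagnostic
sketch (not part of any patch - broker id 1 and localhost:2181 are just my dev
setup):
{noformat}
// Hypothetical diagnostic: report which session owns the broker's ephemeral node.
import org.apache.zookeeper.{WatchedEvent, Watcher, ZooKeeper}

object EphemeralOwnerCheck {
  def main(args: Array[String]): Unit = {
    val zk = new ZooKeeper("localhost:2181", 30000, new Watcher {
      def process(event: WatchedEvent): Unit = ()  // ignore watch events
    })
    try {
      val stat = zk.exists("/brokers/ids/1", false)
      if (stat == null)
        println("node is gone - the retry loop would have exited")
      else
        // ephemeralOwner is the id of the session that created the node; if that
        // session is still alive, ZooKeeper will never delete the node on its own.
        println(f"ephemeralOwner = 0x${stat.getEphemeralOwner}%x")
    } finally {
      zk.close()
    }
  }
}
{noformat}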
I have tested this with the latest code in trunk and noted the same behavior
(the code looks pretty similar).
I have coded up a potential 0.8.1.1 patch for this issue based on the following
principles:
# Ensure that when the node starts, stale nodes are removed in main
#* For Brokers this means removing nodes with the same host name and port, and
otherwise failing to start (the existing checker logic)
#* For Consumer nodes, don't worry about stale nodes - the way they are named
should prevent this from ever happening.
# In main, add the initial node, which should now always work with no looping
required - a direct call to createEphemeralPath
# Create an EphemeralNodeMonitor class (sketched after this list) that contains:
#* IZkDataListener
#* IZkStateListener
# The users of this class provide a path to monitor and a closure that
defines what to do when the node is not found
# When the state listener is notified about a new session, it checks to see if
the node is already gone:
#* Yes, call the provided function
#* No, ignore the event
# When the data listener is notified of a deletion, it does the same thing
# Both the Broker and Consumer registration use this new class in the same way
they currently use their individual state listeners. The only change in
behavior is to call createEphemeralPath directly (and avoid the looping code).
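For concreteness, here is a rough sketch of the EphemeralNodeMonitor shape I
have in mind (class and parameter names are mine, logging and error handling
are omitted, and it assumes the two-method IZkStateListener/IZkDataListener
interfaces in the zkclient version bundled with 0.8.1.1):
{noformat}
// Rough sketch only - the real patch has logging and error handling.
import org.I0Itec.zkclient.{IZkDataListener, IZkStateListener, ZkClient}
import org.apache.zookeeper.Watcher.Event.KeeperState

class EphemeralNodeMonitor(zkClient: ZkClient, path: String, register: () => Unit) {

  // Re-create the node only if it is actually missing. Both callbacks run on the
  // single zkclient event thread, so there is no race between them.
  private def registerIfMissing(): Unit =
    if (!zkClient.exists(path))
      register()  // expected to be a plain createEphemeralPath call, no retry loop

  private val stateListener = new IZkStateListener {
    def handleStateChanged(state: KeeperState): Unit = {}  // nothing to do
    def handleNewSession(): Unit = registerIfMissing()     // Expire handled here
  }

  private val dataListener = new IZkDataListener {
    def handleDataChange(dataPath: String, data: Object): Unit = {}  // nothing to do
    def handleDataDeleted(dataPath: String): Unit = registerIfMissing()
  }

  def startup(): Unit = {
    zkClient.subscribeStateChanges(stateListener)
    zkClient.subscribeDataChanges(path, dataListener)
  }
}
{noformat}
The intent is that both the Broker and Consumer registration code construct one
of these with the path they created in main plus a closure that calls
createEphemeralPath directly, and then call startup().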
Since all this work should be done in the event thread, I don't think there are
any race conditions, and no other nodes should be adding these paths (or we have
a serious configuration issue that should have been detected at startup).
One assumption is that we will always receive at least one more event (Expire
and/or delete) after the node is really deleted by ZooKeeper. I think that is
a valid assumption (ZooKeeper can't send the delete until the node is gone). I
wonder if we could get away with monitoring only node deletions, but that
seems risky. The only change in behavior should be that if the Expire is
received before the node is actually deleted, the event loop is not blocked and
can process other messages while waiting for the delete event.
Note: I have not touched the leader election / controller node code (the third
user of the createEphemeralPathExpectConflictHandleZKBug code). That still
uses the looping code. I did apply the KAFKA-1451 patch to our 0.8.1.1 build.
If there is any interest in the code I can provide a patch of what I have so
far. I would very much like to get feedback. I was not sure of the protocol
for submitting patches for comment.
> Kafka getting stuck creating ephemeral node it has already created when two
> zookeeper sessions are established in a very short period of time
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: KAFKA-1387
> URL: https://issues.apache.org/jira/browse/KAFKA-1387
> Project: Kafka
> Issue Type: Bug
> Reporter: Fedor Korotkiy
>
> Kafka broker re-registers itself in zookeeper every time handleNewSession()
> callback is invoked.
> https://github.com/apache/kafka/blob/0.8.1/core/src/main/scala/kafka/server/KafkaHealthcheck.scala
>
> Now imagine the following sequence of events.
> 1) Zookeeper session reestablishes. handleNewSession() callback is queued by
> the zkClient, but not invoked yet.
> 2) Zookeeper session reestablishes again, queueing the callback a second time.
> 3) First callback is invoked, creating /broker/[id] ephemeral path.
> 4) Second callback is invoked and tries to create the /broker/[id] path using
> the createEphemeralPathExpectConflictHandleZKBug() function. But the path
> already exists, so createEphemeralPathExpectConflictHandleZKBug() gets stuck
> in an infinite loop.
> Seems like the controller election code has the same issue.
> I'm able to reproduce this issue on the 0.8.1 branch from GitHub using the
> following configs.
> # zookeeper
> tickTime=10
> dataDir=/tmp/zk/
> clientPort=2101
> maxClientCnxns=0
> # kafka
> broker.id=1
> log.dir=/tmp/kafka
> zookeeper.connect=localhost:2101
> zookeeper.connection.timeout.ms=100
> zookeeper.sessiontimeout.ms=100
> Just start kafka and zookeeper and then pause zookeeper several times using
> Ctrl-Z.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)