Zhuo Liu created STORM-1114:
-------------------------------

             Summary: Racing condition in trident zookeeper zk-node create/delete
                 Key: STORM-1114
                 URL: https://issues.apache.org/jira/browse/STORM-1114
             Project: Apache Storm
          Issue Type: Bug
          Components: storm-core
            Reporter: Zhuo Liu
            Assignee: Zhuo Liu


In production, for some topologies, we hit a bug where workers try to create a
zk-node that already exists, or delete a zk-node that has already been deleted.
The resulting exception kills the worker process.

We dissected the problem and found a race condition in the zk-node create and
delete code of Trident's TransactionalState.
This has to be fixed.
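The symptom matches a check-then-act race: a worker checks whether a node exists and then creates it, but another worker creates the same path in between. The Java sketch below replays one such interleaving deterministically. `FakeZk`, `CheckThenCreateRace`, and the local `NodeExistsException` are hypothetical stand-ins, not the ZooKeeper or Curator API. It is also an assumption, not a quote of the real code, that TransactionalState's create path has this exists-then-create shape.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: FakeZk is an in-memory stand-in for ZooKeeper,
// NOT the ZooKeeper or Curator API.
public class CheckThenCreateRace {

    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    static class FakeZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        boolean exists(String path) { return nodes.containsKey(path); }

        // Atomic like ZooKeeper's create: exactly one creator wins, the rest fail.
        void create(String path, byte[] data) {
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExistsException(path);
            }
        }
    }

    // Replays one bad interleaving of two workers step by step and returns
    // the message of the exception the losing worker sees.
    static String replayInterleaving() {
        FakeZk zk = new FakeZk();
        String path = "/ignoreStoredMetadata";
        boolean workerASeesMissing = !zk.exists(path); // A: exists? -> false
        boolean workerBSeesMissing = !zk.exists(path); // B: exists? -> false (stale soon)
        if (workerASeesMissing) {
            zk.create(path, new byte[0]);              // A: create succeeds
        }
        try {
            if (workerBSeesMissing) {
                zk.create(path, new byte[0]);          // B: create fails, node exists
            }
            return "no failure";
        } catch (NodeExistsException e) {
            // In a worker, this exception would propagate uncaught instead.
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(replayInterleaving());
    }
}
```

The delete side is symmetric: two workers both see the node present, both call delete, and the second fails because the node is already gone.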

Failure stack trace from worker.log:
{noformat}
Caused by: org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /ignoreStoredMetadata
        at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:676) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:660) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:656) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:441) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:431) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:239) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$3.forPath(CreateBuilderImpl.java:193) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:100) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        at storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115) ~[storm-core-0.10.1.y.jar:0.10.1.y]
        ... 9 more
2015-10-14 18:10:43.786 b.s.util [ERROR] Halting process: ("Worker died")
{noformat}
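One way to make these operations safe under concurrency is to make them idempotent: create-or-update instead of exists-then-create, and treat an already-deleted node as success on delete. The Java sketch below shows this with a hypothetical in-memory `FakeZk` and hypothetical `setData`/`delete` helpers; it is a swapped-in technique, not the actual STORM-1114 patch. In code against ZooKeeper, the exceptions to catch would be `KeeperException.NodeExistsException` (as in the trace above) and `KeeperException.NoNodeException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: FakeZk is an in-memory stand-in with ZooKeeper-like
// failure semantics, NOT the ZooKeeper or Curator API.
public class IdempotentZkOps {

    static class NodeExistsException extends RuntimeException {
        NodeExistsException(String path) { super("NodeExists for " + path); }
    }

    static class NoNodeException extends RuntimeException {
        NoNodeException(String path) { super("NoNode for " + path); }
    }

    static class FakeZk {
        private final ConcurrentMap<String, byte[]> nodes = new ConcurrentHashMap<>();

        // Each operation is atomic and fails the way ZooKeeper would.
        void create(String path, byte[] data) {
            if (nodes.putIfAbsent(path, data) != null) throw new NodeExistsException(path);
        }

        void setData(String path, byte[] data) {
            if (nodes.replace(path, data) == null) throw new NoNodeException(path);
        }

        void delete(String path) {
            if (nodes.remove(path) == null) throw new NoNodeException(path);
        }
    }

    // Create-or-update with no separate exists() check. Loops because a
    // concurrent delete can remove the node between create and setData.
    static void setData(FakeZk zk, String path, byte[] data) {
        while (true) {
            try {
                zk.create(path, data);
                return;
            } catch (NodeExistsException alreadyCreated) {
                try {
                    zk.setData(path, data);
                    return;
                } catch (NoNodeException deletedInBetween) {
                    // Node vanished after our create failed; try again.
                }
            }
        }
    }

    // Delete that treats "already gone" as success.
    static void delete(FakeZk zk, String path) {
        try {
            zk.delete(path);
        } catch (NoNodeException alreadyDeleted) {
            // Another worker got there first; the desired end state holds.
        }
    }

    // Runs concurrent upserts and deletes on one path; returns how many tasks threw.
    static int runConcurrently(int tasks) {
        FakeZk zk = new FakeZk();
        String path = "/ignoreStoredMetadata";
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final boolean doDelete = i % 3 == 0;
            futures.add(pool.submit(() -> {
                if (doDelete) delete(zk, path);
                else setData(zk, path, new byte[] {1});
            }));
        }
        int failures = 0;
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (InterruptedException | ExecutionException e) {
                failures++;
            }
        }
        pool.shutdown();
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failed tasks: " + runConcurrently(1000));
    }
}
```

The design choice is to let ZooKeeper's own atomic create and delete arbitrate between workers and to treat the "lost race" outcome as success, rather than guarding each call with an exists() check that can go stale before the call runs.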



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
