[ 
https://issues.apache.org/jira/browse/STORM-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368730#comment-15368730
 ] 

happylu edited comment on STORM-1940 at 7/8/16 11:41 PM:
---------------------------------------------------------

It is not easy to reproduce; we see it only about once a month, usually 
following a ZK reconnect. I think you can reproduce it by hanging the code 
after the ser node is created, then disconnecting the network to force a ZK 
timeout. Alternatively, could we simply check whether "path+ser" exists before 
creating it?
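
A minimal sketch of the idea, not the Storm implementation: FakeStore, 
createOrSet and the local NodeExistsException are hypothetical stand-ins for 
the ZK client, used only so the example runs without a ZooKeeper server. It 
assumes the first create reached the server but its reply was lost. Since a 
check-then-create can still race with a retried create, the sketch catches 
NodeExists and falls back to overwriting the data instead of throwing.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IdempotentCreateSketch {

    /** Local stand-in for ZooKeeper's NodeExistsException (assumption). */
    static class NodeExistsException extends Exception {
        NodeExistsException(String path) {
            super("NodeExists for " + path);
        }
    }

    /** Hypothetical in-memory store that mimics create/setData semantics. */
    static class FakeStore {
        private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();

        /** Fails if the node is already present, like a ZK create. */
        void create(String path, byte[] data) throws NodeExistsException {
            if (nodes.putIfAbsent(path, data) != null) {
                throw new NodeExistsException(path);
            }
        }

        /** Overwrites an existing node; fails if the node is missing. */
        void setData(String path, byte[] data) {
            if (nodes.replace(path, data) == null) {
                throw new IllegalStateException("NoNode for " + path);
            }
        }

        byte[] get(String path) {
            return nodes.get(path);
        }
    }

    /** Creates the node, or overwrites it when it already exists. */
    static void createOrSet(FakeStore store, String path, byte[] data) {
        try {
            store.create(path, data);
        } catch (NodeExistsException e) {
            // The node survived from before the reconnect: treat as success.
            store.setData(path, data);
        }
    }

    public static void main(String[] args) throws Exception {
        FakeStore store = new FakeStore();
        String path = "/meta/712285";
        byte[] data = "txid-state".getBytes(StandardCharsets.UTF_8);

        // The create succeeded on the server, but the client never saw the reply.
        store.create(path, data);

        // After RECONNECTED the caller retries; this no longer throws.
        createOrSet(store, path, data);
        System.out.println(new String(store.get(path), StandardCharsets.UTF_8));
    }
}
```

With a plain create in place of createOrSet, the second call would throw 
NodeExists, which matches the failure in the attached log.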


was (Author: happylu):
It is not easy to reproduce; we see it only about once a month, usually 
together with a ZK reconnect. I think you can reproduce it by hanging the code 
after the ser node is created, then disconnecting the network to force a ZK 
timeout. Alternatively, could we simply add a check on "path+ser" before 
creating it?

> Storm Topo is auto re-balance after ZK RECONNECTED
> --------------------------------------------------
>
>                 Key: STORM-1940
>                 URL: https://issues.apache.org/jira/browse/STORM-1940
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 1.0.1
>            Reporter: happylu
>            Priority: Critical
>         Attachments: others.zip, worker1.zip, worker2.zip
>
>
> I have a topology with 2 workers on 2 VMs. When ZK reports RECONNECTED, the 
> Storm topology is automatically rebalanced. 
> The log shows NodeExists for /meta/712285. My guess at the cause: after 
> reconnecting successfully, TridentSpoutCoordinator creates this node again, 
> but the node was already created before the reconnect.
> Can we check whether the node exists first? Or avoid throwing this 
> exception, so that it does not make the whole topology rebalance?
> {code}
> 06-29 05:54:37.515 
> [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 
> 4]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] 
> shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete 
> on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, 
> sessionid = 0x7a556eeee8c70ae1, negotiated timeout = 10000
> 06-29 05:54:37.515 
> [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 
> 4]-EventThread] apache.curator.framework.state.ConnectionStateManager [INFO] 
> State change: RECONNECTED
> 06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 
> 154]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] 
> org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on 
> server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid 
> = 0x7a556eeee8c70ae5, negotiated timeout = 10000
> 06-29 05:54:37.519 [Thread-133-spout-DataKafkaSpout1466801942228-executor[154 
> 154]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed 
> (SyncConnected)
> 06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 
> 156]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] 
> org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on 
> server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid 
> = 0x7a556eeee8c70ae4, negotiated timeout = 10000
> 06-29 05:54:37.524 [Thread-25-spout-DataKafkaSpout1466801942228-executor[156 
> 156]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed 
> (SyncConnected)
> 06-29 05:54:37.528 
> [main-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] 
> shade.org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete 
> on server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, 
> sessionid = 0x7b556f0cc3a40896, negotiated timeout = 10000
> 06-29 05:54:37.528 [main-EventThread] 
> apache.curator.framework.state.ConnectionStateManager [INFO] State change: 
> RECONNECTED
> 06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 
> 160]-SendThread(ip-10-9-255-26.us-west-2.compute.internal:2181)] 
> org.apache.zookeeper.ClientCnxn [INFO] Session establishment complete on 
> server ip-10-9-255-26.us-west-2.compute.internal/10.9.255.26:2181, sessionid 
> = 0x7a556eeee8c70ae3, negotiated timeout = 10000
> 06-29 05:54:37.528 [Thread-149-spout-DataKafkaSpout1466801942228-executor[160 
> 160]-EventThread] org.I0Itec.zkclient.ZkClient [INFO] zookeeper state changed 
> (SyncConnected)
> 06-29 05:54:37.536 
> [Thread-151-$spoutcoord-spout-DataKafkaSpout1466801942228-executor[4 4]] 
> org.apache.storm.util [ERROR] Async loop died!
> java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException:
>  KeeperErrorCode = NodeExists for /meta/712285
>       at 
> org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:452)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:418)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:73)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.daemon.executor$fn__7953$fn__7966$fn__8019.invoke(executor.clj:847)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at org.apache.storm.util$async_loop$fn__625.invoke(util.clj:484) 
> [storm-core-1.0.1.jar:1.0.1]
>       at clojure.lang.AFn.run(AFn.java:22) [clojure-1.7.0.jar:?]
>       at java.lang.Thread.run(Thread.java:745) [?:1.7.0_80]
> Caused by: java.lang.RuntimeException: 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException:
>  KeeperErrorCode = NodeExists for /meta/712285
>       at 
> org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:119)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       ... 6 more
> Caused by: 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NodeExistsException:
>  KeeperErrorCode = NodeExists for /meta/712285
>       at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:119)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:721)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl$11.call(CreateBuilderImpl.java:704)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:108)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.pathInForeground(CreateBuilderImpl.java:701)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.protectedPathInForeground(CreateBuilderImpl.java:477)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:467)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.shade.org.apache.curator.framework.imps.CreateBuilderImpl.forPath(CreateBuilderImpl.java:44)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.trident.topology.state.TransactionalState.forPath(TransactionalState.java:83)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.trident.topology.state.TransactionalState.createNode(TransactionalState.java:95)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.trident.topology.state.TransactionalState.setData(TransactionalState.java:115)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.trident.topology.state.RotatingTransactionalState.overrideState(RotatingTransactionalState.java:52)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.trident.spout.TridentSpoutCoordinator.execute(TridentSpoutCoordinator.java:71)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.topology.BasicBoltExecutor.execute(BasicBoltExecutor.java:50)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.daemon.executor$fn__7953$tuple_action_fn__7955.invoke(executor.clj:728)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.daemon.executor$mk_task_receiver$fn__7874.invoke(executor.clj:461)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.disruptor$clojure_handler$reify__7390.onEvent(disruptor.clj:40)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       at 
> org.apache.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:439)
>  ~[storm-core-1.0.1.jar:1.0.1]
>       ... 6 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
