[ https://issues.apache.org/jira/browse/STORM-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans resolved STORM-2706.
----------------------------------------
    Resolution: Fixed
    Fix Version/s: 1.1.2
                   1.2.0

Thanks [~Srdo], I merged this into branch-1.x and branch-1.1.x. It didn't apply cleanly to 1.0.x so I didn't pull it in there.

> Nimbus stuck in exception and does not fail fast
> ------------------------------------------------
>
> Key: STORM-2706
> URL: https://issues.apache.org/jira/browse/STORM-2706
> Project: Apache Storm
> Issue Type: Bug
> Affects Versions: 1.1.1
> Reporter: Bijan Fahimi Shemrani
> Assignee: Stig Rohde Døssing
> Labels: nimbus, pull-request-available
> Fix For: 2.0.0, 1.2.0, 1.1.2
>
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> We experience a problem in nimbus which leads it to get stuck in a retry and
> fail loop. When I manually restart the nimbus it works again as expected.
> However, it would be great if nimbus would shut down so our monitoring can
> automatically restart the nimbus.
> The nimbus log.
> {noformat}
> 24.8.2017 15:39:1913:39:19.804 [pool-13-thread-51] ERROR
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer -
> Unexpected throwable while invoking!
> 24.8.2017
> 15:39:19org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /storm/leader-lock
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown
> Source) ~[?:?]
> 24.8.2017 15:39:19 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:19 at java.lang.reflect.Method.invoke(Method.java:498)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:19 at
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19 at
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19 at
> org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown
> Source) ~[?:?]
> 24.8.2017 15:39:19 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:19 at java.lang.reflect.Method.invoke(Method.java:498)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:19 at
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19 at
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19 at
> org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getLeader(nimbus.clj:2412)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> org.apache.storm.thrift.server.Invocation.run(Invocation.java:18)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19 at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [?:1.8.0_131]
> 24.8.2017 15:39:19 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [?:1.8.0_131]
> 24.8.2017 15:39:19 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 24.8.2017 15:39:2713:39:27.205 [pool-13-thread-52] ERROR
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer -
> Unexpected throwable while invoking!
> 24.8.2017
> 15:39:27org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException:
> KeeperErrorCode = NoNode for /storm/leader-lock
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown
> Source) ~[?:?]
> 24.8.2017 15:39:27 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:27 at java.lang.reflect.Method.invoke(Method.java:498)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:27 at
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27 at
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27 at
> org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown
> Source) ~[?:?]
> 24.8.2017 15:39:27 at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:27 at java.lang.reflect.Method.invoke(Method.java:498)
> ~[?:1.8.0_131]
> 24.8.2017 15:39:27 at
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27 at
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313)
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27 at
> org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> org.apache.storm.thrift.server.Invocation.run(Invocation.java:18)
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27 at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> [?:1.8.0_131]
> 24.8.2017 15:39:27 at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> [?:1.8.0_131]
> 24.8.2017 15:39:27 at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 24.8.2017 15:39:2913:39:29.270 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping assignments
> 24.8.2017 15:39:2913:39:29.270 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping cleanup
> 24.8.2017 15:39:3913:39:39.270 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping assignments
> 24.8.2017 15:39:3913:39:39.270 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping cleanup
> 24.8.2017 15:39:4913:39:49.271 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping assignments
> 24.8.2017 15:39:4913:39:49.272 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping cleanup
> 24.8.2017 15:39:5913:39:59.272 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping assignments
> 24.8.2017 15:39:5913:39:59.272 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping cleanup
> 24.8.2017 15:40:0913:40:09.272 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping assignments
> 24.8.2017 15:40:0913:40:09.272 [timer] INFO org.apache.storm.daemon.nimbus -
> not a leader, skipping cleanup
> 24.8.2017 15:40:1313:40:13.806 [timer] INFO
> org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl
> - Starting
> 24.8.2017 15:40:1313:40:13.807 [timer] INFO
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Initiating client
> connection, connectString=zookeeper:2181/storm sessionTimeout=20000
> watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@f90354
> 24.8.2017 15:40:1313:40:13.808 [timer-SendThread(10.42.174.214:2181)] INFO
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Opening socket
> connection to server 10.42.174.214/10.42.174.214:2181. Will not attempt to
> authenticate using SASL (unknown error)
> 24.8.2017 15:40:1313:40:13.862 [timer-SendThread(10.42.174.214:2181)] INFO
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Socket connection
> established to 10.42.174.214/10.42.174.214:2181, initiating session
> 24.8.2017 15:40:1313:40:13.865 [timer-SendThread(10.42.174.214:2181)] INFO
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Session
> establishment complete on server 10.42.174.214/10.42.174.214:2181, sessionid
> = 0x15e14456dc70045, negotiated timeout = 20000
> 24.8.2017 15:40:1313:40:13.910 [timer] INFO
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Session:
> 0x15e14456dc70045 closed
> 24.8.2017 15:40:1313:40:13.910 [timer-EventThread] INFO
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - EventThread shut down
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)