[ https://issues.apache.org/jira/browse/STORM-2987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ethan Li updated STORM-2987: ---------------------------- Description: We found an issue that when nimbus restarted, it can only get leadership after a few seconds (15~20s). {code:java} 2018-02-27 08:18:43.420 main o.a.s.z.LeaderElectorImp [INFO] Queued up for leader lock. 2018-02-27 08:18:43.481 main o.a.s.d.m.MetricsUtils [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter 2018-02-27 08:18:43.483 main o.a.s.d.m.r.JmxPreparableReporter [INFO] Preparing... 2018-02-27 08:18:43.499 main o.a.s.m.StormMetricsRegistry [INFO] Started statistics report plugin... 2018-02-27 08:18:43.543 main o.a.s.m.n.Login [INFO] successfully logged in. 2018-02-27 08:18:43.551 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator 2018-02-27 08:18:43.551 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2018-02-27 08:18:43.552 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread started. 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting at: Tue Feb 27 08:18:43 UTC 2018 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires: Wed Feb 28 08:18:43 UTC 2018 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh sleeping until: Wed Feb 28 04:35:55 UTC 2018 2018-02-27 08:18:43.553 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 sessionTimeout=60000 watcher=org.apache. curator.ConnectionState@2e185cd7 2018-02-27 08:18:43.559 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema 2018-02-27 08:18:43.560 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism. 2018-02-27 08:18:43.561 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client' 2018-02-27 08:18:43.562 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1 0.215.68.156:2181, initiating session 2018-02-27 08:18:43.565 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae970099, negotiated timeout = 40000 2018-02-27 08:18:43.565 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2018-02-27 08:18:43.605 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting 2018-02-27 08:18:43.625 main o.a.z.ZooKeeper [INFO] Session: 0x161d5f1ae970099 closed 2018-02-27 08:18:43.625 main-EventThread o.a.z.ClientCnxn [INFO] EventThread shut down 2018-02-27 08:18:43.626 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator 2018-02-27 08:18:43.626 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2018-02-27 08:18:43.635 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI sessionTimeout=60000 w atcher=org.apache.curator.ConnectionState@46cc127b 2018-02-27 08:18:43.654 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism. 2018-02-27 08:18:43.660 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client' 2018-02-27 08:18:43.663 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema 2018-02-27 08:18:43.663 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1 0.215.68.156:2181, initiating session 2018-02-27 08:18:43.666 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae97009a, negotiated timeout = 40000 2018-02-27 08:18:43.669 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2018-02-27 08:18:43.790 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server for storm version '2.0.0.y' 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping assignments 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping cleanup 2018-02-27 08:18:44.300 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518767144-stormcode.ser 2018-02-27 08:18:44.301 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormcode.ser 2018-02-27 08:18:44.302 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: logviewer-ui-groups-test-1-1518940914-stormcode.ser 2018-02-27 08:18:44.303 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518800831-stormcode.ser 2018-02-27 08:18:44.304 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518767144-stormconf.ser 2018-02-27 08:18:44.306 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: logviewer-ui-groups-test-1-1518940914-stormconf.ser 2018-02-27 08:18:44.307 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormconf.ser 2018-02-27 08:18:44.308 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518800831-stormconf.ser 2018-02-27 08:18:44.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping credential renewal. 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping assignments 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping cleanup 2018-02-27 08:18:59.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping credential renewal. 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-blobs [] local-topology-blobs [] diff-topology-blobs [] 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-dependencies [] local-blobs [] diff-topology-dependencies [] 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting leadership, all active topologies and corresponding dependencies found locally. 2018-02-27 08:19:04.754 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 442 ms for 0 topologies {code} This can be re-produced by the following steps: {code:java} 1. restart pacemaker; 2. before pacemaker is up, restart nimbus{code} When we restart nimbus process, it runs ShutDownHooks and stuck on timer.close(). [https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L4227] The timer is not able to close because it's waiting for doCleanup() to stop. However, the interruptedException is caught and ate in : [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L180-L192] was: We found an issue that when nimbus restarted, it can only get leadership after a few seconds (15~20s). {code:java} 2018-02-27 08:18:43.420 main o.a.s.z.LeaderElectorImp [INFO] Queued up for leader lock. 2018-02-27 08:18:43.481 main o.a.s.d.m.MetricsUtils [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter 2018-02-27 08:18:43.483 main o.a.s.d.m.r.JmxPreparableReporter [INFO] Preparing... 2018-02-27 08:18:43.499 main o.a.s.m.StormMetricsRegistry [INFO] Started statistics report plugin... 2018-02-27 08:18:43.543 main o.a.s.m.n.Login [INFO] successfully logged in. 2018-02-27 08:18:43.551 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator 2018-02-27 08:18:43.551 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2018-02-27 08:18:43.552 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread started. 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting at: Tue Feb 27 08:18:43 UTC 2018 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires: Wed Feb 28 08:18:43 UTC 2018 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh sleeping until: Wed Feb 28 04:35:55 UTC 2018 2018-02-27 08:18:43.553 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 sessionTimeout=60000 watcher=org.apache. curator.ConnectionState@2e185cd7 2018-02-27 08:18:43.559 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema 2018-02-27 08:18:43.560 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism. 2018-02-27 08:18:43.561 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client' 2018-02-27 08:18:43.562 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1 0.215.68.156:2181, initiating session 2018-02-27 08:18:43.565 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae970099, negotiated timeout = 40000 2018-02-27 08:18:43.565 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2018-02-27 08:18:43.605 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl [INFO] backgroundOperationsLoop exiting 2018-02-27 08:18:43.625 main o.a.z.ZooKeeper [INFO] Session: 0x161d5f1ae970099 closed 2018-02-27 08:18:43.625 main-EventThread o.a.z.ClientCnxn [INFO] EventThread shut down 2018-02-27 08:18:43.626 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator 2018-02-27 08:18:43.626 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2018-02-27 08:18:43.635 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI sessionTimeout=60000 w atcher=org.apache.curator.ConnectionState@46cc127b 2018-02-27 08:18:43.654 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism. 2018-02-27 08:18:43.660 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client' 2018-02-27 08:18:43.663 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema 2018-02-27 08:18:43.663 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1 0.215.68.156:2181, initiating session 2018-02-27 08:18:43.666 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae97009a, negotiated timeout = 40000 2018-02-27 08:18:43.669 main-EventThread o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2018-02-27 08:18:43.790 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server for storm version '2.0.0.y' 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping assignments 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping cleanup 2018-02-27 08:18:44.300 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518767144-stormcode.ser 2018-02-27 08:18:44.301 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormcode.ser 2018-02-27 08:18:44.302 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: logviewer-ui-groups-test-1-1518940914-stormcode.ser 2018-02-27 08:18:44.303 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518800831-stormcode.ser 2018-02-27 08:18:44.304 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518767144-stormconf.ser 2018-02-27 08:18:44.306 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: logviewer-ui-groups-test-1-1518940914-stormconf.ser 2018-02-27 08:18:44.307 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormconf.ser 2018-02-27 08:18:44.308 timer o.a.s.b.BlobStoreUtils [ERROR] Could not download the blob with key: blob-5-1518800831-stormconf.ser 2018-02-27 08:18:44.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping credential renewal. 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping assignments 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping cleanup 2018-02-27 08:18:59.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping credential renewal. 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-blobs [] local-topology-blobs [] diff-topology-blobs [] 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] active-topology-dependencies [] local-blobs [] diff-topology-dependencies [] 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting leadership, all active topologies and corresponding dependencies found locally. 2018-02-27 08:19:04.754 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 442 ms for 0 topologies {code} This can be re-produced by the following steps: {code:java} 1. restart pacemaker; 2. before pacemaker is up, restart nimbus{code} So when we restart nimbus process, it runs ShutDownHooks and stuck on timer.close(). https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L4227 The timer is not able to close because it's waiting for doCleanup() to stop. However, the interruptedException is caught and ate in : [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L180-L192] > PaceMakerStateStorage should deal with InterruptedException correctly > --------------------------------------------------------------------- > > Key: STORM-2987 > URL: https://issues.apache.org/jira/browse/STORM-2987 > Project: Apache Storm > Issue Type: Bug > Reporter: Ethan Li > Assignee: Ethan Li > Priority: Major > > We found an issue that when nimbus restarted, it can only get leadership > after a few seconds (15~20s). > > {code:java} > 2018-02-27 08:18:43.420 main o.a.s.z.LeaderElectorImp [INFO] Queued up for > leader lock. > 2018-02-27 08:18:43.481 main o.a.s.d.m.MetricsUtils [INFO] Using statistics > reporter > plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter > 2018-02-27 08:18:43.483 main o.a.s.d.m.r.JmxPreparableReporter [INFO] > Preparing... > 2018-02-27 08:18:43.499 main o.a.s.m.StormMetricsRegistry [INFO] Started > statistics report plugin... > 2018-02-27 08:18:43.543 main o.a.s.m.n.Login [INFO] successfully logged in. > 2018-02-27 08:18:43.551 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator > 2018-02-27 08:18:43.551 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting > 2018-02-27 08:18:43.552 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread > started. > 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting > at: Tue Feb 27 08:18:43 UTC 2018 > 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires: > Wed Feb 28 08:18:43 UTC 2018 > 2018-02-27 08:18:43.553 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh > sleeping until: Wed Feb 28 04:35:55 UTC 2018 > 2018-02-27 08:18:43.553 main o.a.z.ZooKeeper [INFO] Initiating client > connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 > sessionTimeout=60000 watcher=org.apache. > curator.ConnectionState@2e185cd7 > 2018-02-27 08:18:43.559 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default > schema > 2018-02-27 08:18:43.560 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) > o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism. > 2018-02-27 08:18:43.561 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn > [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co > m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context > section 'Client' > 2018-02-27 08:18:43.562 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn > [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1 > 0.215.68.156:2181, initiating session > 2018-02-27 08:18:43.565 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn > [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah > oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae970099, negotiated timeout > = 40000 > 2018-02-27 08:18:43.565 main-EventThread o.a.c.f.s.ConnectionStateManager > [INFO] State change: CONNECTED > 2018-02-27 08:18:43.605 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl > [INFO] backgroundOperationsLoop exiting > 2018-02-27 08:18:43.625 main o.a.z.ZooKeeper [INFO] Session: > 0x161d5f1ae970099 closed > 2018-02-27 08:18:43.625 main-EventThread o.a.z.ClientCnxn [INFO] EventThread > shut down > 2018-02-27 08:18:43.626 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator > 2018-02-27 08:18:43.626 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting > 2018-02-27 08:18:43.635 main o.a.z.ZooKeeper [INFO] Initiating client > connection, > connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI > sessionTimeout=60000 w > atcher=org.apache.curator.ConnectionState@46cc127b > 2018-02-27 08:18:43.654 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) > o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism. > 2018-02-27 08:18:43.660 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn > [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.co > m/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context > section 'Client' > 2018-02-27 08:18:43.663 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default > schema > 2018-02-27 08:18:43.663 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn > [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/1 > 0.215.68.156:2181, initiating session > 2018-02-27 08:18:43.666 > main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn > [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yah > oo.com/10.215.68.156:2181, sessionid = 0x161d5f1ae97009a, negotiated timeout > = 40000 > 2018-02-27 08:18:43.669 main-EventThread o.a.c.f.s.ConnectionStateManager > [INFO] State change: CONNECTED > 2018-02-27 08:18:43.790 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server > for storm version '2.0.0.y' > 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping > assignments > 2018-02-27 08:18:44.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping > cleanup > 2018-02-27 08:18:44.300 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: blob-5-1518767144-stormcode.ser > 2018-02-27 08:18:44.301 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: > TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormcode.ser > 2018-02-27 08:18:44.302 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: > logviewer-ui-groups-test-1-1518940914-stormcode.ser > 2018-02-27 08:18:44.303 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: blob-5-1518800831-stormcode.ser > 2018-02-27 08:18:44.304 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: blob-5-1518767144-stormconf.ser > 2018-02-27 08:18:44.306 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: > logviewer-ui-groups-test-1-1518940914-stormconf.ser > 2018-02-27 08:18:44.307 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: > TestZkErrorNodesHaveCorrectAcls-3-1519540302-stormconf.ser > 2018-02-27 08:18:44.308 timer o.a.s.b.BlobStoreUtils [ERROR] Could not > download the blob with key: blob-5-1518800831-stormconf.ser > 2018-02-27 08:18:44.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping > credential renewal. > 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping > assignments > 2018-02-27 08:18:54.274 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping > cleanup > 2018-02-27 08:18:59.367 timer o.a.s.d.n.Nimbus [INFO] not a leader, skipping > credential renewal. > 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] > active-topology-blobs [] local-topology-blobs [] diff-topology-blobs [] > 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] > active-topology-dependencies [] local-blobs [] diff-topology-dependencies [] > 2018-02-27 08:19:02.059 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting > leadership, all active topologies and corresponding dependencies found > locally. > 2018-02-27 08:19:04.754 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 442 ms > for 0 topologies > {code} > > This can be re-produced by the following steps: > {code:java} > 1. restart pacemaker; > 2. before pacemaker is up, restart nimbus{code} > When we restart nimbus process, it runs ShutDownHooks and stuck on > timer.close(). > [https://github.com/apache/storm/blob/master/storm-server/src/main/java/org/apache/storm/daemon/nimbus/Nimbus.java#L4227] > The timer is not able to close because it's waiting for doCleanup() to stop. > However, the interruptedException is caught and ate in : > [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L180-L192] > > > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)