[ 
https://issues.apache.org/jira/browse/STORM-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-3012.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0

Thanks [~ethanli],

I merged this into master.

> Nimbus will crash if pacemaker is restarted
> -------------------------------------------
>
>                 Key: STORM-3012
>                 URL: https://issues.apache.org/jira/browse/STORM-3012
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: Ethan Li
>            Assignee: Ethan Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Below is the nimbus.log from when I restarted pacemaker. Nimbus crashed 
> because of an NPE.
>  
>  
> {code:java}
> 2018-03-26 21:39:18.404 main o.a.s.z.LeaderElectorImp [INFO] Queued up for 
> leader lock.
> 2018-03-26 21:39:18.458 main o.a.s.d.m.MetricsUtils [INFO] Using statistics reporter plugin:org.apache.storm.daemon.metrics.reporters.JmxPreparableReporter
> 2018-03-26 21:39:18.461 main o.a.s.d.m.r.JmxPreparableReporter [INFO] 
> Preparing...
> 2018-03-26 21:39:18.527 main o.a.s.m.StormMetricsRegistry [INFO] Started 
> statistics report plugin...
> 2018-03-26 21:39:18.710 main o.a.s.m.n.Login [INFO] successfully logged in.
> 2018-03-26 21:39:18.738 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh thread 
> started.
> 2018-03-26 21:39:18.739 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
> 2018-03-26 21:39:18.739 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT valid starting 
> at:        Mon Mar 26 21:39:18 UTC 2018
> 2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT expires:       
>            Tue Mar 27 21:39:18 UTC 2018
> 2018-03-26 21:39:18.747 Refresh-TGT o.a.s.m.n.Login [INFO] TGT refresh 
> sleeping until: Tue Mar 27 17:39:22 UTC 2018
> 2018-03-26 21:39:18.756 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@148c7c4b
> 2018-03-26 21:39:18.807 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default schema
> 2018-03-26 21:39:18.814 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
> 2018-03-26 21:39:18.815 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 2018-03-26 21:39:18.816 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
> 2018-03-26 21:39:18.817 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cdd, negotiated timeout = 40000
> 2018-03-26 21:39:18.818 main-EventThread o.a.c.f.s.ConnectionStateManager 
> [INFO] State change: CONNECTED
> 2018-03-26 21:39:18.839 Curator-Framework-0 o.a.c.f.i.CuratorFrameworkImpl 
> [INFO] backgroundOperationsLoop exiting
> 2018-03-26 21:39:18.841 main o.a.z.ZooKeeper [INFO] Session: 
> 0x1624f6d49dd0cdd closed
> 2018-03-26 21:39:18.842 main-EventThread o.a.z.ClientCnxn [INFO] EventThread 
> shut down
> 2018-03-26 21:39:18.844 main o.a.s.z.ClientZookeeper [INFO] Staring ZK Curator
> 2018-03-26 21:39:18.844 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting
> 2018-03-26 21:39:18.875 main o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=openqe74blue-gw.blue.ygrid.yahoo.com:2181/storm_ystormQE_CI sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@211febf3
> 2018-03-26 21:39:18.908 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.c.ZooKeeperSaslClient [INFO] Client will use GSSAPI as SASL mechanism.
> 2018-03-26 21:39:18.909 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Opening socket connection to server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181. Will attempt to SASL-authenticate using Login Context section 'Client'
> 2018-03-26 21:39:18.910 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Socket connection established to openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, initiating session
> 2018-03-26 21:39:18.911 main-SendThread(openqe74blue-gw.blue.ygrid.yahoo.com:2181) o.a.z.ClientCnxn [INFO] Session establishment complete on server openqe74blue-gw.blue.ygrid.yahoo.com/10.215.68.156:2181, sessionid = 0x1624f6d49dd0cde, negotiated timeout = 40000
> 2018-03-26 21:39:18.920 main o.a.c.f.i.CuratorFrameworkImpl [INFO] Default 
> schema
> 2018-03-26 21:39:18.923 main-EventThread o.a.c.f.s.ConnectionStateManager 
> [INFO] State change: CONNECTED
> 2018-03-26 21:39:18.986 main o.a.s.d.n.Nimbus [INFO] Starting nimbus server 
> for storm version '2.0.0.y'
> 2018-03-26 21:39:19.931 main-EventThread o.a.s.z.Zookeeper [INFO] 
> active-topology-blobs [] local-topology-blobs [] diff-topology-blobs []
> 2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] 
> active-topology-dependencies [] local-blobs [] diff-topology-dependencies []
> 2018-03-26 21:39:19.932 main-EventThread o.a.s.z.Zookeeper [INFO] Accepting leadership, all active topologies and corresponding dependencies found locally.
> 2018-03-26 21:39:20.636 timer o.a.s.d.n.Nimbus [INFO] Scheduling took 1381 ms 
> for 0 topologies
> 2018-03-26 21:39:20.901 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:20.901 client-boss-1 o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 101ms (NOT MAX)
> 2018-03-26 21:39:21.003 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] Connection to pacemaker failed. Trying to reconnect Connection refused: openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.003 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 102ms (NOT 
> MAX)
> 2018-03-26 21:39:21.106 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.106 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 106ms (NOT 
> MAX)
> 2018-03-26 21:39:21.214 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.214 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 115ms (NOT 
> MAX)
> 2018-03-26 21:39:21.331 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.331 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 129ms (NOT 
> MAX)
> 2018-03-26 21:39:21.462 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.462 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 162ms (NOT 
> MAX)
> 2018-03-26 21:39:21.626 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.626 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 176ms (NOT 
> MAX)
> 2018-03-26 21:39:21.807 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:21.807 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 319ms (NOT 
> MAX)
> 2018-03-26 21:39:21.888 timer o.a.s.p.PacemakerClient [ERROR] error 
> attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting 
> for channel ready.
>         at 
> org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506)
>  ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:22.128 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:22.128 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 603ms (NOT 
> MAX)
> 2018-03-26 21:39:22.733 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:22.733 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 868ms (NOT 
> MAX)
> 2018-03-26 21:39:22.888 timer o.a.s.p.PacemakerClient [ERROR] error 
> attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting 
> for channel ready.
>         at 
> org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506)
>  ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:23.603 client-boss-1 o.a.s.p.PacemakerClientHandler [WARN] 
> Connection to pacemaker failed. Trying to reconnect Connection refused: 
> openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:23.603 client-boss-1 
> o.a.s.u.StormBoundedExponentialBackoffRetry [WARN] WILL SLEEP FOR 1494ms (NOT 
> MAX)
> 2018-03-26 21:39:23.888 timer o.a.s.p.PacemakerClient [ERROR] error 
> attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting 
> for channel ready.
>         at 
> org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506)
>  ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:24.889 timer o.a.s.p.PacemakerClient [ERROR] error 
> attempting to write to a channel {}
> org.apache.storm.pacemaker.PacemakerConnectionException: Timed out waiting 
> for channel ready.
>         at 
> org.apache.storm.pacemaker.PacemakerClient.waitUntilReady(PacemakerClient.java:213)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:182) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClient.send(PacemakerClient.java:197) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.pacemaker.PacemakerClientPool.sendAll(PacemakerClientPool.java:65)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:193)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506)
>  ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:25.100 client-worker-4 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:36922 to openqe74blue-n1.blue.ygrid.yahoo.com/10.215.76.240:6699
> 2018-03-26 21:39:25.107 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Creating Kerberos Client.
> 2018-03-26 21:39:25.116 client-worker-4 o.a.s.m.n.Login [INFO] successfully logged in.
> 2018-03-26 21:39:25.121 client-worker-4 o.a.s.m.n.KerberosSaslNettyClient [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@116ce525
> 2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslClientHandler [INFO] Connection established from /10.215.76.240:37614 to openqe74blue-n2.blue.ygrid.yahoo.com/10.215.76.243:6699
> 2018-03-26 21:39:25.753 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient 
> [INFO] Creating Kerberos Client.
> 2018-03-26 21:39:25.763 client-worker-1 o.a.s.m.n.Login [INFO] successfully 
> logged in.
> 2018-03-26 21:39:25.765 client-worker-1 o.a.s.m.n.KerberosSaslNettyClient 
> [INFO] Got Client: com.sun.security.sasl.gsskerb.GssKrb5Client@493cfe64
> 2018-03-26 21:39:26.596 timer o.a.s.d.n.Nimbus [ERROR] Error while processing 
> event
> java.lang.RuntimeException: java.lang.NullPointerException
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2508)
>  ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$1.run(StormTimer.java:207) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:81) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.storm.cluster.PaceMakerStateStorage.get_worker_hb_children(PaceMakerStateStorage.java:195)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.cluster.StormClusterStateImpl.heartbeatStorms(StormClusterStateImpl.java:408)
>  ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.topoIdsToClean(Nimbus.java:765) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at org.apache.storm.daemon.nimbus.Nimbus.doCleanup(Nimbus.java:2148) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.lambda$launchServer$36(Nimbus.java:2506)
>  ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         ... 2 more
> 2018-03-26 21:39:26.596 timer o.a.s.u.Utils [ERROR] Halting process: Error 
> while processing event
> java.lang.RuntimeException: Halting process: Error while processing event
>         at org.apache.storm.utils.Utils.exitProcess(Utils.java:469) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.daemon.nimbus.Nimbus.lambda$new$23(Nimbus.java:1154) 
> ~[storm-server-2.0.0.y.jar:2.0.0.y]
>         at 
> org.apache.storm.StormTimer$StormTimerTask.run(StormTimer.java:106) 
> ~[storm-client-2.0.0.y.jar:2.0.0.y]
> 2018-03-26 21:39:26.600 Thread-16 o.a.s.u.Utils [INFO] Halting after 5 seconds
> 2018-03-26 21:39:26.606 Thread-15 o.a.s.d.n.Nimbus [INFO] Shutting down master
> 2018-03-26 21:39:31.600 Thread-16 o.a.s.u.Utils [WARN] Forcing Halt...
> {code}
>  
>  
> This happens because of the following code in 
> [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClient.java#L195-L198]:
>  
> {code:java}
> HBMessage ret = messages[next];
> if(ret == null) {
>     // This can happen if we lost the connection and subsequently reconnected or timed out.
>     send(m);
> }
> messages[next] = null;
> LOG.debug("Got Response: {}", ret);
> return ret;
> {code}
> When ret is null, the result of the retried send(m) call is discarded, so the 
> method returns null. That null result is then added to the response list in 
> [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/pacemaker/PacemakerClientPool.java#L65-L66]
> {code:java}
> for(String s : servers) {
>     HBMessage response = getClientForServer(s).send(m);
>     responses.add(response);
> }
> {code}
> which leads to 
> [https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/cluster/PaceMakerStateStorage.java#L195]
>  
> {code:java}
> for(HBMessage response : responses) {
>     if (response.get_type() != HBServerMessageType.GET_ALL_NODES_FOR_PATH_RESPONSE) {
>         LOG.error("get_worker_hb_children: Invalid Response Type");
>         continue;
>     }
>     if(response.get_data().get_nodes().get_pulseIds() != null) {
>         retSet.addAll(response.get_data().get_nodes().get_pulseIds());
>     }
> }
> {code}
>  
> and this is where the NPE happens: the loop dereferences a null response.
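
The chain described in the quoted report can be reproduced in isolation. The sketch below is only an illustration: Response, sendDroppingRetry, sendKeepingRetry and collectPulseIds are hypothetical stand-ins, not the real Storm classes, and this is not necessarily the change that was merged for this issue. It shows how discarding the result of the retried call lets a null reach the caller, and how returning the retry result, together with a null check in the consuming loop, would avoid the NPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

// Hypothetical stand-ins for illustration only; not the real Storm classes.
class NullResponseSketch {

    /** Simplified stand-in for HBMessage: it only carries pulse ids. */
    static final class Response {
        final List<String> pulseIds;

        Response(List<String> pulseIds) {
            this.pulseIds = pulseIds;
        }
    }

    /** Mirrors the quoted send(): the retry runs, but its result is discarded. */
    static Response sendDroppingRetry(Response first, Supplier<Response> retry) {
        Response ret = first;
        if (ret == null) {
            retry.get(); // return value ignored, so ret stays null
        }
        return ret;
    }

    /** Variant that keeps the retry result instead of discarding it. */
    static Response sendKeepingRetry(Response first, Supplier<Response> retry) {
        Response ret = first;
        if (ret == null) {
            ret = retry.get();
        }
        return ret;
    }

    /** Consumer loop with a null guard, so a missing response is skipped. */
    static Set<String> collectPulseIds(List<Response> responses) {
        Set<String> ids = new TreeSet<>();
        for (Response response : responses) {
            if (response == null || response.pulseIds == null) {
                continue; // without this check, a null response would throw an NPE
            }
            ids.addAll(response.pulseIds);
        }
        return ids;
    }

    public static void main(String[] args) {
        Response ok = new Response(Arrays.asList("worker-1", "worker-2"));

        // Simulate a lost first response (null) followed by a successful retry.
        Response dropped = sendDroppingRetry(null, () -> ok); // null
        Response kept = sendKeepingRetry(null, () -> ok);     // same object as ok

        List<Response> responses = new ArrayList<>();
        responses.add(dropped);
        responses.add(kept);

        System.out.println(collectPulseIds(responses));
    }
}
```

Running main prints [worker-1, worker-2]: the dropped response is null and is skipped by the guard, while the kept response contributes its ids.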



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
