Jungtaek Lim created STORM-1977:
-----------------------------------
Summary: Leader Nimbus crashes with getClusterInfo when it doesn't
have one or more replicated topology codes
Key: STORM-1977
URL: https://issues.apache.org/jira/browse/STORM-1977
Project: Apache Storm
Issue Type: Bug
Components: storm-core
Affects Versions: 1.0.0, 1.0.1
Reporter: Jungtaek Lim
Assignee: Jungtaek Lim
Priority: Critical
While investigating STORM-1976, I found that there are cases where a Nimbus
does not have all of the topology code.
Before BlobStore, only a Nimbus that had all of the topology code could gain
leadership; otherwise it gave up leadership immediately. This logic was removed
when BlobStore was introduced.
I don't know whether that was intended, but it allows a Nimbus that doesn't
have the replicated topology code to gain leadership, and that Nimbus crashes
when getClusterInfo is requested.
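The pre-BlobStore rule described above can be sketched roughly as follows. This is only an illustration, not Storm's actual code: the class and method names are hypothetical, and the two key suffixes are taken from the blob keys that appear in the log in this report (the real set of required blobs may be larger).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of "only a Nimbus with all topology code may lead".
// None of these names are Storm APIs.
class LeadershipGuardSketch {

    // Blob keys assumed to be required per topology; the suffixes mirror
    // the keys seen in the log (stormcode.ser, stormconf.ser).
    static List<String> requiredKeys(String topologyId) {
        return Arrays.asList(topologyId + "-stormcode.ser",
                             topologyId + "-stormconf.ser");
    }

    // True only if every active topology has all of its assumed keys locally.
    static boolean hasAllTopologyCode(Set<String> activeTopologyIds,
                                      Set<String> localBlobKeys) {
        for (String id : activeTopologyIds) {
            if (!localBlobKeys.containsAll(requiredKeys(id))) {
                return false;
            }
        }
        return true;
    }

    // A Nimbus that just won the leader lock keeps it only when it holds
    // all topology code; otherwise it should give leadership up.
    static boolean shouldKeepLeadership(Set<String> activeTopologyIds,
                                        Set<String> localBlobKeys) {
        return hasAllTopologyCode(activeTopologyIds, localBlobKeys);
    }
}
```

Under this sketch, a freshly started Nimbus with an empty local blob directory would give up leadership as long as at least one topology is active.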
The easiest way to reproduce:
1. Launch Nimbus 1 (leader)
2. Run a topology
3. Kill Nimbus 1
4. Launch Nimbus 2 on a different node
5. Nimbus 2 gains leadership
6. getClusterInfo is requested from Nimbus 2, and Nimbus 2 crashes
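The failure in step 6 can be modeled with a toy sketch. It assumes, based only on the stack trace in this report, that answering getClusterInfo looks up blob replication for each active topology's code key and that a missing key raises an exception. All class and method names below are hypothetical stand-ins, not Storm APIs, and the sketch does not model the process halt.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: a Nimbus with an empty local blob store fails the lookup.
class MissingBlobSketch {

    // Hypothetical stand-in for the KeyNotFoundException seen in the log.
    static class KeyNotFound extends RuntimeException {
        KeyNotFound(String key) {
            super(key);
        }
    }

    // Hypothetical stand-in for a node-local blob store.
    static class LocalBlobStore {
        private final Map<String, Integer> replication = new HashMap<>();

        void put(String key) {
            replication.put(key, 1);
        }

        int getReplication(String key) {
            Integer count = replication.get(key);
            if (count == null) {
                throw new KeyNotFound(key); // the key was never stored on this node
            }
            return count;
        }
    }

    // Collects the replication count of each active topology's code blob.
    static Map<String, Integer> clusterInfo(List<String> activeTopologyIds,
                                            LocalBlobStore store) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String id : activeTopologyIds) {
            result.put(id, store.getReplication(id + "-stormcode.ser"));
        }
        return result;
    }
}
```

In this model, Nimbus 1 (which stored the blob when the topology was submitted) answers normally, while Nimbus 2 (started with an empty store on another node) throws for the same topology id.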
Log
{code}
2016-07-17 08:47:48.378 o.a.s.b.FileBlobStoreImpl [INFO] Creating new blob store based in /grid/0/hadoop/storm/blobs
...
2016-07-17 08:47:48.619 o.a.s.zookeeper [INFO] Queued up for leader lock.
2016-07-17 08:47:48.651 o.a.s.zookeeper [INFO] jlim-ams-storm-ha-1.openstacklocal gained leadership
...
2016-07-17 08:47:48.833 o.a.s.d.nimbus [INFO] Starting nimbus server for storm version '1.1.1-SNAPSHOT'
2016-07-17 08:47:49.295 o.a.s.t.ProcessFunction [ERROR] Internal error processing getClusterInfo
KeyNotFoundException(msg:production-topology-2-1468745167-stormcode.ser)
at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:149)
at org.apache.storm.blobstore.LocalFsBlobStore.getBlobReplication(LocalFsBlobStore.java:268)
...
at org.apache.storm.daemon.nimbus$get_blob_replication_count.invoke(nimbus.clj:498)
at org.apache.storm.daemon.nimbus$get_cluster_info$iter__9520__9524$fn__9525.invoke(nimbus.clj:1427)
...
at org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1401)
at org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__9612.getClusterInfo(nimbus.clj:1838)
at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3724)
at org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3708)
at org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39)
...
2016-07-17 08:47:49.397 o.a.s.b.BlobStoreUtils [ERROR] Could not download blob with keyproduction-topology-2-1468745167-stormconf.ser
2016-07-17 08:47:49.400 o.a.s.b.BlobStoreUtils [ERROR] Could not update the blob with keyproduction-topology-2-1468745167-stormconf.ser
2016-07-17 08:47:49.402 o.a.s.d.nimbus [ERROR] Error when processing event
KeyNotFoundException(msg:production-topology-2-1468745167-stormconf.ser)
at org.apache.storm.blobstore.LocalFsBlobStore.getStoredBlobMeta(LocalFsBlobStore.java:149)
at org.apache.storm.blobstore.LocalFsBlobStore.getBlob(LocalFsBlobStore.java:239)
at org.apache.storm.blobstore.BlobStore.readBlobTo(BlobStore.java:271)
at org.apache.storm.blobstore.BlobStore.readBlob(BlobStore.java:300)
...
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
at org.apache.storm.daemon.nimbus$read_storm_conf_as_nimbus.invoke(nimbus.clj:548)
at org.apache.storm.daemon.nimbus$read_topology_details.invoke(nimbus.clj:555)
at org.apache.storm.daemon.nimbus$mk_assignments$iter__9205__9209$fn__9210.invoke(nimbus.clj:912)
...
at org.apache.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:911)
at clojure.lang.RestFn.invoke(RestFn.java:410)
at org.apache.storm.daemon.nimbus$fn__9769$exec_fn__1363__auto____9770$fn__9781$fn__9782.invoke(nimbus.clj:2216)
at org.apache.storm.daemon.nimbus$fn__9769$exec_fn__1363__auto____9770$fn__9781.invoke(nimbus.clj:2215)
at org.apache.storm.timer$schedule_recurring$this__1732.invoke(timer.clj:105)
at org.apache.storm.timer$mk_timer$fn__1715$fn__1716.invoke(timer.clj:50)
at org.apache.storm.timer$mk_timer$fn__1715.invoke(timer.clj:42)
...
2016-07-17 08:47:49.408 o.a.s.util [ERROR] Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
at org.apache.storm.util$exit_process_BANG_.doInvoke(util.clj:341)
at clojure.lang.RestFn.invoke(RestFn.java:423)
at org.apache.storm.daemon.nimbus$nimbus_data$fn__8727.invoke(nimbus.clj:205)
at org.apache.storm.timer$mk_timer$fn__1715$fn__1716.invoke(timer.clj:71)
at org.apache.storm.timer$mk_timer$fn__1715.invoke(timer.clj:42)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:745)
2016-07-17 08:47:49.410 o.a.s.d.nimbus [INFO] Shutting down master
{code}