[ https://issues.apache.org/jira/browse/STORM-99?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14287649#comment-14287649 ]
Robert Joseph Evans commented on STORM-99:
------------------------------------------
STORM-155 was fixed as part of 0.9.3, so 0.9.2 still has the issue.
STORM-155 also only narrowed the window for the race condition: the supervisor
crashes if it downloads scheduling changes while nimbus is still in the middle
of uploading them. The crash can still happen. If nimbus crashes partway
through uploading the changes, or for some other reason takes a very long time
to upload all of them, the supervisor will still crash once its retries are
exhausted.
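A minimal Python sketch of the failure mode described in the comment, assuming the supervisor simply re-reads the assignments a bounded number of times when it sees an inconsistent view. This is not the STORM-155 patch: `read_assignments_with_retry`, `ConflictingAssignment`, `make_reader`, the retry count, and the backoff are all hypothetical. It shows why retries only shrink the window. If nimbus finishes uploading before the attempts run out, the read succeeds; if nimbus never finishes, the error still escapes.

```python
import time


class ConflictingAssignment(Exception):
    """Hypothetical error: a read caught nimbus half-way through an update."""


def read_assignments_with_retry(read_snapshot, max_retries=3, backoff_secs=0.0):
    """Call read_snapshot() until it returns a consistent view or retries run out.

    read_snapshot is a hypothetical stand-in for the supervisor reading every
    topology's assignment; it raises ConflictingAssignment when the view it
    sees is inconsistent.
    """
    for attempt in range(max_retries + 1):
        try:
            return read_snapshot()
        except ConflictingAssignment:
            if attempt == max_retries:
                # Retries exhausted: this is where the supervisor would crash.
                raise
            # Give nimbus a chance to finish uploading before the next read.
            time.sleep(backoff_secs)


def make_reader(inconsistent_reads):
    """Simulated store: the first `inconsistent_reads` reads see a half-written update."""
    calls = {"n": 0}

    def read_snapshot():
        calls["n"] += 1
        if calls["n"] <= inconsistent_reads:
            raise ConflictingAssignment("port claimed by two topologies")
        return {6708: "topology-b"}

    return read_snapshot


# Nimbus finishes its upload after two bad reads: the third attempt succeeds.
print(read_assignments_with_retry(make_reader(2)))  # {6708: 'topology-b'}

# Nimbus never finishes (e.g. it crashed mid-upload): the error escapes after 4 attempts.
try:
    read_assignments_with_retry(make_reader(100))
except ConflictingAssignment as err:
    print("supervisor would halt:", err)
```

Raising `max_retries` or the backoff gives nimbus more time to finish, but in this sketch no finite value covers a nimbus that died mid-upload.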
> Multiple topologies assigned to one port
> ----------------------------------------
>
> Key: STORM-99
> URL: https://issues.apache.org/jira/browse/STORM-99
> Project: Apache Storm
> Issue Type: Bug
> Reporter: James Xu
> Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/586
> When I submitted a lot of topologies (about 15) to the Storm cluster, one
> supervisor threw a runtime exception. I believe this exception should not be
> thrown during normal operation. The Storm version is 0.8.1.
> I tried to reproduce the exception but failed, and after reviewing
> nimbus.clj and supervisor.clj I found nothing. Is this a known bug?
> Supervisor log and stack trace:
> 2013-06-06 14:55:00 supervisor [INFO] Shut down 212aa36c-81a0-4d4d-8104-759d9f128669:f66a0118-7267-4460-9f35-58435c92dc10
> 2013-06-06 14:55:00 supervisor [INFO] Shutting down and clearing state for id 8db35c17-9e69-49db-86c3-f7dc90fa42be. Current supervisor time: 1370501700. State: :disallowed, Heartbeat: #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1370501699, :storm-id "case_10-537-1370488327", :executors #{[7 7] [14 14]}, :port 6708}
> 2013-06-06 14:55:00 supervisor [INFO] Shutting down 212aa36c-81a0-4d4d-8104-759d9f128669:8db35c17-9e69-49db-86c3-f7dc90fa42be
> 2013-06-06 14:55:00 supervisor [INFO] Shut down 212aa36c-81a0-4d4d-8104-759d9f128669:8db35c17-9e69-49db-86c3-f7dc90fa42be
> 2013-06-06 14:55:03 event [ERROR] Error when processing event
> java.lang.RuntimeException: Should not have multiple topologies assigned to one port
> at backtype.storm.daemon.supervisor$read_assignments$fn__4510.doInvoke(supervisor.clj:45)
> at clojure.lang.RestFn.invoke(RestFn.java:421)
> at clojure.core$merge_with$merge_entry__4159.invoke(core.clj:2645)
> at clojure.core$reduce1.invoke(core.clj:880)
> at clojure.core$merge_with$merge2__4161.invoke(core.clj:2648)
> at clojure.core$reduce1.invoke(core.clj:880)
> at clojure.core$reduce1.invoke(core.clj:871)
> at clojure.core$merge_with.doInvoke(core.clj:2649)
> at clojure.lang.RestFn.applyTo(RestFn.java:139)
> at clojure.core$apply.invoke(core.clj:603)
> at backtype.storm.daemon.supervisor$read_assignments.invoke(supervisor.clj:48)
> at backtype.storm.daemon.supervisor$mk_synchronize_supervisor$this__4692.invoke(supervisor.clj:270)
> at backtype.storm.event$event_manager$fn__2484.invoke(event.clj:24)
> at clojure.lang.AFn.run(AFn.java:24)
> at java.lang.Thread.run(Thread.java:679)
> 2013-06-06 14:55:03 util [INFO] Halting process: ("Error when processing an event")
> Nimbus log around the time of the crash:
> 2013-06-06 14:54:44 nimbus [INFO] Delaying event :remove for 0 secs for case_9-536-1370488243
> 2013-06-06 14:54:44 nimbus [INFO] Updated case_9-536-1370488243 with status {:type :killed, :kill-time-secs 0}
> 2013-06-06 14:54:45 nimbus [INFO] Killing topology: case_9-536-1370488243
> 2013-06-06 14:54:52 EvenScheduler [INFO] Available slots:
> (["212aa36c-81a0-4d4d-8104-759d9f128669" 6705]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6706]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6709]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6710]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6711])
> 2013-06-06 14:54:52 nimbus [INFO] Setting new assignment for topology id case_23-527-1370478607:
> #backtype.storm.daemon.common.Assignment{:master-code-dir "/home/admin/install/storm-local/nimbus/stormdist/case_23-527-1370478607",
> :node->host {"212aa36c-81a0-4d4d-8104-759d9f128669" "dev163015.sqa.cm6"},
> :executor->node+port {[2 2] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6706],
> [3 3] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6709],
> [4 4] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6710],
> [5 5] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6711],
> [6 6] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6705],
> [7 7] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6706],
> [8 8] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6709],
> [9 9] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6710],
> [10 10] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6711],
> [11 11] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6705],
> [12 12] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6706],
> [13 13] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6709],
> [1 1] ["212aa36c-81a0-4d4d-8104-759d9f128669" 6705]},
> :executor->start-time-secs {[2 2] 1370501692, [3 3] 1370501692, [4 4] 1370501692, [5 5] 1370501692, [6 6] 1370501692, [7 7] 1370501692, [8 8] 1370501692, [9 9] 1370501692, [10 10] 1370501692, [11 11] 1370501692, [12 12] 1370501692, [13 13] 1370501692, [1 1] 1370501692}}
> 2013-06-06 14:54:52 nimbus [INFO] Cleaning up case_2-486-1370440841
> 2013-06-06 14:54:52 nimbus [INFO] Cleaning up case_9-536-1370488243
> 2013-06-06 14:54:52 nimbus [INFO] Cleaning up case_7-490-1370441170
> 2013-06-06 14:54:53 nimbus [INFO] Delaying event :remove for 0 secs for case_10-537-1370488327
> 2013-06-06 14:54:53 nimbus [INFO] Updated case_10-537-1370488327 with status {:type :killed, :kill-time-secs 0}
> 2013-06-06 14:54:53 nimbus [INFO] Killing topology: case_10-537-1370488327
> 2013-06-06 14:55:03 EvenScheduler [INFO] Available slots:
> (["212aa36c-81a0-4d4d-8104-759d9f128669" 6700]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6701]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6702]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6703]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6704]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6705]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6706]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6707]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6708]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6709])
> 2013-06-06 14:55:03 EvenScheduler [INFO] Available slots:
> (["212aa36c-81a0-4d4d-8104-759d9f128669" 6705]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6706]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6707]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6708]
> ["212aa36c-81a0-4d4d-8104-759d9f128669" 6709])
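The frames in the reporter's stack trace (`read_assignments` called from `mk_synchronize_supervisor`, failing inside `clojure.core/merge-with`) suggest the supervisor merges one port map per topology and throws on any port collision. The Python sketch below approximates that behavior under assumptions. It is not the supervisor.clj source: the input shape `{storm_id: {port: executors}}` and the `new-topology` id are hypothetical, and only the exception message and the case_10 heartbeat values (port 6708, executors [7 7] and [14 14]) come from the logs above.

```python
def merge_with(conflict_fn, *maps):
    """Python analogue of Clojure's merge-with: conflict_fn runs only when a key
    already present in the result appears again in a later map."""
    result = {}
    for m in maps:
        for key, value in m.items():
            result[key] = conflict_fn(result[key], value) if key in result else value
    return result


def fail_on_shared_port(existing, incoming):
    raise RuntimeError("Should not have multiple topologies assigned to one port")


def read_assignments(ports_by_topology):
    """Merge per-topology {port: executors} maps into one {port: (storm_id, executors)} map.

    ports_by_topology is a hypothetical shape: {storm_id: {port: [executor, ...]}}
    for the topologies assigned to this supervisor.
    """
    per_topology = [
        {port: (storm_id, executors) for port, executors in ports.items()}
        for storm_id, ports in ports_by_topology.items()
    ]
    return merge_with(fail_on_shared_port, *per_topology)


# Disjoint ports merge cleanly.
print(read_assignments({"topo-a": {6700: [[1, 1]]}, "topo-b": {6701: [[2, 2]]}}))

# A read that still sees case_10 on port 6708 while a hypothetical new topology
# has also been given 6708 hits the conflict function.
try:
    read_assignments({
        "case_10-537-1370488327": {6708: [[7, 7], [14, 14]]},
        "new-topology": {6708: [[1, 1]]},
    })
except RuntimeError as err:
    print(err)  # Should not have multiple topologies assigned to one port
```

Because `merge-with` invokes the conflict function only on a collision, in this sketch any single read that contains two topologies on the same port fails, including a read that mixes one topology's stale assignment with another's new one.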
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)