[
https://issues.apache.org/jira/browse/STORM-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15041698#comment-15041698
]
ASF GitHub Bot commented on STORM-1370:
---------------------------------------
GitHub user jerrypeng opened a pull request:
https://github.com/apache/storm/pull/923
[STORM-1370] - Bug fixes for MultitenantScheduler
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jerrypeng/storm STORM-1370
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/923.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #923
----
commit 82f9d969446898bc6bdbdb03f2b927a55174f97c
Author: Boyang Jerry Peng <[email protected]>
Date: 2015-12-04T16:23:10Z
[STORM-1370] - Bug fixes for MultitenantScheduler
----
> Bug fixes for MultitenantScheduler
> ----------------------------------
>
> Key: STORM-1370
> URL: https://issues.apache.org/jira/browse/STORM-1370
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Boyang Jerry Peng
> Assignee: Boyang Jerry Peng
>
> Bug 1:
> Sort nodes by slots used when scheduling isolated topologies.
> Because nimbus removes "dead" slots (slots whose workers have
> not yet sent a heartbeat) before schedule is called, we cannot rely on
> the number of free slots on a node. This will break for clusters whose
> nodes have a heterogeneous number of slots configured.
> Derive the effective number of hosts by taking the minimum of the
> config's value and the number of executors in the topology.
> If the user requests the topology be scheduled among a number of hosts,
> then retry scheduling when the effective number does not match the
> scheduled number.
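A minimal Java sketch of the three Bug 1 rules quoted above (sort by slots used, take the minimum for the effective host count, retry on a mismatch). It is not the code from the pull request: the names IsolatedSchedulingSketch, NodeInfo, effectiveHosts, sortBySlotsUsed and needsRetry are hypothetical, and the ascending sort order is an assumption, since the description does not state a direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class IsolatedSchedulingSketch {
    /** Hypothetical stand-in for a scheduler node; not Storm's Node class. */
    static final class NodeInfo {
        final String host;
        final int slotsUsed;

        NodeInfo(String host, int slotsUsed) {
            this.host = host;
            this.slotsUsed = slotsUsed;
        }
    }

    /** Effective host count: minimum of the configured value and the executor count. */
    static int effectiveHosts(int requestedHosts, int numExecutors) {
        return Math.min(requestedHosts, numExecutors);
    }

    /**
     * Orders nodes by slots used rather than by free slots, because the free
     * count is unreliable once dead slots have been removed.
     * Ascending order is an assumption made for this sketch.
     */
    static List<NodeInfo> sortBySlotsUsed(List<NodeInfo> nodes) {
        List<NodeInfo> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt((NodeInfo n) -> n.slotsUsed));
        return sorted;
    }

    /** True when the scheduled host count differs from the effective host count. */
    static boolean needsRetry(int requestedHosts, int numExecutors, int scheduledHosts) {
        return scheduledHosts != effectiveHosts(requestedHosts, numExecutors);
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = Arrays.asList(new NodeInfo("a", 4), new NodeInfo("b", 1));
        for (NodeInfo n : sortBySlotsUsed(nodes)) {
            System.out.println(n.host + " used=" + n.slotsUsed);
        }
        System.out.println("effective hosts: " + effectiveHosts(5, 3));
        System.out.println("retry needed: " + needsRetry(5, 3, 2));
    }
}
```

In this sketch a topology that asks for 5 hosts but has only 3 executors is treated as needing 3 hosts, so a retry is triggered only when the scheduled host count differs from 3.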
> Bug 2:
> Nimbus crashes from an exception being thrown by the multitenant scheduler
> trying to assign executors from an isolated topology to a node that is full.
> Error in nimbus.log:
> java.lang.IllegalStateException: Trying to assign to a full node xxxxxxxxxxxxx
> at backtype.storm.scheduler.multitenant.Node.assign(Node.java:232) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.scheduler.multitenant.NodePool$RoundRobinSlotScheduler.assignSlotTo(NodePool.java:171) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.scheduler.multitenant.IsolatedPool.scheduleAsNeeded(IsolatedPool.java:164) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.scheduler.multitenant.MultitenantScheduler.schedule(MultitenantScheduler.java:96) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) ~[?:?]
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_40]
> at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_40]
> at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
> at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
> at backtype.storm.daemon.nimbus$compute_new_scheduler_assignments.invoke(nimbus.clj:750) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:806) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.6.0.jar:?]
> at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn6020$fn_6021.invoke(nimbus.clj:1245) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn_6020.invoke(nimbus.clj:1244) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.timer$schedule_recurring$this__4635.invoke(timer.clj:105) ~[storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.timer$mk_timer$fn_4618$fn_4619.invoke(timer.clj:50) [storm-core-0.10.1.y.jar:0.10.1.y]
> at backtype.storm.timer$mk_timer$fn__4618.invoke(timer.clj:42) [storm-core-0.10.1.y.jar:0.10.1.y]
> at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
> at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
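To illustrate the Bug 2 failure mode, here is a small Java sketch of round-robin slot assignment that skips full nodes instead of letting the "full node" exception propagate. It is only one possible guard, written under assumptions: FullNodeGuardSketch, SimpleNode and assignRoundRobin are hypothetical names, this is not Storm's Node or NodePool code, and it may not be how the pull request fixes the problem.

```java
import java.util.Arrays;
import java.util.List;

public class FullNodeGuardSketch {
    /** Hypothetical node with a fixed slot capacity; not Storm's Node class. */
    static final class SimpleNode {
        final String host;
        final int totalSlots;
        int usedSlots;

        SimpleNode(String host, int totalSlots, int usedSlots) {
            this.host = host;
            this.totalSlots = totalSlots;
            this.usedSlots = usedSlots;
        }

        boolean isFull() {
            return usedSlots >= totalSlots;
        }

        /** Mirrors the failure in the trace: assigning to a full node throws. */
        void assign() {
            if (isFull()) {
                throw new IllegalStateException("Trying to assign to a full node " + host);
            }
            usedSlots++;
        }
    }

    /**
     * Assigns up to slotsNeeded slots round-robin, skipping full nodes.
     * Returns the number of slots actually assigned, which can be smaller
     * than slotsNeeded when every node is full.
     */
    static int assignRoundRobin(List<SimpleNode> nodes, int slotsNeeded) {
        int assigned = 0;
        boolean progress = true;
        while (assigned < slotsNeeded && progress) {
            progress = false; // stays false if a full pass finds no free slot
            for (SimpleNode node : nodes) {
                if (assigned == slotsNeeded) {
                    break;
                }
                if (node.isFull()) {
                    continue; // guard: never call assign() on a full node
                }
                node.assign();
                assigned++;
                progress = true;
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<SimpleNode> nodes = Arrays.asList(
                new SimpleNode("a", 1, 1), // already full
                new SimpleNode("b", 2, 0));
        System.out.println("assigned: " + assignRoundRobin(nodes, 3));
    }
}
```

Returning the assigned count leaves it to the caller to decide what to do with any shortfall, for example retrying on a later scheduling round, so the scheduling thread is not brought down by the exception.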
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)