[ https://issues.apache.org/jira/browse/STORM-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Boyang Jerry Peng updated STORM-1370:
-------------------------------------
Description:
Bug 1:
Sort nodes by slots used when scheduling isolated topologies.
Because nimbus removes "dead" slots (slots whose workers have
not yet sent a heartbeat) before schedule is called, we cannot rely on
the number of free slots on a node. This will break for clusters whose
nodes have a heterogeneous number of slots configured.
Derive the effective number of hosts by taking the minimum of the
config's value and the number of executors in the topology.
If the user requests that the topology be scheduled across a given number
of hosts, then retry scheduling when the effective number does not match
the scheduled number.
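
The rules described for Bug 1 can be sketched roughly as below. This is an
illustration only, not the code in IsolatedPool: the class, field, and method
names are hypothetical, and the ascending sort order (fewest used slots first)
is an assumption, since the description does not state a direction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; none of these names come from the Storm code base.
final class IsolatedSchedulingSketch {

    // Minimal stand-in for a cluster node: only the used-slot count matters here.
    static final class NodeInfo {
        final String id;
        final int usedSlots;

        NodeInfo(String id, int usedSlots) {
            this.id = id;
            this.usedSlots = usedSlots;
        }
    }

    // Effective host count: the smaller of the configured value and the executor count.
    static int effectiveHosts(int configuredHosts, int numExecutors) {
        return Math.min(configuredHosts, numExecutors);
    }

    // Order nodes by slots used instead of by free slots.
    // Assumption: ascending order, so the least-used nodes come first.
    static List<NodeInfo> sortBySlotsUsed(List<NodeInfo> nodes) {
        List<NodeInfo> sorted = new ArrayList<>(nodes);
        sorted.sort(Comparator.comparingInt((NodeInfo n) -> n.usedSlots));
        return sorted;
    }

    // Retry only when the user asked for a host count and the result does not match it.
    static boolean needsReschedule(boolean hostsRequested, int effectiveHosts, int scheduledHosts) {
        return hostsRequested && effectiveHosts != scheduledHosts;
    }

    public static void main(String[] args) {
        int effective = effectiveHosts(5, 3);
        System.out.println("effective hosts: " + effective);
        System.out.println("retry needed: " + needsReschedule(true, effective, 2));

        List<NodeInfo> nodes = Arrays.asList(
                new NodeInfo("a", 3), new NodeInfo("b", 1), new NodeInfo("c", 2));
        for (NodeInfo n : sortBySlotsUsed(nodes)) {
            System.out.println(n.id + " uses " + n.usedSlots + " slot(s)");
        }
    }
}
```

Sorting on used slots avoids depending on the free-slot count, which the
description says is unreliable once nimbus has removed "dead" slots.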
Bug 2:
Nimbus crashes when the multitenant scheduler throws an exception while
trying to assign executors from an isolated topology to a node that is full.
Error in nimbus.log:
java.lang.IllegalStateException: Trying to assign to a full node xxxxxxxxxxxxx
    at backtype.storm.scheduler.multitenant.Node.assign(Node.java:232) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.scheduler.multitenant.NodePool$RoundRobinSlotScheduler.assignSlotTo(NodePool.java:171) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.scheduler.multitenant.IsolatedPool.scheduleAsNeeded(IsolatedPool.java:164) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.scheduler.multitenant.MultitenantScheduler.schedule(MultitenantScheduler.java:96) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_40]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_40]
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$compute_new_scheduler_assignments.invoke(nimbus.clj:750) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:806) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn6020$fn_6021.invoke(nimbus.clj:1245) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn_6020.invoke(nimbus.clj:1244) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.timer$schedule_recurring$this__4635.invoke(timer.clj:105) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.timer$mk_timer$fn_4618$fn_4619.invoke(timer.clj:50) [storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.timer$mk_timer$fn__4618.invoke(timer.clj:42) [storm-core-0.10.1.y.jar:0.10.1.y]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
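
The failure mode in the trace (an assign call on a node with no free slot)
can be reproduced in miniature as below. This is a hedged sketch, not the fix
applied for this issue: SketchNode and assignRoundRobin are hypothetical
stand-ins, and skipping full nodes is only one possible defensive pattern.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; these classes are not the Storm Node or NodePool classes.
final class FullNodeGuardSketch {

    static final class SketchNode {
        final String id;
        private int freeSlots;

        SketchNode(String id, int freeSlots) {
            this.id = id;
            this.freeSlots = freeSlots;
        }

        boolean isFull() {
            return freeSlots == 0;
        }

        // Mirrors the reported failure: assigning to a full node throws.
        void assign() {
            if (isFull()) {
                throw new IllegalStateException("Trying to assign to a full node " + id);
            }
            freeSlots--;
        }
    }

    // Round-robin over the nodes, skipping full ones instead of calling assign() on them.
    // Returns how many slots were actually assigned, which may be fewer than requested.
    static int assignRoundRobin(List<SketchNode> nodes, int slotsWanted) {
        int assigned = 0;
        boolean progress = true;
        while (assigned < slotsWanted && progress) {
            progress = false;
            for (SketchNode node : nodes) {
                if (assigned == slotsWanted) {
                    break;
                }
                if (node.isFull()) {
                    continue; // guard: never assign to a full node
                }
                node.assign();
                assigned++;
                progress = true;
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<SketchNode> nodes = Arrays.asList(new SketchNode("a", 1), new SketchNode("b", 2));
        System.out.println("assigned: " + assignRoundRobin(nodes, 5));
    }
}
```

Returning the assigned count lets the caller notice a shortfall and decide
whether to retry, rather than letting the exception reach nimbus.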
was:
Nimbus crashes when the multitenant scheduler throws an exception while
trying to assign executors from an isolated topology to a node that is full.
Error in nimbus.log:
java.lang.IllegalStateException: Trying to assign to a full node xxxxxxxx
    at backtype.storm.scheduler.multitenant.Node.assign(Node.java:232) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.scheduler.multitenant.NodePool$RoundRobinSlotScheduler.assignSlotTo(NodePool.java:171) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.scheduler.multitenant.IsolatedPool.scheduleAsNeeded(IsolatedPool.java:164) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.scheduler.multitenant.MultitenantScheduler.schedule(MultitenantScheduler.java:96) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source) ~[?:?]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_40]
    at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_40]
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.6.0.jar:?]
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$compute_new_scheduler_assignments.invoke(nimbus.clj:750) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.daemon.nimbus$mk_assignments.doInvoke(nimbus.clj:806) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at clojure.lang.RestFn.invoke(RestFn.java:410) ~[clojure-1.6.0.jar:?]
    at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn6020$fn_6021.invoke(nimbus.clj:1245) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.daemon.nimbus$fn_6009$exec_fn1502auto__6010$fn_6020.invoke(nimbus.clj:1244) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.timer$schedule_recurring$this__4635.invoke(timer.clj:105) ~[storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.timer$mk_timer$fn_4618$fn_4619.invoke(timer.clj:50) [storm-core-0.10.1.y.jar:0.10.1.y]
    at backtype.storm.timer$mk_timer$fn__4618.invoke(timer.clj:42) [storm-core-0.10.1.y.jar:0.10.1.y]
    at clojure.lang.AFn.run(AFn.java:22) [clojure-1.6.0.jar:?]
    at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
> Bug fixes for MultitenantScheduler
> ----------------------------------
>
> Key: STORM-1370
> URL: https://issues.apache.org/jira/browse/STORM-1370
> Project: Apache Storm
> Issue Type: Bug
> Reporter: Boyang Jerry Peng
> Assignee: Boyang Jerry Peng
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)