GitHub user revans2 opened a pull request:
https://github.com/apache/storm/pull/2321
STORM-2733: Better load aware shuffle implementation
I have run several tests that show this works much better at big imbalances
in processing latency than did the previous shuffle implementations.
I ran some simple performance tests and because `chooseTasks` didn't change
the performance was more or less identical to what was here before.
The plan on how this would work with STORM-2686 (adding distance to
shuffle) is that we would have 4 different weights (worker local, node local,
rack local, and everywhere). Each time we update the load we update all of the
weights, for the min load currently in the locality group. This is because
executors may move from one group to another as things are rescheduled, so we
need a way to keep it consistent.
We can then test a few different ways of selecting the group we want to
target. Currently we are thinking we will select the most local group that has
a maximum load < .5 falling back to everything if we cannot find one.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/revans2/incubator-storm STORM-2733
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2321.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2321
----
commit ba758c2507842f3c8ae94da40f21ca58adb4c3bc
Author: Robert (Bobby) Evans <[email protected]>
Date: 2017-08-24T16:13:08Z
STORM-2733: Better load aware shuffle implementation
----
---