GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/storm/pull/2261
STORM-2678 Improve performance of LoadAwareShuffleGrouping
* construct ring which represents distribution of tasks based on load
* chooseTasks() just accesses the ring sequentially
* port related tests from Clojure to Java
I'm not expert of micro-benchmark but I also craft some of simple
performance tests which you can see them from LoadAwareShuffleGroupingTest.
They are `testBenchmarkLoadAwareShuffleGroupingEvenLoad` and
`testBenchmarkLoadAwareShuffleGroupingUnevenLoad `, and I put `@Ignore` to
avoid running unless we want to do performance test on.
Here's my observation on running them, using old and new
LoadAwareShuffleGrouping:
> testBenchmarkLoadAwareShuffleGroupingEvenLoad (old)
Duration: 114470 ms
Duration: 115973 ms
Duration: 114807 ms
> testBenchmarkLoadAwareShuffleGroupingEvenLoad (new)
Duration: 106819 ms
Duration: 105857 ms
Duration: 106789 ms
> testBenchmarkLoadAwareShuffleGroupingUnevenLoad (old)
Duration: 113484 ms
Duration: 118152 ms
Duration: 112664 ms
> testBenchmarkLoadAwareShuffleGroupingUnevenLoad (new)
Duration: 106071 ms
Duration: 105938 ms
Duration: 106115 ms
You can see that modified LoadAwareShuffleGrouping is faster than before,
5% or more for single threaded access. Maybe would want to do multi-threading
performance test, with keeping in mind that accessing OutputCollector with
single-thread is preferred over multi-threads.
This still respects thread-safety, and I think respecting thread-safety is
better than before, given that we only allow one thread to update the ring, and
we replace the new information at once, not updating information on the fly
which other threads are referencing.
We still don't guard information with mutual-exclusion manner, but I think
it is tolerable like we do before.
I'm planning to explore some more, mostly about reducing call
System.currentTimeMillis() in chooseTasks(). I'll put additional commits if I
find any more improvements: it will be easy to revert some if we don't want to.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/HeartSaVioR/storm STORM-2678
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2261.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2261
----
commit 6c693e3a2d57cca3648240335ef6e1862dbbfc4c
Author: Jungtaek Lim <[email protected]>
Date: 2017-08-04T04:25:27Z
STORM-2678 Improve performance of LoadAwareShuffleGrouping
* construct ring which represents distribution of tasks based on load
* chooseTasks() just accesses the ring sequentially
* port related tests from Clojure to Java
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---