GitHub user roshannaik opened a pull request:
https://github.com/apache/storm/pull/2241
STORM-2306 : Messaging subsystem redesign.
Having spent a lot of time on this, I am happy to share some good news and
some even better news with you.
Before venturing further, I must note that, to limit the scope of this PR, no
attempt was made to improve ACK-ing mode perf. Although there are some big
latency gains seen in ACK mode, they are a side effect of the new messaging
design, and work remains to be done on ACK mode.
Please see the design docs posted on the STORM-2306 JIRA for details on
what is being done.
So, first the good news ... a quick competitive evaluation:
# 1) Competitive Perf Evaluation
Here is a quick comparison of Storm numbers taken on my laptop against
numbers for similar/identical topologies published by Heron, Flink, and Apex.
I shall provide just a rolled-up summary here and leave the detailed analysis
for later.
Storm numbers here were run on my MacBook Pro (2015) with 16 GB RAM and a
single 4-core Intel i7 chip.
### A) Compared To Heron and Flink:
------------------------------
Heron recently published this blog about the big perf improvements (~4-6x)
they achieved.
https://blog.twitter.com/engineering/en_us/topics/open-source/2017/optimizing-twitter-heron.html
They ran it on dual 12-core Intel Xeon chips (didn't say how many machines).
They used a simplified word count topology, which I have emulated for
comparison purposes and included as part of this PR
(SimplifiedWordCountTopo).
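For readers who want a feel for the shape of that benchmark, a topology like it can be wired up roughly as follows. This is a minimal sketch using the standard Storm API, not the actual SimplifiedWordCountTopo included in this PR; the class names (`RandomWordSpout`, `CountBolt`), the word list, and the parallelism settings here are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class WordCountSketch {

    // Spout that emits a random word from a small fixed dictionary as fast as it can.
    public static class RandomWordSpout extends BaseRichSpout {
        private static final String[] WORDS = {"apple", "orange", "banana", "mango", "pear"};
        private SpoutOutputCollector collector;
        private Random rand;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.rand = new Random();
        }

        @Override
        public void nextTuple() {
            collector.emit(new Values(WORDS[rand.nextInt(WORDS.length)]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("word"));
        }
    }

    // Terminal bolt that keeps an in-memory running count per word and emits nothing.
    public static class CountBolt extends BaseBasicBolt {
        private final Map<String, Long> counts = new HashMap<>();

        @Override
        public void execute(Tuple tuple, BasicOutputCollector collector) {
            counts.merge(tuple.getString(0), 1L, Long::sum);
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no downstream component
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new RandomWordSpout(), 1);
        builder.setBolt("count", new CountBolt(), 1)
               .fieldsGrouping("spout", new Fields("word"));

        Config conf = new Config();
        conf.setNumAckers(0); // corresponds to the "Acking Disabled" measurements
        StormSubmitter.submitTopology("simplified-word-count", conf, builder.createTopology());
    }
}
```

For the "Acking Enabled" numbers, the equivalent change is `conf.setNumAckers(1)` plus a spout that emits tuples with message IDs so they are tracked by the acker.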
Flink also publishes numbers for a similar setup here
https://flink.apache.org/features.html#streaming
Below are per core throughput numbers.
**[:HERON:]**
Acking Disabled: per core = **~475 k/sec**.
Acking Enabled: per core = **~150 k/sec**. Latency = **30ms**
**[:FLINK:]**
Per core: **~1 mill/sec**
**[:STORM:]**
Acking Disabled: per core = **2 mill/sec.** (1 spout & 1 counter bolt)
Acking Enabled: per core = **0.6 mill/sec**, Latency = **0.73 ms** (+1
acker)
**Takeaways:**
- Storm's with-ACK throughput is better than Heron's no-ACK throughput.
- Without ACKs, Storm is far ahead of Heron and also better than Flink.
- Storm's latency (in microseconds) is also significantly better than both
(although this metric is better compared across runs on multiple machines).
AFAICT, Flink is generally not known for having good latency (as per Flink's
own comparison with Storm -
https://data-artisans.com/blog/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink).
### B) Compared to Apex:
-----------------------------------
Apex appears to be the best performer among the open-source lot, by a
reasonably good margin. Some numbers they published in their early days (when
it was called DataTorrent) were misleading/dubious IMO, but the newer numbers
appear credible.
Here we look at how fast inter spout/bolt communication can be, using an
ultra-minimalist topology: a ConstSpout emits a short string to a DevNull
bolt that discards the tuples it receives. This topo has been in storm-perf
for some time now.
Apex provides numbers for an identical setup (what they call "container
local" performance) here:
https://www.datatorrent.com/blog/blog-apex-performance-benchmark/
Other than the fact that the Storm numbers were taken on my laptop, this is
a good apples-to-apples comparison.
**[:APEX:]**
Container local Throughput : **~4.2 mill/sec**
**[:STORM:]**
Worker local throughput : **8.1 mill/sec**
# 2) Core Messaging Performance
Now for the better news. The redesigned messaging system is actually much
faster, able to move messages between threads at an astounding rate:
- **120 mill/sec** (batchSz=1000, 2 producers writing to 1 consumer)
- **67 mill/sec** (batchSz=1000, 1 producer writing to 1 consumer)
I have included JCQueuePerfTest.java in this PR to help get quick
measurements from within the IDE.
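To illustrate the shape of such a measurement, here is a minimal plain-Java sketch of a batched multi-producer/single-consumer queue benchmark. Note the assumptions: it uses `java.util.concurrent.ArrayBlockingQueue` as a stand-in for the PR's JCQueue (which is lock-free and far faster), and the class name, constants, and queue capacity are illustrative; JCQueuePerfTest.java in this PR is the authoritative benchmark.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueThroughputSketch {
    static final int BATCH = 1000;         // mirrors the batchSz=1000 in the numbers above
    static final int MESSAGES = 1_000_000; // messages sent per producer

    // Runs the benchmark with N producers and 1 consumer; returns messages moved per second.
    static double run(int numProducers) {
        BlockingQueue<long[]> q = new ArrayBlockingQueue<>(1024); // stand-in for JCQueue
        long start = System.nanoTime();
        List<Thread> producers = new ArrayList<>();
        for (int p = 0; p < numProducers; p++) {
            Thread t = new Thread(() -> {
                long[] batch = new long[BATCH]; // producers hand off whole batches, not single tuples
                for (int i = 0; i < MESSAGES / BATCH; i++) {
                    try {
                        q.put(batch);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            });
            t.start();
            producers.add(t);
        }
        long expectedBatches = (long) numProducers * (MESSAGES / BATCH);
        try {
            for (long consumed = 0; consumed < expectedBatches; consumed++) {
                q.take(); // single consumer drains the queue
            }
            for (Thread t : producers) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        return (expectedBatches * BATCH) / seconds;
    }

    public static void main(String[] args) {
        for (int p : new int[]{1, 2}) {
            System.out.printf("%d producer(s) -> 1 consumer: %.1f million msgs/sec%n",
                              p, run(p) / 1e6);
        }
    }
}
```

The key design point this mirrors is batching: producers hand the consumer arrays of BATCH messages, so per-message synchronization cost is amortized across 1000 messages, which is what makes throughput in the tens of millions per second possible.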
That naturally raises the question: why is Storm pushing only 8.1 mill/sec
between a ConstSpout and a DevNullBolt? The short answer is that there are
big bottlenecks in other parts of the code. In this PR I have tackled some of
them, but many still remain. We are faster than the competition, but still
have room to be much, much faster. If anyone is interested in pursuing these
to push Storm's perf to the next level, I am happy to point them in the right
direction.
Again, please refer to the design docs in the JIRA for details on the new
design and the rationale behind it.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/roshannaik/storm STORM-2306m
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/2241.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #2241
----
commit 5c0db923ecd8e4e1ce0e325ee2fd0f25bae7b0c2
Author: Roshan Naik <[email protected]>
Date: 2017-07-25T03:01:00Z
Messaging subsystem redesign. Rebased to latest master. Validated
compilation and a few simple topo runs. C->N: 8 mil/sec. 2.8 mil/sec with 2
workers. 1 mil/sec with ack (30-800 microsec). C->ID->N: 6.2 mill/sec.
----