GitHub user revans2 commented on the issue:
https://github.com/apache/storm/pull/2241
My benchmark results with ThroughputVsLatency.
A side note on testing methodology: I strongly disagree with some of the statements made about testing methodology. I am happy to have a discussion with anyone about why ThroughputVsLatency is the way it is. If we want to file a separate JIRA, resolve these differences of opinion, and come up with a preferred methodology, I am happy to do that. Just so that others understand my methodology, I want to summarize it here. An end user usually knows a few things about what they want to build:
1. An estimated throughput range that the topology will need to handle.
2. A budget
3. A target latency after which the data is not good any more (or, perhaps more accurately, a function that describes how the value of the data drops as it ages, a.k.a. an SLA/SLO).
As a user I want to be able to measure my topology and see what it is going to cost me to achieve both 1 and 3, and if I cannot, what I need to adjust to make it work, i.e. raise the budget because the data is important, or live with longer latency because the value does not drop off too fast. The ThroughputVsLatency test is not intended to measure the maximum throughput that a topology configuration can handle. It is intended to be a building block: you run it multiple times, varying the throughput, and measure the cost (CPU/memory) and latency at each throughput level (a sketch of that loop is below). I'll leave it at that for now.
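To make the building-block idea concrete, here is a minimal sketch of the sweep. `runTrial()` is a stand-in of mine for one fixed-rate run of ThroughputVsLatency; the names and the result shape are assumptions for illustration, not the benchmark's API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ThroughputSweep {
    /** One fixed-rate run: measured completion latency, CPU from top, and whether the rate was sustained. */
    record Trial(double latencyMs, double cpuPercent, boolean keptUp) {}

    public static void main(String[] args) {
        int[] rates = {500, 1_000, 2_000, 4_000, 8_000, 16_000, 32_000, 64_000, 128_000};
        Map<Integer, Trial> results = new LinkedHashMap<>();
        for (int rate : rates) {
            Trial t = runTrial(rate);    // one benchmark run pinned to this rate
            if (!t.keptUp()) {
                continue;                // fell behind: latency/CPU at this rate are invalid, throw them out
            }
            results.put(rate, t);
        }
        results.forEach((rate, t) -> System.out.printf(
                "%,d tuples/sec -> %.2f ms latency, %.2f%% CPU%n",
                rate, t.latencyMs(), t.cpuPercent()));
    }

    /** Stand-in: submit the topology at a fixed rate, wait for it to settle, and collect the metrics. */
    static Trial runTrial(int rate) {
        throw new UnsupportedOperationException("run ThroughputVsLatency at " + rate + " tuples/sec");
    }
}
```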
For this test I ran everything on the same laptop I listed [above](https://github.com/apache/storm/pull/2241#issuecomment-318102321). I ran two flavors of the topology, having modified ThroughputVsLatency to let me set the exact parallelism of all of the components (flavor A's settings are sketched in code after the list):
* Flavor A has 1 acker, 1 spout, 1 split bolt, and 1 count bolt, with max spout pending set to 1000. This was optimized for maximum throughput under STORM-2306.
* Flavor B has 2 ackers, 2 spouts, 2 split bolts, and 3 count bolts, with max spout pending set to 1500. This was optimized for maximum throughput under master (450ed63).
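For concreteness, flavor A's sizing looks roughly like this in Storm's API (the spout/bolt class names here are placeholders, not the actual ThroughputVsLatency classes):

```java
import org.apache.storm.Config;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;

// Flavor A: 1 spout, 1 split bolt, 1 count bolt, 1 acker, max spout pending 1000.
// SentenceSpout, SplitSentenceBolt, and WordCountBolt are illustrative names.
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new SentenceSpout(), 1);
builder.setBolt("split", new SplitSentenceBolt(), 1).shuffleGrouping("spout");
builder.setBolt("count", new WordCountBolt(), 1).fieldsGrouping("split", new Fields("word"));

Config conf = new Config();
conf.setNumAckers(1);           // 1 acker executor
conf.setMaxSpoutPending(1000);  // cap on pending (un-acked) tuples per spout task
```

Flavor B is the same shape with parallelism hints of 2/2/3, `setNumAckers(2)`, and `setMaxSpoutPending(1500)`.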
I ran all of these at different throughput values and against both versions of Storm to give a better apples-to-apples comparison. If a topology could not keep up with the desired throughput I threw out the results, as the reported latency and CPU are invalid for that throughput. For these tests I am only using CPU as the measure of cost, because I didn't have a simple way to compare memory. I can if we really want to, but I didn't want to have to parse the GC log files, and under STORM-2306 the system bolt's metrics, which would have collected memory numbers for me automatically, are no longer being reported. For CPU I logged the output from top and pulled out the CPU usage once the topology had been running for 1 minute.
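For reference, this is roughly how that number can be pulled out of a logged `top -b` capture. It assumes the default procps column order (PID first, %CPU ninth) and that the worker's PID is known; both are assumptions of mine, not part of the benchmark:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TopCpu {
    /**
     * Last %CPU sample for the given PID in a logged "top -b" capture
     * (e.g. one stopped at the 1-minute mark). Assumes the default procps
     * layout: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
     */
    static double lastCpuForPid(Path topLog, long pid) throws IOException {
        double cpu = Double.NaN;
        String pidStr = Long.toString(pid);
        for (String line : Files.readAllLines(topLog)) {
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 9 && fields[0].equals(pidStr)) {
                cpu = Double.parseDouble(fields[8]); // %CPU is the 9th column
            }
        }
        return cpu; // NaN if the PID never appeared in the log
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lastCpuForPid(Path.of(args[0]), Long.parseLong(args[1])));
    }
}
```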
RESULTS:

[chart: complete latency vs. throughput]

As was already discussed, the latency at low throughput under STORM-2306 needs to be addressed, and it can be seen here. But if we zoom in to a 1 second maximum latency:

[chart: latency vs. throughput, capped at 1 second]

it is clearer to see what is happening at the low end. I also graphed throughput vs. cost (just CPU):

[chart: CPU cost vs. throughput]

It is interesting, but I think it is more informative to see it as the average cost to process 100 tuples per second (see the check after the table):

[chart: cost per 100 tuples/sec vs. throughput]

Again, because of the low-throughput issues, it is helpful to zoom in on the low end:

[chart: cost per 100 tuples/sec vs. throughput, low end]
Here is the data that I used.
Throughput (sentences/sec) | master flavor A latency (ms) | master flavor A CPU % | master flavor A cost/100 tuples/sec | master flavor B latency (ms) | master flavor B CPU % | master flavor B cost/100 tuples/sec | STORM-2306 flavor A latency (ms) | STORM-2306 flavor A CPU % | STORM-2306 flavor A cost/100 tuples/sec | STORM-2306 flavor B latency (ms) | STORM-2306 flavor B CPU % | STORM-2306 flavor B cost/100 tuples/sec
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
500 | 14.26 | 65.90 | 13.18 | 13.20 | 69.24 | 13.85 | 3,978.30 | 137.00 | 27.40 | 12,029.30 | 350.60 | 70.12
1,000 | 14.38 | 60.20 | 6.02 | 12.76 | 77.69 | 7.77 | 2,206.20 | 130.20 | 13.02 | 8,975.81 | 367.50 | 36.75
2,000 | 14.52 | 70.31 | 3.52 | 12.58 | 82.89 | 4.14 | 1,163.92 | 134.20 | 6.71 | 5,075.11 | 352.00 | 17.60
4,000 | 14.91 | 68.09 | 1.70 | 13.39 | 71.44 | 1.79 | 598.74 | 134.80 | 3.37 | 2,969.57 | 345.30 | 8.63
8,000 | 14.84 | 84.58 | 1.06 | 14.07 | 100.56 | 1.26 | 309.59 | 146.00 | 1.83 | 1,491.08 | 377.90 | 4.72
16,000 | 13.56 | 78.38 | 0.49 | 14.90 | 101.59 | 0.63 | 168.43 | 146.20 | 0.91 | 753.40 | 392.40 | 2.45
32,000 | 13.20 | 131.52 | 0.41 | 14.55 | 165.90 | 0.52 | 86.18 | 159.90 | 0.50 | 380.11 | 421.00 | 1.32
50,000 | 12.20 | 165.29 | 0.33 | 14.25 | 200.03 | 0.40 | 69.21 | 180.00 | 0.36 | 254.41 | 397.10 | 0.79
64,000 | 11.81 | 150.86 | 0.24 | 13.85 | 260.52 | 0.41 | 66.98 | 225.00 | 0.35 | 207.62 | 449.50 | 0.70
75,000 | 11.50 | 200.56 | 0.27 | 13.29 | 292.54 | 0.39 | 65.80 | 280.80 | 0.37 | 212.21 | 465.00 | 0.62
100,000 | 58.92 | 208.41 | 0.21 | 12.66 | 299.70 | 0.30 | 74.91 | 288.50 | 0.29 | 156.76 | 503.00 | 0.50
128,000 | 97.91 | 356.76 | 0.28 | 83.36 | 384.78 | 0.30 | 100.79 | 296.70 | 0.23 | 189.01 | 521.40 | 0.41
150,000 | | | | 87.69 | 507.71 | 0.34 | 101.78 | 398.10 | 0.27 | 140.25 | 529.20 | 0.35
200,000 | | | | | | | 332.92 | 336.80 | 0.17 | | |
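For anyone rederiving the cost column: it is just the CPU reading divided by the throughput expressed in units of 100 tuples/sec. A one-line check (my reading of the data, not code from the benchmark):

```java
// Average CPU cost to process 100 tuples/sec at a given fixed rate.
static double costPer100TuplesPerSec(double cpuPercent, double tuplesPerSec) {
    return cpuPercent / (tuplesPerSec / 100.0);
}
// e.g. master flavor A at 500 sentences/sec: 65.90 / 5 = 13.18, matching the first cost cell above
```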
I did run the tests at 256,000 sentences per second and at 300,000, but none of the topologies could keep up. Master flavor A maxed out at 143k/sec and was CPU bound (the acker could not keep up). Master flavor B and STORM-2306 flavor B both maxed out at just under 200k. It could be argued that they could keep up with 200k so long as there was never a hiccup, but I felt it better to exclude those results. STORM-2306 flavor A maxed out at about 223k, or about a 12% increase over the maximum throughput possible on master.