Github user Ethanlm commented on the issue:
https://github.com/apache/storm/pull/2270
I ran some experiments with ThroughputVsLatency (modified to add some
configs). The initial results are similar across `shuffle`, `localOrShuffle`
and `localityAwareShuffle (LocalityASG)`.
Config:
`TOPOLOGY_MESSAGE_TIMEOUT_SECS: 300`
`TOPOLOGY_MAX_SPOUT_PENDING: 5000`
`LoadAware` is enabled by default
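
For reference, the two settings above could be expressed as a plain configuration map. This is only a sketch: the class name `ExperimentConf` is hypothetical, and the key strings are assumed to be the ones behind Storm's `Config` constants for message timeout and max spout pending. In a real topology these values would normally be set on an `org.apache.storm.Config` object instead of a bare `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

public class ExperimentConf {
    // Builds the experiment settings as a plain map.
    // Assumption: these key strings match Storm's Config constants
    // TOPOLOGY_MESSAGE_TIMEOUT_SECS and TOPOLOGY_MAX_SPOUT_PENDING.
    static Map<String, Object> build() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.message.timeout.secs", 300); // tuple timeout, in seconds
        conf.put("topology.max.spout.pending", 5000);   // max un-acked tuples per spout task
        return conf;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + ": " + value));
    }
}
```
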
Env: Two OpenStack VMs, 8GB RAM, 4 vCPUs; 1Gbps Ethernet
Note:
1. All numbers in the tables are medians.
2. The numbers fluctuated slightly between runs. The results shown below are
sampled from repeated experiments.
#### Experiment 1
Normal network connection.
This compares the performance of the three groupings under normal conditions.
Rate | numWorkers | numSpout | numSplit(Bolt) | totalTime(min) | ThroughputVsLatency | acked | acked/sec | failed | 99% | 99.90% | min | max | mean | stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
100000 | 4 | 16 | 16 | 5 | LocalityASG | 2,204,700 | 73,490.00 | 0 | 69,726,109,695 | 72,410,464,255 | 51,271,172,096 | 74,423,730,175 | 59,100,119,611.86 | 5,520,976,189.22 | 176,570 | 33,110 | 19,981 | 1,729.69
100000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 2,082,000 | 67,333.33 | 0 | 66,102,231,039 | 66,538,438,655 | 50,096,766,976 | 66,806,874,111 | 57,631,698,540.71 | 3,643,782,002.52 | 172,500 | 30,060 | 20,803 | 1,175.51
100000 | 4 | 16 | 16 | 5 | Shuffle | 2,067,640 | 68,812.00 | 0 | 72,611,790,847 | 73,752,641,535 | 50,298,093,568 | 74,557,947,903 | 62,488,041,809.45 | 5,480,336,956.59 | 184,020 | 29,220 | 23,905 | 1,363.61
30000 | 4 | 16 | 16 | 5 | LocalityASG | 906,300 | 30,210.00 | 0 | 46,727,167 | 119,865,343 | 6,529,024 | 164,364,287 | 15,393,162.30 | 8,306,088.53 | 102,760 | 46,790 | 1,825 | 498.63
30000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 906,580 | 30,219.33 | 0 | 51,838,975 | 135,921,663 | 6,561,792 | 174,063,615 | 16,187,180.85 | 9,488,799.34 | 103,680 | 46,240 | 1,832 | 742.3
30000 | 4 | 16 | 16 | 5 | Shuffle | 906,360 | 30,206.00 | 0 | 45,318,143 | 80,936,959 | 6,680,576 | 132,055,039 | 17,106,693.24 | 7,055,551.93 | 105,420 | 47,040 | 1,759 | 424.65
20000 | 4 | 16 | 16 | 5 | LocalityASG | 605,040 | 20,168.00 | 0 | 31,965,183 | 49,840,127 | 5,500,928 | 96,534,527 | 13,949,066.47 | 4,487,713.39 | 94,370 | 48,930 | 1,253 | 433.45
20000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 604,640 | 20,154.67 | 0 | 32,161,791 | 87,621,631 | 5,525,504 | 124,518,399 | 14,090,342.83 | 5,387,101.60 | 92,780 | 47,170 | 1,239 | 356.32
20000 | 4 | 16 | 16 | 5 | Shuffle | 604,840 | 20,151.33 | 0 | 33,406,975 | 70,909,951 | 4,399,104 | 98,697,215 | 14,103,326.48 | 5,074,175.87 | 90,870 | 47,230 | 1,193 | 396.6
10000 | 4 | 16 | 16 | 5 | LocalityASG | 302,260 | 10,072.00 | 0 | 27,295,743 | 57,901,055 | 4,374,528 | 88,473,599 | 12,366,716.52 | 3,887,709.71 | 75,670 | 50,950 | 796 | 347.49
10000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 302,340 | 10,077.33 | 0 | 27,721,727 | 44,728,319 | 3,999,744 | 79,298,559 | 12,859,667.20 | 3,668,796.69 | 77,890 | 52,570 | 752 | 349.92
10000 | 4 | 16 | 16 | 5 | Shuffle | 302,260 | 10,074.00 | 0 | 29,655,039 | 60,489,727 | 4,378,624 | 83,165,183 | 13,519,393.44 | 4,205,346.88 | 77,580 | 52,300 | 769 | 367.62
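
The latency columns (`99%`, `99.90%`, `min`, `max`, `mean`, `stddev`) are easier to compare after a unit conversion. The sketch below assumes they are reported in nanoseconds, which the tables do not state explicitly; the class name `LatencyUnits` is hypothetical. Under that assumption, the 99% value for LocalityASG is about 46.7 ms at rate 30000 and about 69.7 s at rate 100000, where acked/sec stays at roughly 67,000 to 73,000 instead of reaching the target rate.

```java
import java.util.Locale;

public class LatencyUnits {
    // Assumption: the latency columns in the tables are in nanoseconds.
    static double nanosToMillis(long nanos) {
        return nanos / 1_000_000.0;
    }

    static double nanosToSeconds(long nanos) {
        return nanos / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        // 99% latency for LocalityASG in Experiment 1 at rate 30000 and rate 100000.
        System.out.println(String.format(Locale.ROOT, "%.1f ms", nanosToMillis(46_727_167L)));
        System.out.println(String.format(Locale.ROOT, "%.1f s", nanosToSeconds(69_726_109_695L)));
    }
}
```
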
#### Experiment 2
Add 10ms latency on both VMs:
`tc qdisc add dev eth0 root netem delay 10ms`
which means 20ms of added latency in total.
This simulates a slow network connection.
Rate | numWorkers | numSpout | numSplit(Bolt) | totalTime(min) | ThroughputVsLatency | acked | acked/sec | failed | 99% | 99.90% | min | max | mean | stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
100000 | 4 | 16 | 16 | 5 | LocalityASG | 2,257,220 | 75,240.67 | 0 | 51,942,260,735 | 53,317,992,447 | 37,446,746,112 | 53,821,308,927 | 45,735,430,020.04 | 5,110,395,891.55 | 181,100 | 28,930 | 22,157 | 812.64
100000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 2,428,780 | 80,959.33 | 0 | 61,773,709,311 | 62,746,787,839 | 3,118,465,024 | 63,082,332,159 | 40,427,975,429.87 | 15,339,145,604.59 | 176,330 | 33,200 | 16,475 | 743.08
100000 | 4 | 16 | 16 | 5 | Shuffle | 2,138,040 | 71,268.00 | 0 | 59,726,888,959 | 60,733,521,919 | 40,500,199,424 | 61,438,164,991 | 50,071,346,036.42 | 5,670,751,071.76 | 170,910 | 34,900 | 14,231 | 638.34
30000 | 4 | 16 | 16 | 5 | LocalityASG | 908,140 | 30,271.33 | 0 | 330,301,439 | 383,254,527 | 7,000,064 | 416,808,959 | 48,647,996.71 | 55,147,277.57 | 105,160 | 44,800 | 1,975 | 579.67
30000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 902,680 | 30,089.33 | 0 | 341,049,343 | 383,254,527 | 6,946,816 | 414,973,951 | 51,019,429.45 | 58,367,846.16 | 104,980 | 47,240 | 1,895 | 642.05
30000 | 4 | 16 | 16 | 5 | Shuffle | 908,860 | 30,295.33 | 0 | 375,914,495 | 492,044,287 | 7,737,344 | 655,884,287 | 60,776,243.04 | 68,004,908.46 | 107,910 | 45,240 | 1,938 | 422.43
20000 | 4 | 16 | 16 | 5 | LocalityASG | 605,920 | 20,188.00 | 0 | 323,485,695 | 386,138,111 | 5,238,784 | 427,294,719 | 45,021,622.26 | 52,931,457.82 | 91,630 | 47,140 | 1,307 | 442.26
20000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 605,180 | 20,172.67 | 0 | 339,476,479 | 391,905,279 | 4,431,872 | 425,721,855 | 46,271,845.50 | 55,536,624.12 | 93,060 | 47,820 | 1,333 | 467.91
20000 | 4 | 16 | 16 | 5 | Shuffle | 604,280 | 20,047.33 | 0 | 356,777,983 | 432,799,743 | 4,493,312 | 576,192,511 | 52,029,378.68 | 57,552,110.96 | 93,570 | 46,560 | 1,285 | 446.67
10000 | 4 | 16 | 16 | 5 | LocalityASG | 301,200 | 10,029.33 | 0 | 330,563,583 | 378,798,079 | 4,399,104 | 413,138,943 | 42,995,014.97 | 50,182,495.91 | 77,450 | 47,930 | 776 | 301.93
10000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 303,240 | 10,108.00 | 0 | 337,379,327 | 383,516,671 | 4,100,096 | 422,838,271 | 43,840,226.34 | 52,874,295.96 | 80,080 | 47,990 | 764 | 274.21
10000 | 4 | 16 | 16 | 5 | Shuffle | 300,440 | 10,012.00 | 0 | 333,971,455 | 396,623,871 | 4,411,392 | 629,669,887 | 49,247,857.01 | 53,009,320.86 | 77,350 | 46,420 | 726 | 286.33
#### Experiment 3
Add 10ms latency on both VMs.
This is meant to show the difference between `localOrShuffle` and
`localityAwareShuffle`. Since we have 4 `workers`, 4 `spout` tasks and 3
`split (bolt)` tasks here, one of the spouts has to choose a bolt task on
another worker/host. The current `localityAwareShuffle` implementation will
choose a worker on the same host. `localOrShuffle` will shuffle among all
three `split (bolt)` tasks.
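
The difference described above can be sketched as two candidate-selection rules. This is a simplified illustration and not the code from this PR: `GroupingSketch`, `Task`, the worker and host names, and the placement of two workers per host are all assumptions made for the example, and load-aware weighting is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class GroupingSketch {
    /** Hypothetical model of a downstream bolt task and where it runs; not a Storm type. */
    static final class Task {
        final int id;
        final String worker;
        final String host;

        Task(int id, String worker, String host) {
            this.id = id;
            this.worker = worker;
            this.host = host;
        }
    }

    static List<Task> filter(List<Task> tasks, Predicate<Task> keep) {
        List<Task> out = new ArrayList<>();
        for (Task t : tasks) {
            if (keep.test(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // localOrShuffle: use tasks in the sender's own worker if any exist, else shuffle over all tasks.
    static List<Task> localOrShuffleCandidates(List<Task> tasks, String worker) {
        List<Task> sameWorker = filter(tasks, t -> t.worker.equals(worker));
        return sameWorker.isEmpty() ? tasks : sameWorker;
    }

    // Locality-aware: prefer the same worker, then the same host, then every task.
    static List<Task> localityAwareCandidates(List<Task> tasks, String worker, String host) {
        List<Task> sameWorker = filter(tasks, t -> t.worker.equals(worker));
        if (!sameWorker.isEmpty()) {
            return sameWorker;
        }
        List<Task> sameHost = filter(tasks, t -> t.host.equals(host));
        return sameHost.isEmpty() ? tasks : sameHost;
    }

    // Assumed placement: workers w1, w2 on hostA and w3, w4 on hostB; bolts run in w1, w2, w3.
    static List<Task> exampleTasks() {
        List<Task> tasks = new ArrayList<>();
        tasks.add(new Task(1, "w1", "hostA"));
        tasks.add(new Task(2, "w2", "hostA"));
        tasks.add(new Task(3, "w3", "hostB"));
        return tasks;
    }

    public static void main(String[] args) {
        List<Task> bolts = exampleTasks();
        // The spout in w4 has no bolt in its own worker.
        System.out.println("localOrShuffle candidates: "
                + localOrShuffleCandidates(bolts, "w4").size());
        System.out.println("locality-aware candidates: "
                + localityAwareCandidates(bolts, "w4", "hostB").size());
    }
}
```

With this placement, the spout in `w4` gets three candidates under the `localOrShuffle` rule and a single same-host candidate under the locality-aware rule. For spouts that do share a worker with a bolt, both rules return the same candidates.
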
Rate | numWorkers | numSpout | numSplit(Bolt) | totalTime(min) | ThroughputVsLatency | acked | acked/sec | failed | 99% | 99.90% | min | max | mean | stddev | user | sys | gc | mem
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
100000 | 4 | 4 | 3 | 5 | LocalityASG | 2,178,680 | 72,622.67 | 0 | 65,699,577,855 | 66,538,438,655 | 8,413,184 | 66,639,101,951 | 28,354,892,840.64 | 24,151,617,972.76 | 140,830 | 36,650 | 5,041 | 288.98
100000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 2,260,940 | 75,364.67 | 0 | 58,082,721,791 | 58,485,374,975 | 36,842,766,336 | 58,619,592,703 | 45,712,216,573.06 | 5,811,071,103.74 | 145,400 | 35,490 | 6,218 | 304
100000 | 4 | 4 | 3 | 5 | Shuffle | 1,998,040 | 66,601.33 | 0 | 61,706,600,447 | 62,310,580,223 | 31,138,512,896 | 62,646,124,543 | 43,860,244,218.07 | 9,311,454,441.44 | 137,820 | 37,530 | 4,677 | 321.52
30000 | 4 | 4 | 3 | 5 | LocalityASG | 907,700 | 30,256.67 | 0 | 92,667,903 | 124,125,183 | 6,455,296 | 148,504,575 | 33,165,419.48 | 10,665,312.03 | 93,720 | 41,890 | 1,369 | 313.16
30000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 901,300 | 30,043.33 | 0 | 87,162,879 | 123,535,359 | 7,098,368 | 192,675,839 | 34,214,948.18 | 11,589,701.13 | 90,650 | 42,140 | 1,334 | 227.66
30000 | 4 | 4 | 3 | 5 | Shuffle | 907,720 | 30,006.00 | 0 | 93,126,655 | 135,921,663 | 7,286,784 | 237,764,607 | 37,632,076.72 | 13,370,589.38 | 92,140 | 45,050 | 1,458 | 248.57
20000 | 4 | 4 | 3 | 5 | LocalityASG | 600,640 | 20,021.33 | 0 | 79,495,167 | 118,292,479 | 4,685,824 | 198,967,295 | 31,632,119.39 | 9,579,664.67 | 84,470 | 45,580 | 1,038 | 272.13
20000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 605,200 | 20,173.33 | 0 | 78,118,911 | 122,552,319 | 7,319,552 | 205,520,895 | 33,266,995.51 | 10,770,593.95 | 84,670 | 45,980 | 1,108 | 277.52
20000 | 4 | 4 | 3 | 5 | Shuffle | 605,160 | 20,172.00 | 0 | 84,934,655 | 129,040,383 | 7,413,760 | 194,510,847 | 38,185,410.13 | 12,666,069.40 | 84,020 | 43,870 | 1,039 | 248.27
10000 | 4 | 4 | 3 | 5 | LocalityASG | 300,420 | 10,014.00 | 0 | 70,713,343 | 118,554,623 | 4,151,296 | 156,368,895 | 30,923,105.91 | 8,611,777.75 | 72,840 | 48,890 | 733 | 246.86
10000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 302,600 | 10,086.00 | 0 | 71,630,847 | 118,882,303 | 3,786,752 | 154,664,959 | 32,406,076.56 | 10,199,838.54 | 71,170 | 46,670 | 661 | 256.4
10000 | 4 | 4 | 3 | 5 | Shuffle | 300,340 | 10,010.00 | 0 | 69,861,375 | 126,615,551 | 5,586,944 | 155,058,175 | 36,010,152.88 | 12,085,641.80 | 75,850 | 49,210 | 715 | 230.3
The results above show that `localOrShuffle` and
`LocalityAwareShuffleGrouping` perform similarly. That should be a good thing,
since in some situations they differ only in the `chooseTasks()` function, and
`LocalityAwareShuffleGrouping` is more general than `localOrShuffle`.
I am still doing more performance testing. I am posting these results here in
the hope of getting feedback to guide my further testing.