Github user Ethanlm commented on the issue: https://github.com/apache/storm/pull/2270 I did some experiments on ThroughputVsLatency (modified to add some Configs) and the initial results seem similar among `shuffle`, `localOrShuffle` and `localityAwareShuffle (LocalityASG)`. Config: `TOPOLOGY.MESSAGE.TIMEOUT: 300` `TOPOLOGY_MAX_SPOUT_PENDING: 5000` `LoadAware` is enabled by default Env: Two openstack VM's, 8GB RAM, 4 VCPUs; 1Gbps Ethernet Note: 1 All numbers in the tables are medians; 2 The numbers fluctuated slightly every time I ran the experiments. The results shown below are sampled from repeated experiments. #### Experiment1 Normal network connection. This is trying to compare the performance in normal situation. Rate | numWorkers | numSpout | numSplit(Bolt) | totalTime(min) | ThroughputVsLatency | acked | acked/sec | failed | 99% | 99.90% | min | max | mean | stddev | user | sys | gc | mem -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 100000 | 4 | 16 | 16 | 5 | LocalityASG | 2,204,700 | 73,490.00 | 0 | 69,726,109,695 | 72,410,464,255 | 51,271,172,096 | 74,423,730,175 | 59,100,119,611.86 | 5,520,976,189.22 | 176,570 | 33,110 | 19,981 | 1,729.69 100000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 2,082,000 | 67,333.33 | 0 | 66,102,231,039 | 66,538,438,655 | 50,096,766,976 | 66,806,874,111 | 57,631,698,540.71 | 3,643,782,002.52 | 172,500 | 30,060 | 20,803 | 1,175.51 100000 | 4 | 16 | 16 | 5 | Shuffle | 2,067,640 | 68,812.00 | 0 | 72,611,790,847 | 73,752,641,535 | 50,298,093,568 | 74,557,947,903 | 62,488,041,809.45 | 5,480,336,956.59 | 184,020 | 29,220 | 23,905 | 1,363.61 30000 | 4 | 16 | 16 | 5 | LocalityASG | 906,300 | 30,210.00 | 0 | 46,727,167 | 119,865,343 | 6,529,024 | 164,364,287 | 15,393,162.30 | 8,306,088.53 | 102,760 | 46,790 | 1,825 | 498.63 30000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 906,580 | 30,219.33 | 0 | 51,838,975 | 135,921,663 | 6,561,792 | 174,063,615 | 16,187,180.85 | 9,488,799.34 | 103,680 | 46,240 | 1,832 | 742.3 30000 | 4 | 16 | 16 | 5 | Shuffle | 906,360 | 30,206.00 | 0 | 45,318,143 | 80,936,959 | 6,680,576 | 132,055,039 | 17,106,693.24 | 7,055,551.93 | 105,420 | 47,040 | 1,759 | 424.65 20000 | 4 | 16 | 16 | 5 | LocalityASG | 605,040 | 20,168.00 | 0 | 31,965,183 | 49,840,127 | 5,500,928 | 96,534,527 | 13,949,066.47 | 4,487,713.39 | 94,370 | 48,930 | 1,253 | 433.45 20000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 604,640 | 20,154.67 | 0 | 32,161,791 | 87,621,631 | 5,525,504 | 124,518,399 | 14,090,342.83 | 5,387,101.60 | 92,780 | 47,170 | 1,239 | 356.32 20000 | 4 | 16 | 16 | 5 | Shuffle | 604,840 | 20,151.33 | 0 | 33,406,975 | 70,909,951 | 4,399,104 | 98,697,215 | 14,103,326.48 | 5,074,175.87 | 90,870 | 47,230 | 1,193 | 396.6 10000 | 4 | 16 | 16 | 5 | LocalityASG | 302,260 | 10,072.00 | 0 | 27,295,743 | 57,901,055 | 4,374,528 | 88,473,599 | 12,366,716.52 | 3,887,709.71 | 75,670 | 50,950 | 796 | 347.49 10000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 302,340 | 10,077.33 | 0 | 27,721,727 | 44,728,319 | 3,999,744 | 79,298,559 | 12,859,667.20 | 3,668,796.69 | 77,890 | 52,570 | 752 | 349.92 10000 | 4 | 16 | 16 | 5 | Shuffle | 302,260 | 10,074.00 | 0 | 29,655,039 | 60,489,727 | 4,378,624 | 83,165,183 | 13,519,393.44 | 4,205,346.88 | 77,580 | 52,300 | 769 | 367.62 #### Experiment 2 Add 10ms latency on both VMs: `tc qdisc add dev eth0 root netem delay 10ms`, which means 20ms latency in total. This is trying to simulate slow network connection Rate | numWorkers | numSpout | numSplit(Bolt) | totalTime(min) | ThroughputVsLatency | acked | acked/sec | failed | 99% | 99.90% | min | max | mean | stddev | user | sys | gc | mem -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 100000 | 4 | 16 | 16 | 5 | LocalityASG | 2,257,220 | 75,240.67 | 0 | 51,942,260,735 | 53,317,992,447 | 37,446,746,112 | 53,821,308,927 | 45,735,430,020.04 | 5,110,395,891.55 | 181,100 | 28,930 | 22,157 | 812.64 100000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 2,428,780 | 80,959.33 | 0 | 61,773,709,311 | 62,746,787,839 | 3,118,465,024 | 63,082,332,159 | 40,427,975,429.87 | 15,339,145,604.59 | 176,330 | 33,200 | 16,475 | 743.08 100000 | 4 | 16 | 16 | 5 | Shuffle | 2,138,040 | 71,268.00 | 0 | 59,726,888,959 | 60,733,521,919 | 40,500,199,424 | 61,438,164,991 | 50,071,346,036.42 | 5,670,751,071.76 | 170,910 | 34,900 | 14,231 | 638.34 30000 | 4 | 16 | 16 | 5 | LocalityASG | 908,140 | 30,271.33 | 0 | 330,301,439 | 383,254,527 | 7,000,064 | 416,808,959 | 48,647,996.71 | 55,147,277.57 | 105,160 | 44,800 | 1,975 | 579.67 30000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 902,680 | 30,089.33 | 0 | 341,049,343 | 383,254,527 | 6,946,816 | 414,973,951 | 51,019,429.45 | 58,367,846.16 | 104,980 | 47,240 | 1,895 | 642.05 30000 | 4 | 16 | 16 | 5 | Shuffle | 908,860 | 30,295.33 | 0 | 375,914,495 | 492,044,287 | 7,737,344 | 655,884,287 | 60,776,243.04 | 68,004,908.46 | 107,910 | 45,240 | 1,938 | 422.43 20000 | 4 | 16 | 16 | 5 | LocalityASG | 605,920 | 20,188.00 | 0 | 323,485,695 | 386,138,111 | 5,238,784 | 427,294,719 | 45,021,622.26 | 52,931,457.82 | 91,630 | 47,140 | 1,307 | 442.26 20000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 605,180 | 20,172.67 | 0 | 339,476,479 | 391,905,279 | 4,431,872 | 425,721,855 | 46,271,845.50 | 55,536,624.12 | 93,060 | 47,820 | 1,333 | 467.91 20000 | 4 | 16 | 16 | 5 | Shuffle | 604,280 | 20,047.33 | 0 | 356,777,983 | 432,799,743 | 4,493,312 | 576,192,511 | 52,029,378.68 | 57,552,110.96 | 93,570 | 46,560 | 1,285 | 446.67 10000 | 4 | 16 | 16 | 5 | LocalityASG | 301,200 | 10,029.33 | 0 | 330,563,583 | 378,798,079 | 4,399,104 | 413,138,943 | 42,995,014.97 | 50,182,495.91 | 77,450 | 47,930 | 776 | 301.93 10000 | 4 | 16 | 16 | 5 | LocalOrShuffle | 303,240 | 10,108.00 | 0 | 337,379,327 | 383,516,671 | 4,100,096 | 422,838,271 | 43,840,226.34 | 52,874,295.96 | 80,080 | 47,990 | 764 | 274.21 10000 | 4 | 16 | 16 | 5 | Shuffle | 300,440 | 10,012.00 | 0 | 333,971,455 | 396,623,871 | 4,411,392 | 629,669,887 | 49,247,857.01 | 53,009,320.86 | 77,350 | 46,420 | 726 | 286.33 #### Experiment 3 Add 10ms latency on both VMs. This is trying to show the difference between `localOrShuffle` and `localityAwareShuffle`. Since we 4 `workers`, 4 `spout` and 3 `split (bolt)` here, one of the spouts needs to choose a bolt task on other worker/host. Current `localityAwareShuffle` implementation will choose the worker in the same host. `localOrShuffle` will shuffle among these three `split (bolt)` Rate | numWorkers | numSpout | numSplit(Bolt) | totalTime(min) | ThroughputVsLatency | acked | acked/sec | failed | 99% | 99.90% | min | max | mean | stddev | user | sys | gc | mem -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- 100000 | 4 | 4 | 3 | 5 | LocalityASG | 2,178,680 | 72,622.67 | 0 | 65,699,577,855 | 66,538,438,655 | 8,413,184 | 66,639,101,951 | 28,354,892,840.64 | 24,151,617,972.76 | 140,830 | 36,650 | 5,041 | 288.98 100000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 2,260,940 | 75,364.67 | 0 | 58,082,721,791 | 58,485,374,975 | 36,842,766,336 | 58,619,592,703 | 45,712,216,573.06 | 5,811,071,103.74 | 145,400 | 35,490 | 6,218 | 304 100000 | 4 | 4 | 3 | 5 | Shuffle | 1,998,040 | 66,601.33 | 0 | 61,706,600,447 | 62,310,580,223 | 31,138,512,896 | 62,646,124,543 | 43,860,244,218.07 | 9,311,454,441.44 | 137,820 | 37,530 | 4,677 | 321.52 30000 | 4 | 4 | 3 | 5 | LocalityASG | 907,700 | 30,256.67 | 0 | 92,667,903 | 124,125,183 | 6,455,296 | 148,504,575 | 33,165,419.48 | 10,665,312.03 | 93,720 | 41,890 | 1,369 | 313.16 30000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 901,300 | 30,043.33 | 0 | 87,162,879 | 123,535,359 | 7,098,368 | 192,675,839 | 34,214,948.18 | 11,589,701.13 | 90,650 | 42,140 | 1,334 | 227.66 30000 | 4 | 4 | 3 | 5 | Shuffle | 907,720 | 30,006.00 | 0 | 93,126,655 | 135,921,663 | 7,286,784 | 237,764,607 | 37,632,076.72 | 13,370,589.38 | 92,140 | 45,050 | 1,458 | 248.57 20000 | 4 | 4 | 3 | 5 | LocalityASG | 600,640 | 20,021.33 | 0 | 79,495,167 | 118,292,479 | 4,685,824 | 198,967,295 | 31,632,119.39 | 9,579,664.67 | 84,470 | 45,580 | 1,038 | 272.13 20000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 605,200 | 20,173.33 | 0 | 78,118,911 | 122,552,319 | 7,319,552 | 205,520,895 | 33,266,995.51 | 10,770,593.95 | 84,670 | 45,980 | 1,108 | 277.52 20000 | 4 | 4 | 3 | 5 | Shuffle | 605,160 | 20,172.00 | 0 | 84,934,655 | 129,040,383 | 7,413,760 | 194,510,847 | 38,185,410.13 | 12,666,069.40 | 84,020 | 43,870 | 1,039 | 248.27 10000 | 4 | 4 | 3 | 5 | LocalityASG | 300,420 | 10,014.00 | 0 | 70,713,343 | 118,554,623 | 4,151,296 | 156,368,895 | 30,923,105.91 | 8,611,777.75 | 72,840 | 48,890 | 733 | 246.86 10000 | 4 | 4 | 3 | 5 | LocalOrShuffle | 302,600 | 10,086.00 | 0 | 71,630,847 | 118,882,303 | 3,786,752 | 154,664,959 | 32,406,076.56 | 10,199,838.54 | 71,170 | 46,670 | 661 | 256.4 10000 | 4 | 4 | 3 | 5 | Shuffle | 300,340 | 10,010.00 | 0 | 69,861,375 | 126,615,551 | 5,586,944 | 155,058,175 | 36,010,152.88 | 12,085,641.80 | 75,850 | 49,210 | 715 | 230.3 The results above show performance similarity between `localOrShuffle` and `LocalityAwareShuffleGrouping` should be a good thing since in some certain situations they are only different on `chooseTasks()` function. And `LocalityAwareShuffleGrouping` is more general than `localOrShuffle`. I am still doing more performance testing. Posting these results here to hopefully get some feedback to help my further testing.
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---