-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196583
-----------------------------------------------------------
Ship it!

Ship It!

- Stephan Erb


On Jan. 31, 2018, 7:12 p.m., Bill Farner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
>
> (Updated Jan. 31, 2018, 7:12 p.m.)
>
>
> Review request for Aurora and Jordan Ly.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative
> style rather than functional. I arrived at this result after running
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
>
> This patch also enables stack and heap profilers in jmh (more details
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
> providing insight into the heap impact of changes. I started this change
> with a heap profiler as the primary motivation, and ended up using it to
> guide this improvement.
>
>
> Diffs
> -----
>
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999ca9a5185e240ad729fefc6638476a4aecc
>
>
> Diff: https://reviews.apache.org/r/65303/diff/2/
>
>
> Testing
> -------
>
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here
> is an abridged version. It shows that task fetch throughput universally
> improves by ~2x (mod error margins), and heap allocation reduces by at least
> the same factor. Overall GC time increases slightly as captured here, but
> the stddev was anecdotally high across runs. I chose to present this output
> as a caveat and a discussion point.
>
> If you scroll to the full output at the bottom, you will see some more
> granular allocation data.
> Please note that the `norm` stats are normalized
> for the number of operations, which I find to be the most useful measure for
> validating a change. Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is
> allocation/GC-bound (figure the allocation pressure "ceiling" for your
> configuration!), and normalized counters to see the more precise benchmark
> behavior.
> ```
>
> Prior to this patch:
> ```console
> Benchmark  (numTasks)  Score  Error  Units
> FetchAll.run  10000  481.529 ± 184.751  ops/s
> FetchAll.run:·gc.alloc.rate.norm  10000  334970.771 ± 33544.960  B/op
>
> FetchAll.run  50000  78.652 ± 20.869  ops/s
> FetchAll.run:·gc.alloc.rate.norm  50000  3991107.524 ± 701585.657  B/op
>
> FetchAll.run  100000  38.371 ± 11.710  ops/s
> FetchAll.run:·gc.alloc.rate.norm  100000  13487028.139 ± 3369614.510  B/op
>
> IndexedFetchAndFilter.run  10000  296.557 ± 198.389  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  10000  655319.005 ± 98138.360  B/op
>
> IndexedFetchAndFilter.run  50000  50.300 ± 5.818  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  50000  6671548.381 ± 452020.849  B/op
>
> IndexedFetchAndFilter.run  100000  17.637 ± 3.739  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  100000  28100173.458 ± 4486308.188  B/op
> ```
>
> With this patch:
> ```console
> Benchmark  (numTasks)  Score  Error  Units
> FetchAll.run  10000  1653.572 ± 799.123  ops/s
> FetchAll.run:·gc.alloc.rate.norm  10000  155426.052 ± 10345.657  B/op
>
> FetchAll.run  50000  210.454 ± 54.340  ops/s
> FetchAll.run:·gc.alloc.rate.norm  50000  1457560.505 ± 228631.547  B/op
>
> FetchAll.run  100000  97.783 ± 42.130  ops/s
> FetchAll.run:·gc.alloc.rate.norm  100000  5096464.582 ± 1792136.191  B/op
>
> IndexedFetchAndFilter.run  10000  500.740 ± 210.675  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  10000  370760.068 ± 36813.071  B/op
>
> IndexedFetchAndFilter.run  50000  95.316 ± 23.084  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  50000  3389472.432 ± 550602.162  B/op
>
> IndexedFetchAndFilter.run  100000  41.572 ± 26.747  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  100000  12324183.188 ± 7537788.165  B/op
> ```
>
>
> **Full benchmark output**
>
> Prior to this patch:
> ```console
> Benchmark  (numTasks)  Score  Error  Units
> FetchAll.run  10000  481.529 ± 184.751  ops/s
> FetchAll.run:·gc.alloc.rate  10000  148.678 ± 42.890  MB/sec
> FetchAll.run:·gc.alloc.rate.norm  10000  334970.771 ± 33544.960  B/op
> FetchAll.run:·gc.churn.PS_Eden_Space  10000  146.991 ± 135.486  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm  10000  332983.005 ± 347401.950  B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space  10000  0.804 ± 1.823  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm  10000  1784.147 ± 3904.546  B/op
> FetchAll.run:·gc.count  10000  9.000  counts
> FetchAll.run:·gc.time  10000  143.000  ms
>
> FetchAll.run  50000  78.652 ± 20.869  ops/s
> FetchAll.run:·gc.alloc.rate  50000  250.771 ± 34.190  MB/sec
> FetchAll.run:·gc.alloc.rate.norm  50000  3991107.524 ± 701585.657  B/op
> FetchAll.run:·gc.churn.PS_Eden_Space  50000  250.131 ± 144.214  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm  50000  3999003.844 ± 2907196.744  B/op
> FetchAll.run:·gc.churn.PS_Old_Gen  50000  6.937 ± 20.180  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm  50000  111462.141 ± 322286.235  B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space  50000  6.056 ± 4.371  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm  50000  96534.909 ± 73072.098  B/op
> FetchAll.run:·gc.count  50000  22.000  counts
> FetchAll.run:·gc.time  50000  3222.000  ms
>
> FetchAll.run  100000  38.371 ± 11.710  ops/s
> FetchAll.run:·gc.alloc.rate  100000  343.280 ± 63.923  MB/sec
> FetchAll.run:·gc.alloc.rate.norm  100000  13487028.139 ± 3369614.510  B/op
> FetchAll.run:·gc.churn.PS_Eden_Space  100000  343.804 ± 147.542  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm  100000  13524848.537 ± 7132093.384  B/op
> FetchAll.run:·gc.churn.PS_Old_Gen  100000  7.251 ± 26.847  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm  100000  286256.200 ± 1043939.286  B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space  100000  11.448 ± 16.645  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm  100000  440924.671 ± 539369.420  B/op
> FetchAll.run:·gc.count  100000  53.000  counts
> FetchAll.run:·gc.time  100000  8664.000  ms
>
> IndexedFetchAndFilter.run  10000  296.557 ± 198.389  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate  10000  178.657 ± 96.891  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  10000  655319.005 ± 98138.360  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space  10000  181.829 ± 115.598  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm  10000  669894.533 ± 362265.228  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space  10000  1.017 ± 2.764  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm  10000  3509.419 ± 8933.232  B/op
> IndexedFetchAndFilter.run:·gc.count  10000  11.000  counts
> IndexedFetchAndFilter.run:·gc.time  10000  174.000  ms
>
> IndexedFetchAndFilter.run  50000  50.300 ± 5.818  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate  50000  271.042 ± 35.522  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  50000  6671548.381 ± 452020.849  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space  50000  278.006 ± 188.990  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm  50000  6835542.988 ± 4208216.383  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen  50000  7.836 ± 22.513  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm  50000  194944.435 ± 557587.333  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space  50000  6.063 ± 2.432  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm  50000  148960.731 ± 42282.391  B/op
> IndexedFetchAndFilter.run:·gc.count  50000  24.000  counts
> IndexedFetchAndFilter.run:·gc.time  50000  3059.000  ms
>
> IndexedFetchAndFilter.run  100000  17.637 ± 3.739  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate  100000  336.740 ± 69.527  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  100000  28100173.458 ± 4486308.188  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space  100000  336.494 ± 88.830  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm  100000  28063164.240 ± 4888826.638  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen  100000  8.028 ± 37.263  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm  100000  672808.968 ± 2924497.150  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space  100000  11.351 ± 17.881  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm  100000  930977.737 ± 1252367.282  B/op
> IndexedFetchAndFilter.run:·gc.count  100000  47.000  counts
> IndexedFetchAndFilter.run:·gc.time  100000  7245.000  ms
> ```
>
> With this patch:
> ```console
> Benchmark  (numTasks)  Score  Error  Units
> FetchAll.run  10000  1653.572 ± 799.123  ops/s
> FetchAll.run:·gc.alloc.rate  10000  236.532 ± 98.709  MB/sec
> FetchAll.run:·gc.alloc.rate.norm  10000  155426.052 ± 10345.657  B/op
> FetchAll.run:·gc.churn.PS_Eden_Space  10000  247.755 ± 55.490  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm  10000  163873.606 ± 59092.580  B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space  10000  1.328 ± 1.540  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm  10000  883.684 ± 1120.393  B/op
> FetchAll.run:·gc.count  10000  18.000  counts
> FetchAll.run:·gc.time  10000  191.000  ms
>
> FetchAll.run  50000  210.454 ± 54.340  ops/s
> FetchAll.run:·gc.alloc.rate  50000  248.216 ± 15.196  MB/sec
> FetchAll.run:·gc.alloc.rate.norm  50000  1457560.505 ± 228631.547  B/op
> FetchAll.run:·gc.churn.PS_Eden_Space  50000  239.336 ± 174.541  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm  50000  1409078.860 ± 1141224.117  B/op
> FetchAll.run:·gc.churn.PS_Old_Gen  50000  6.504 ± 17.220  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm  50000  38644.950 ± 105262.889  B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space  50000  5.994 ± 4.160  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm  50000  35246.411 ± 25958.915  B/op
> FetchAll.run:·gc.count  50000  21.000  counts
> FetchAll.run:·gc.time  50000  2875.000  ms
>
> FetchAll.run  100000  97.783 ± 42.130  ops/s
> FetchAll.run:·gc.alloc.rate  100000  336.209 ± 80.094  MB/sec
> FetchAll.run:·gc.alloc.rate.norm  100000  5096464.582 ± 1792136.191  B/op
> FetchAll.run:·gc.churn.PS_Eden_Space  100000  342.190 ± 144.180  MB/sec
> FetchAll.run:·gc.churn.PS_Eden_Space.norm  100000  5167420.986 ± 1634774.992  B/op
> FetchAll.run:·gc.churn.PS_Old_Gen  100000  11.783 ± 36.073  MB/sec
> FetchAll.run:·gc.churn.PS_Old_Gen.norm  100000  182947.872 ± 525172.467  B/op
> FetchAll.run:·gc.churn.PS_Survivor_Space  100000  12.299 ± 13.795  MB/sec
> FetchAll.run:·gc.churn.PS_Survivor_Space.norm  100000  184635.309 ± 199254.266  B/op
> FetchAll.run:·gc.count  100000  46.000  counts
> FetchAll.run:·gc.time  100000  7778.000  ms
>
> IndexedFetchAndFilter.run  10000  500.740 ± 210.675  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate  10000  171.305 ± 57.968  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  10000  370760.068 ± 36813.071  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space  10000  176.084 ± 103.579  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm  10000  387100.753 ± 376481.454  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space  10000  1.305 ± 1.866  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm  10000  2812.059 ± 3518.689  B/op
> IndexedFetchAndFilter.run:·gc.count  10000  11.000  counts
> IndexedFetchAndFilter.run:·gc.time  10000  170.000  ms
>
> IndexedFetchAndFilter.run  50000  95.316 ± 23.084  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate  50000  258.291 ± 30.111  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  50000  3389472.432 ± 550602.162  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space  50000  250.887 ± 148.296  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm  50000  3308741.831 ± 2461004.974  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen  50000  5.218 ± 21.710  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm  50000  69254.269 ± 282577.478  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space  50000  5.803 ± 2.885  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm  50000  76523.177 ± 51120.227  B/op
> IndexedFetchAndFilter.run:·gc.count  50000  21.000  counts
> IndexedFetchAndFilter.run:·gc.time  50000  2775.000  ms
>
> IndexedFetchAndFilter.run  100000  41.572 ± 26.747  ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate  100000  331.638 ± 50.813  MB/sec
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm  100000  12324183.188 ± 7537788.165  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space  100000  333.474 ± 116.673  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Eden_Space.norm  100000  12357891.009 ± 7285356.875  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen  100000  10.296 ± 27.573  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Old_Gen.norm  100000  371782.085 ± 910072.098  B/op
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space  100000  11.815 ± 10.161  MB/sec
> IndexedFetchAndFilter.run:·gc.churn.PS_Survivor_Space.norm  100000  428555.780 ± 184610.507  B/op
> IndexedFetchAndFilter.run:·gc.count  100000  49.000  counts
> IndexedFetchAndFilter.run:·gc.time  100000  8602.000  ms
> ```
>
>
> Thanks,
>
> Bill Farner
>
>
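
For anyone reading the description without the diff open, here is a rough Java sketch of the kind of change it describes: collecting matching tasks with a plain loop into an `ArrayDeque` instead of streaming into a `HashSet`. This is not the code from `MemTaskStore`; the class `FetchSketch`, the methods `fetchFunctional` and `fetchImperative`, and the use of `String` task IDs are all made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; names and types are assumptions,
// not taken from the Aurora code base.
final class FetchSketch {
  private FetchSketch() {}

  // Functional style: stream the tasks, filter, and collect into a HashSet.
  static <T> Set<T> fetchFunctional(Collection<T> tasks, Predicate<? super T> filter) {
    return tasks.stream()
        .filter(filter)
        .collect(Collectors.toCollection(HashSet::new));
  }

  // Imperative style: a plain loop appending matches to an ArrayDeque.
  // No hashing and no per-entry node objects; iteration order is preserved.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<? super T> filter) {
    ArrayDeque<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> tasks = Arrays.asList("web-0", "db-0", "web-1");
    Predicate<String> isWeb = id -> id.startsWith("web-");
    System.out.println(fetchFunctional(tasks, isWeb).size()); // 2
    System.out.println(fetchImperative(tasks, isWeb));        // [web-0, web-1]
  }
}
```

One caveat with a swap like this: unlike `HashSet`, `ArrayDeque` does not deduplicate and rejects `null` elements, so it presumably only fits callers that iterate the result and do not rely on set semantics.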