-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
-----------------------------------------------------------
src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
Line 234 (original), 235 (patched)
<https://reviews.apache.org/r/65303/#comment275620>

Have you considered passing the predicate filter in here? For index scans, this should help eliminate a large number of allocations.

- Stephan Erb


On Jan. 24, 2018, 1:32 a.m., Bill Farner wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> -----------------------------------------------------------
>
> (Updated Jan. 24, 2018, 1:32 a.m.)
>
>
> Review request for Aurora and Jordan Ly.
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative
> style rather than functional. I arrived at this result after running
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
>
> This patch also enables stack and heap profilers in jmh (more details
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
> providing insight into the heap impact of changes. I started this change
> with a heap profiler as the primary motivation, and ended up using it to
> guide this improvement.
>
>
> Diffs
> -----
>
> build.gradle 64af7ae
> src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java b59999c
>
>
> Diff: https://reviews.apache.org/r/65303/diff/1/
>
>
> Testing
> -------
>
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at
> the bottom, but here is an abridged version. It shows that task fetch
> throughput improves by at least 2x at every task count, and heap allocation
> per operation falls by roughly 2x or more. Overall GC time increases slightly
> as captured here, but the stddev was anecdotally high across runs. I chose to
> present this output as a caveat and a discussion point.
>
> If you scroll to the full output at the bottom, you will see some more
> granular allocation data. Please note that the `norm` stats are normalized
> for the number of operations, which I find to be the most useful measure for
> validating a change. Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is
> allocation/GC-bound (figure the allocation pressure "ceiling" for your
> configuration!), and normalized counters to see the more precise benchmark
> behavior.
> ```
>
> Prior to this patch:
> ```console
> Benchmark           (numTasks)         Score         Error  Units
>
>                          10000      1066.632 ±     266.924  ops/s
> ·gc.alloc.rate.norm      10000    289227.205 ±    8888.051  B/op
> ·gc.count                10000        24.000                counts
> ·gc.time                 10000       103.000                ms
>
>                          50000        84.444 ±      32.620  ops/s
> ·gc.alloc.rate.norm      50000   3831210.967 ±  840844.713  B/op
> ·gc.count                50000        21.000                counts
> ·gc.time                 50000      1407.000                ms
>
>                         100000        38.645 ±      20.557  ops/s
> ·gc.alloc.rate.norm     100000  13555430.931 ± 6787344.701  B/op
> ·gc.count               100000        52.000                counts
> ·gc.time                100000      3304.000                ms
> ```
>
> With this patch:
> ```console
> Benchmark           (numTasks)         Score         Error  Units
>
>                          10000      2851.288 ±     481.472  ops/s
> ·gc.alloc.rate.norm      10000    145281.908 ±    2223.621  B/op
> ·gc.count                10000        39.000                counts
> ·gc.time                 10000       130.000                ms
>
>                          50000       297.380 ±      35.681  ops/s
> ·gc.alloc.rate.norm      50000   1183791.866 ±   77487.278  B/op
> ·gc.count                50000        25.000                counts
> ·gc.time                 50000      1821.000                ms
>
>                         100000       122.211 ±      81.618  ops/s
> ·gc.alloc.rate.norm     100000   4364450.973 ± 2856586.882  B/op
> ·gc.count               100000        52.000                counts
> ·gc.time                100000      3698.000                ms
> ```
>
>
> **Full benchmark output**
>
> Prior to this patch:
> ```console
> Benchmark                                                                        (numTasks)   Mode  Cnt         Score         Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5      1066.632 ±     266.924  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5       286.647 ±      62.371  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5    289227.205 ±    8888.051  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5       291.263 ±     159.266  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5    294277.617 ±  166069.041  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5         1.218 ±       1.029  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5      1220.540 ±     708.455  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5        24.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5       103.000                ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt                NaN                ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5        84.444 ±      32.620  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5       267.018 ±      27.389  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5   3831210.967 ±  840844.713  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5       258.565 ±     149.845  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5   3707563.530 ± 2262218.319  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5         4.487 ±      18.053  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5     63848.757 ±  264487.651  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5         6.034 ±       3.651  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5     87385.381 ±   75159.508  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5        21.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5      1407.000                ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt                NaN                ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5        38.645 ±      20.557  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5       381.453 ±      63.491  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5  13555430.931 ± 6787344.701  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5       389.816 ±     123.320  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5  13823571.735 ± 6642604.600  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5         1.947 ±      16.766  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5     92330.241 ±  794991.221  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5        11.934 ±      18.565  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5    414896.926 ±  551658.959  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5        52.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5      3304.000                ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt                NaN                ---
> ```
>
> With this patch:
> ```console
> Benchmark                                                                        (numTasks)   Mode  Cnt         Score         Error  Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        10000  thrpt    5      2851.288 ±     481.472  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         10000  thrpt    5       384.383 ±      58.697  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    10000  thrpt    5    145281.908 ±    2223.621  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                10000  thrpt    5       388.851 ±     114.120  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           10000  thrpt    5    147171.915 ±   50430.527  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            10000  thrpt    5         1.264 ±       0.980  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       10000  thrpt    5       479.848 ±     420.881  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              10000  thrpt    5        39.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               10000  thrpt    5       130.000                ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 10000  thrpt                NaN                ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                        50000  thrpt    5       297.380 ±      35.681  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                         50000  thrpt    5       288.839 ±      19.035  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                    50000  thrpt    5   1183791.866 ±   77487.278  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                50000  thrpt    5       296.587 ±     125.148  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm           50000  thrpt    5   1214497.578 ±  457975.153  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                   50000  thrpt    5         6.942 ±      23.492  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm              50000  thrpt    5     28880.733 ±   99593.659  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space            50000  thrpt    5         6.440 ±       3.887  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm       50000  thrpt    5     26354.762 ±   14876.857  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                              50000  thrpt    5        25.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                               50000  thrpt    5      1821.000                ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                 50000  thrpt                NaN                ---
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                       100000  thrpt    5       122.211 ±      81.618  ops/s
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                        100000  thrpt    5       377.099 ±      77.146  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                   100000  thrpt    5   4364450.973 ± 2856586.882  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space               100000  thrpt    5       381.570 ±     119.260  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm          100000  thrpt    5   4415115.428 ± 3000198.792  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen                  100000  thrpt    5         1.914 ±      16.479  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Old_Gen.norm             100000  thrpt    5     31833.830 ±  274098.881  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space           100000  thrpt    5        12.117 ±      20.931  MB/sec
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space.norm      100000  thrpt    5    136001.918 ±  196459.666  B/op
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.count                             100000  thrpt    5        52.000                counts
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.time                              100000  thrpt    5      3698.000                ms
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·stack                                100000  thrpt                NaN                ---
> ```
>
>
> Thanks,
>
> Bill Farner
>
>
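
To make the suggestion at the top of this review concrete, here is a minimal sketch of what "passing in the predicate filter" to an index scan could look like. It is not the code under review: `FilteredIndexScan`, `Store`, `Task`, and the role index are hypothetical stand-ins, and Aurora's actual `MemTaskStore` types and indices differ. The only idea shown is that testing the predicate inside the scan loop keeps rejected tasks out of the `ArrayDeque`, so the result collection may stay smaller and grow less often than when everything is materialized first and filtered afterwards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class FilteredIndexScan {

  /** Hypothetical stand-in for a stored task; not Aurora's real task type. */
  static final class Task {
    final String id;
    final String role;
    final String status;

    Task(String id, String role, String status) {
      this.id = id;
      this.role = role;
      this.status = status;
    }
  }

  /** Hypothetical store: a primary id -> task map plus a role -> task IDs index. */
  static final class Store {
    private final Map<String, Task> byId = new HashMap<>();
    private final Map<String, List<String>> idsByRole = new HashMap<>();

    void add(Task task) {
      byId.put(task.id, task);
      idsByRole.computeIfAbsent(task.role, r -> new ArrayList<>()).add(task.id);
    }

    /** Materializes every task the index yields; callers must filter afterwards. */
    Collection<Task> scan(String role) {
      ArrayDeque<Task> result = new ArrayDeque<>();
      for (String id : idsByRole.getOrDefault(role, Collections.emptyList())) {
        result.add(byId.get(id));
      }
      return result;
    }

    /** Applies the filter during the scan, so rejected tasks never enter the result. */
    Collection<Task> scan(String role, Predicate<Task> filter) {
      ArrayDeque<Task> result = new ArrayDeque<>();
      for (String id : idsByRole.getOrDefault(role, Collections.emptyList())) {
        Task task = byId.get(id);
        if (filter.test(task)) {
          result.add(task);
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    Store store = new Store();
    store.add(new Task("t1", "www", "RUNNING"));
    store.add(new Task("t2", "www", "FINISHED"));
    store.add(new Task("t3", "batch", "RUNNING"));

    Predicate<Task> running = t -> "RUNNING".equals(t.status);
    System.out.println(store.scan("www").size());          // prints 2
    System.out.println(store.scan("www", running).size()); // prints 1
  }
}
```

Whether this helps in practice would need to be confirmed with the same jmh heap profiler run used for the numbers quoted above.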