Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-31 Thread Jordan Ly

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196592
---


Ship it!




Ship It!

- Jordan Ly


On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 31, 2018, 6:12 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 
> 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here 
> is an abridged version.  It shows that task fetch throughput universally 
> improves by ~2x (mod error margins), and heap allocation reduces by at least 
> the same factor.  Overall GC time increases slightly as captured here, but 
> the stddev was anecdotally high across runs.  I chose to present this output 
> as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which I find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1       481.529 ±      184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    334970.771 ±    33544.960    B/op
> 
> FetchAll.run                                            5        78.652 ±       20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   3991107.524 ±   701585.657    B/op
> 
> FetchAll.run                                           10        38.371 ±       11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10  13487028.139 ±  3369614.510    B/op
> 
> IndexedFetchAndFilter.run                               1       296.557 ±      198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    655319.005 ±    98138.360    B/op
> 
> IndexedFetchAndFilter.run                               5        50.300 ±        5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           5   6671548.381 ±   452020.849    B/op
> 
> IndexedFetchAndFilter.run                              10        17.637 ±        3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm          10  28100173.458 ±  4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1      1653.572 ±      799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    155426.052 ±    10345.657    B/op
> 
> FetchAll.run                                            5       210.454 ±       54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   1457560.505 ±   228631.547    B/op
> 
> FetchAll.run                                           10        97.783 ±       42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10   5096464.582 ±  1792136.191    B/op
> 
> IndexedFetchAndFilter.run                               1       500.740 ±      210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    370760.068 ±    36813.071    B/op
> 
> IndexedFetchAndFilter.run 

Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-31 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196583
---


Ship it!




Ship It!

- Stephan Erb


On Jan. 31, 2018, 7:12 nachm., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 31, 2018, 7:12 nachm.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 
> 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here 
> is an abridged version.  It shows that task fetch throughput universally 
> improves by ~2x (mod error margins), and heap allocation reduces by at least 
> the same factor.  Overall GC time increases slightly as captured here, but 
> the stddev was anecdotally high across runs.  I chose to present this output 
> as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which I find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1       481.529 ±      184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    334970.771 ±    33544.960    B/op
> 
> FetchAll.run                                            5        78.652 ±       20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   3991107.524 ±   701585.657    B/op
> 
> FetchAll.run                                           10        38.371 ±       11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10  13487028.139 ±  3369614.510    B/op
> 
> IndexedFetchAndFilter.run                               1       296.557 ±      198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    655319.005 ±    98138.360    B/op
> 
> IndexedFetchAndFilter.run                               5        50.300 ±        5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           5   6671548.381 ±   452020.849    B/op
> 
> IndexedFetchAndFilter.run                              10        17.637 ±        3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm          10  28100173.458 ±  4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1      1653.572 ±      799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    155426.052 ±    10345.657    B/op
> 
> FetchAll.run                                            5       210.454 ±       54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   1457560.505 ±   228631.547    B/op
> 
> FetchAll.run                                           10        97.783 ±       42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10   5096464.582 ±  1792136.191    B/op
> 
> IndexedFetchAndFilter.run                               1       500.740 ±      210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    370760.068 ±    36813.071    B/op
> 
> IndexedFetchAndFilter.run   

Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-31 Thread David McLaughlin

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196581
---


Ship it!




Ship It!

- David McLaughlin


On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 31, 2018, 6:12 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 
> 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here 
> is an abridged version.  It shows that task fetch throughput universally 
> improves by ~2x (mod error margins), and heap allocation reduces by at least 
> the same factor.  Overall GC time increases slightly as captured here, but 
> the stddev was anecdotally high across runs.  I chose to present this output 
> as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which I find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1       481.529 ±      184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    334970.771 ±    33544.960    B/op
> 
> FetchAll.run                                            5        78.652 ±       20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   3991107.524 ±   701585.657    B/op
> 
> FetchAll.run                                           10        38.371 ±       11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10  13487028.139 ±  3369614.510    B/op
> 
> IndexedFetchAndFilter.run                               1       296.557 ±      198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    655319.005 ±    98138.360    B/op
> 
> IndexedFetchAndFilter.run                               5        50.300 ±        5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           5   6671548.381 ±   452020.849    B/op
> 
> IndexedFetchAndFilter.run                              10        17.637 ±        3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm          10  28100173.458 ±  4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1      1653.572 ±      799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    155426.052 ±    10345.657    B/op
> 
> FetchAll.run                                            5       210.454 ±       54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   1457560.505 ±   228631.547    B/op
> 
> FetchAll.run                                           10        97.783 ±       42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10   5096464.582 ±  1792136.191    B/op
> 
> IndexedFetchAndFilter.run                               1       500.740 ±      210.675   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    370760.068 ±    36813.071    B/op
> 
> IndexedFetchAndFilter.run  

Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-31 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196579
---



Master (787ccfe) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Jan. 31, 2018, 6:12 p.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 31, 2018, 6:12 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
>   src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 
> 9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5ca9a5185e240ad729fefc6638476a4aecc 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/2/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here 
> is an abridged version.  It shows that task fetch throughput universally 
> improves by ~2x (mod error margins), and heap allocation reduces by at least 
> the same factor.  Overall GC time increases slightly as captured here, but 
> the stddev was anecdotally high across runs.  I chose to present this output 
> as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which I find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1       481.529 ±      184.751   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    334970.771 ±    33544.960    B/op
> 
> FetchAll.run                                            5        78.652 ±       20.869   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   3991107.524 ±   701585.657    B/op
> 
> FetchAll.run                                           10        38.371 ±       11.710   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10  13487028.139 ±  3369614.510    B/op
> 
> IndexedFetchAndFilter.run                               1       296.557 ±      198.389   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    655319.005 ±    98138.360    B/op
> 
> IndexedFetchAndFilter.run                               5        50.300 ±        5.818   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm           5   6671548.381 ±   452020.849    B/op
> 
> IndexedFetchAndFilter.run                              10        17.637 ±        3.739   ops/s
> IndexedFetchAndFilter.run:·gc.alloc.rate.norm          10  28100173.458 ±  4486308.188    B/op
> ```
> 
> With this patch:
> ```console
> Benchmark                                      (numTasks)         Score          Error   Units
> FetchAll.run                                            1      1653.572 ±      799.123   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        1    155426.052 ±    10345.657    B/op
> 
> FetchAll.run                                            5       210.454 ±       54.340   ops/s
> FetchAll.run:·gc.alloc.rate.norm                        5   1457560.505 ±   228631.547    B/op
> 
> FetchAll.run                                           10        97.783 ±       42.130   ops/s
> FetchAll.run:·gc.alloc.rate.norm                       10   5096464.582 ±  1792136.191    B/op
> 
> IndexedFetchAndFilter.run 

Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-31 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
---

(Updated Jan. 31, 2018, 10:12 a.m.)


Review request for Aurora and Jordan Ly.


Changes
---

Applied Stephan's suggestion, added a benchmark to validate.
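
For readers unfamiliar with the benchmark shape, a hypothetical sketch of such an indexed fetch-and-filter benchmark is below. All names and parameter values here are placeholders for illustration; the real `TaskStoreBenchmarks.java` in the diff is the authoritative version.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
@State(Scope.Benchmark)
public class IndexedFetchAndFilterSketch {

  // Placeholder sizes; the real benchmark's numTasks values may differ.
  @Param({"1000", "5000", "10000"})
  public int numTasks;

  // Stand-in for the task store's job-name index.
  private Map<String, List<String>> tasksByJob;

  @Setup
  public void setUp() {
    tasksByJob = new HashMap<>();
    for (int i = 0; i < numTasks; i++) {
      tasksByJob.computeIfAbsent("job-" + (i % 10), k -> new ArrayList<>()).add("task-" + i);
    }
  }

  @Benchmark
  public Collection<String> run() {
    // Indexed fetch of one job's tasks, followed by a further predicate filter.
    Collection<String> result = new ArrayDeque<>();
    for (String taskId : tasksByJob.get("job-0")) {
      if (taskId.endsWith("0")) {
        result.add(taskId);
      }
    }
    return result;
  }
}
```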


Repository: aurora


Description
---

Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style 
rather than functional.  I arrived at this result after running benchmarks with 
some of the other usual suspects (`ArrayList`, `LinkedList`).
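
As a rough illustration of the shape of that change (not the actual MemTaskStore code; `Task` type and the predicate are placeholders), the swap looks like this:

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.HashSet;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class FetchSketch {

  // "Before" shape: functional style, collecting matches into a HashSet.
  // Every entry pays for hashing plus a hash-table node allocation.
  static <T> Collection<T> fetchFunctional(Collection<T> tasks, Predicate<T> filter) {
    return tasks.stream().filter(filter).collect(Collectors.toCollection(HashSet::new));
  }

  // "After" shape: imperative loop collecting into an ArrayDeque, which is
  // backed by a plain array and avoids per-entry node allocations.
  static <T> Collection<T> fetchImperative(Collection<T> tasks, Predicate<T> filter) {
    Collection<T> result = new ArrayDeque<>();
    for (T task : tasks) {
      if (filter.test(task)) {
        result.add(task);
      }
    }
    return result;
  }

  private FetchSketch() { }
}
```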

This patch also enables stack and heap profilers in jmh (more details 
[here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
 providing insight into the heap impact of changes.  I started this change with 
a heap profiler as the primary motivation, and ended up using it to guide this 
improvement.


Diffs (updated)
-

  build.gradle 64af7aefbe784d95df28f59606a0d17afb57c3a1 
  src/jmh/java/org/apache/aurora/benchmark/TaskStoreBenchmarks.java 
9ec9865ae9a60fa2ab81832a2cf886b7b6b887cd 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
b5ca9a5185e240ad729fefc6638476a4aecc 


Diff: https://reviews.apache.org/r/65303/diff/2/

Changes: https://reviews.apache.org/r/65303/diff/1-2/


Testing (updated)
---

Full benchmark summary for `TaskStoreBenchmarks` is at the bottom, but here is 
an abridged version.  It shows that task fetch throughput universally improves 
by ~2x (mod error margins), and heap allocation reduces by at least the same 
factor.  Overall GC time increases slightly as captured here, but the stddev 
was anecdotally high across runs.  I chose to present this output as a caveat 
and a discussion point.

If you scroll to the full output at the bottom, you will see some more granular 
allocation data.  Please note that the `norm` stats are normalized for the 
number of operations, which I find to be the most useful measure for validating 
a change.  Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is 
allocation/GC-bound (figure the allocation pressure "ceiling" for your 
configuration!), and normalized counters to see the more precise benchmark 
behavior.
```

Prior to this patch:
```console
Benchmark                                      (numTasks)         Score          Error   Units
FetchAll.run                                            1       481.529 ±      184.751   ops/s
FetchAll.run:·gc.alloc.rate.norm                        1    334970.771 ±    33544.960    B/op

FetchAll.run                                            5        78.652 ±       20.869   ops/s
FetchAll.run:·gc.alloc.rate.norm                        5   3991107.524 ±   701585.657    B/op

FetchAll.run                                           10        38.371 ±       11.710   ops/s
FetchAll.run:·gc.alloc.rate.norm                       10  13487028.139 ±  3369614.510    B/op

IndexedFetchAndFilter.run                               1       296.557 ±      198.389   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    655319.005 ±    98138.360    B/op

IndexedFetchAndFilter.run                               5        50.300 ±        5.818   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm           5   6671548.381 ±   452020.849    B/op

IndexedFetchAndFilter.run                              10        17.637 ±        3.739   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm          10  28100173.458 ±  4486308.188    B/op
```

With this patch:
```console
Benchmark                                      (numTasks)         Score          Error   Units
FetchAll.run                                            1      1653.572 ±      799.123   ops/s
FetchAll.run:·gc.alloc.rate.norm                        1    155426.052 ±    10345.657    B/op

FetchAll.run                                            5       210.454 ±       54.340   ops/s
FetchAll.run:·gc.alloc.rate.norm                        5   1457560.505 ±   228631.547    B/op

FetchAll.run                                           10        97.783 ±       42.130   ops/s
FetchAll.run:·gc.alloc.rate.norm                       10   5096464.582 ±  1792136.191    B/op

IndexedFetchAndFilter.run                               1       500.740 ±      210.675   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm           1    370760.068 ±    36813.071    B/op

IndexedFetchAndFilter.run                               5        95.316 ±       23.084   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm           5   3389472.432 ±   550602.162    B/op

IndexedFetchAndFilter.run                              10        41.572 ±       26.747   ops/s
IndexedFetchAndFilter.run:·gc.alloc.rate.norm          10  12324183.188 ±  7537788.165    B/op
```


**Full benchmark output**

Prior to this patch:
```console
Ben

Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-24 Thread Bill Farner


> On Jan. 24, 2018, 3:40 a.m., Stephan Erb wrote:
> > src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
> > Line 234 (original), 235 (patched)
> > 
> >
> > Have you considered passing in the predicate filter in here? For index 
> > scans this should help to eliminate a large amount of allocations.

A fine idea!  I will be out of contact for a few days, but will try this out 
when i get back.


- Bill


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
---


On Jan. 23, 2018, 4:32 p.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 23, 2018, 4:32 p.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7ae 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5c 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/1/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
> the bottom, but here is an abridged version.  It shows that task fetch 
> throughput universally improves by at least 2x, and heap allocation reduces 
> by at least the same factor.  Overall GC time increases slightly as captured 
> here, but the stddev was anecdotally high across runs.  I chose to present 
> this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which I find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark              (numTasks)          Score          Error    Units
> 
>                                 1       1066.632 ±      266.924    ops/s
> ·gc.alloc.rate.norm             1     289227.205 ±         .051     B/op
> ·gc.count                       1         24.000                  counts
> ·gc.time                        1        103.000                      ms
> 
>                                 5         84.444 ±       32.620    ops/s
> ·gc.alloc.rate.norm             5    3831210.967 ±   840844.713     B/op
> ·gc.count                       5         21.000                  counts
> ·gc.time                        5       1407.000                      ms
> 
>                                10         38.645 ±       20.557    ops/s
> ·gc.alloc.rate.norm            10   13555430.931 ±  6787344.701     B/op
> ·gc.count                      10         52.000                  counts
> ·gc.time                       10       3304.000                      ms
> ```
> 
> With this patch:
> ```console
> Benchmark              (numTasks)          Score          Error    Units
> 
>                                 1       2851.288 ±      481.472    ops/s
> ·gc.alloc.rate.norm             1     145281.908 ±     2223.621     B/op
> ·gc.count                       1         39.000                  counts
> ·gc.time                        1        130.000                      ms
> 
>                                 5        297.380 ±       35.681    ops/s
> ·gc.alloc.rate.norm             5    1183791.866 ±    77487.278     B/op
> ·gc.count                       5         25.000                  counts
> ·gc.time                        5       1821.000                      ms
> 
>                                10        122.211 ±       81.618    ops/s
> ·gc.alloc.rate.norm            10    4364450.973 ±  2856586.882     B/op
> ·gc.count                      10         52.000                  counts
> ·gc.time                       10       3698.000                      ms
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> `

Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-24 Thread Stephan Erb

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196107
---




src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java
Line 234 (original), 235 (patched)


Have you considered passing in the predicate filter in here? For index 
scans this should help to eliminate a large amount of allocations.
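
For illustration only (placeholder names and types, not the actual MemTaskStore API), the idea is to apply the query predicate while walking the index, rather than materializing every indexed task into an intermediate collection and filtering afterwards:

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.function.Predicate;

final class IndexScanSketch<K, T> {
  private final Map<K, Collection<T>> index;

  IndexScanSketch(Map<K, Collection<T>> index) {
    this.index = index;
  }

  // Before: return every task under the key and let the caller filter,
  // which allocates an intermediate collection sized to the whole index entry.
  Collection<T> fetchByIndex(K key) {
    return new ArrayDeque<>(index.getOrDefault(key, Collections.emptyList()));
  }

  // After (the suggestion): apply the predicate during the index scan,
  // so only matching tasks are ever copied out.
  Collection<T> fetchByIndex(K key, Predicate<T> filter) {
    Collection<T> matches = new ArrayDeque<>();
    for (T task : index.getOrDefault(key, Collections.emptyList())) {
      if (filter.test(task)) {
        matches.add(task);
      }
    }
    return matches;
  }
}
```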


- Stephan Erb


On Jan. 24, 2018, 1:32 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 24, 2018, 1:32 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7ae 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5c 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/1/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
> the bottom, but here is an abridged version.  It shows that task fetch 
> throughput universally improves by at least 2x, and heap allocation reduces 
> by at least the same factor.  Overall GC time increases slightly as captured 
> here, but the stddev was anecdotally high across runs.  I chose to present 
> this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which I find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark              (numTasks)          Score          Error    Units
> 
>                                 1       1066.632 ±      266.924    ops/s
> ·gc.alloc.rate.norm             1     289227.205 ±         .051     B/op
> ·gc.count                       1         24.000                  counts
> ·gc.time                        1        103.000                      ms
> 
>                                 5         84.444 ±       32.620    ops/s
> ·gc.alloc.rate.norm             5    3831210.967 ±   840844.713     B/op
> ·gc.count                       5         21.000                  counts
> ·gc.time                        5       1407.000                      ms
> 
>                                10         38.645 ±       20.557    ops/s
> ·gc.alloc.rate.norm            10   13555430.931 ±  6787344.701     B/op
> ·gc.count                      10         52.000                  counts
> ·gc.time                       10       3304.000                      ms
> ```
> 
> With this patch:
> ```console
> Benchmark              (numTasks)          Score          Error    Units
> 
>                                 1       2851.288 ±      481.472    ops/s
> ·gc.alloc.rate.norm             1     145281.908 ±     2223.621     B/op
> ·gc.count                       1         39.000                  counts
> ·gc.time                        1        130.000                      ms
> 
>                                 5        297.380 ±       35.681    ops/s
> ·gc.alloc.rate.norm             5    1183791.866 ±    77487.278     B/op
> ·gc.count                       5         25.000                  counts
> ·gc.time                        5       1821.000                      ms
> 
>                                10        122.211 ±       81.618    ops/s
> ·gc.alloc.rate.norm            10    4364450.973 ±  2856586.882     B/op
> ·gc.count                      10         52.000                  counts
> ·gc.time                       10       3698.000                      ms
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                            (numTasks)   Mode  Cnt         Score        Error   Units
> TaskStoreBenchmarks.MemFetchT

Re: Review Request 65303: Improve performance of MemTaskStore queries

2018-01-23 Thread Aurora ReviewBot

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/#review196085
---



Master (dbe7137) is green with this patch.
  ./build-support/jenkins/build.sh

However, it appears that it might lack test coverage.

I will refresh this build result if you post a review containing "@ReviewBot 
retry"

- Aurora ReviewBot


On Jan. 24, 2018, 12:32 a.m., Bill Farner wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65303/
> ---
> 
> (Updated Jan. 24, 2018, 12:32 a.m.)
> 
> 
> Review request for Aurora and Jordan Ly.
> 
> 
> Repository: aurora
> 
> 
> Description
> ---
> 
> Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative 
> style rather than functional.  I arrived at this result after running 
> benchmarks with some of the other usual suspects (`ArrayList`, `LinkedList`).
> 
> This patch also enables stack and heap profilers in jmh (more details 
> [here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
>  providing insight into the heap impact of changes.  I started this change 
> with a heap profiler as the primary motivation, and ended up using it to 
> guide this improvement.
> 
> 
> Diffs
> -
> 
>   build.gradle 64af7ae 
>   src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
> b5c 
> 
> 
> Diff: https://reviews.apache.org/r/65303/diff/1/
> 
> 
> Testing
> ---
> 
> Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
> the bottom, but here is an abridged version.  It shows that task fetch 
> throughput universally improves by at least 2x, and heap allocation reduces 
> by at least the same factor.  Overall GC time increases slightly as captured 
> here, but the stddev was anecdotally high across runs.  I chose to present 
> this output as a caveat and a discussion point.
> 
> If you scroll to the full output at the bottom, you will see some more 
> granular allocation data.  Please note that the `norm` stats are normalized 
> for the number of operations, which I find to be the most useful measure for 
> validating a change.  Quoting the jmh sample link above:
> ```quote
> It is often useful to look into non-normalized counters to see if the test is 
> allocation/GC-bound (figure the allocation pressure "ceiling" for your 
> configuration!), and normalized counters to see the more precise benchmark 
> behavior.
> ```
> 
> Prior to this patch:
> ```console
> Benchmark              (numTasks)          Score          Error    Units
> 
>                                 1       1066.632 ±      266.924    ops/s
> ·gc.alloc.rate.norm             1     289227.205 ±         .051     B/op
> ·gc.count                       1         24.000                  counts
> ·gc.time                        1        103.000                      ms
> 
>                                 5         84.444 ±       32.620    ops/s
> ·gc.alloc.rate.norm             5    3831210.967 ±   840844.713     B/op
> ·gc.count                       5         21.000                  counts
> ·gc.time                        5       1407.000                      ms
> 
>                                10         38.645 ±       20.557    ops/s
> ·gc.alloc.rate.norm            10   13555430.931 ±  6787344.701     B/op
> ·gc.count                      10         52.000                  counts
> ·gc.time                       10       3304.000                      ms
> ```
> 
> With this patch:
> ```console
> Benchmark              (numTasks)          Score          Error    Units
> 
>                                 1       2851.288 ±      481.472    ops/s
> ·gc.alloc.rate.norm             1     145281.908 ±     2223.621     B/op
> ·gc.count                       1         39.000                  counts
> ·gc.time                        1        130.000                      ms
> 
>                                 5        297.380 ±       35.681    ops/s
> ·gc.alloc.rate.norm             5    1183791.866 ±    77487.278     B/op
> ·gc.count                       5         25.000                  counts
> ·gc.time                        5       1821.000                      ms
> 
>                                10        122.211 ±       81.618    ops/s
> ·gc.alloc.rate.norm            10    4364450.973 ±  2856586.882     B/op
> ·gc.count                      10         52.000                  counts
> ·gc.time                       10       3698.000                      ms
> ```
> 
> 
> **Full benchmark output**
> 
> Prior to this patch:
> ```console
> Benchmark                                            (numTasks)   Mode  Cnt         Score        Error   Units
> TaskStoreBenchmarks.MemFetchTasksBenchmark.run                1  thrpt    5  

Review Request 65303: Improve performance of MemTaskStore queries

2018-01-23 Thread Bill Farner

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65303/
---

Review request for Aurora and Jordan Ly.


Repository: aurora


Description
---

Use `ArrayDeque` rather than `HashSet` for fetchTasks, and use imperative style 
rather than functional.  I arrived at this result after running benchmarks with 
some of the other usual suspects (`ArrayList`, `LinkedList`).

This patch also enables stack and heap profilers in jmh (more details 
[here](http://hg.openjdk.java.net/codetools/jmh/file/25d8b2695bac/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_35_Profilers.java)),
 providing insight into the heap impact of changes.  I started this change with 
a heap profiler as the primary motivation, and ended up using it to guide this 
improvement.
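
The patch wires the profilers up through build.gradle; purely as an illustration of what that enables (not the project's actual configuration), the same GC and stack profilers can also be attached through JMH's runner API, or with `-prof gc -prof stack` on the command line:

```java
import org.openjdk.jmh.profile.GCProfiler;
import org.openjdk.jmh.profile.StackProfiler;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public final class ProfiledBenchmarkRunner {
  public static void main(String[] args) throws RunnerException {
    Options opts = new OptionsBuilder()
        .include("TaskStoreBenchmarks")    // benchmark class/method regex
        .addProfiler(GCProfiler.class)     // reports gc.alloc.rate(.norm), gc.count, gc.time
        .addProfiler(StackProfiler.class)  // reports sampled thread stacks per benchmark
        .build();
    new Runner(opts).run();
  }
}
```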


Diffs
-

  build.gradle 64af7ae 
  src/main/java/org/apache/aurora/scheduler/storage/mem/MemTaskStore.java 
b5c 


Diff: https://reviews.apache.org/r/65303/diff/1/


Testing
---

Full benchmark summary for `TaskStoreBenchmarks.MemFetchTasksBenchmark` is at 
the bottom, but here is an abridged version.  It shows that task fetch 
throughput universally improves by at least 2x, and heap allocation reduces by 
at least the same factor.  Overall GC time increases slightly as captured here, 
but the stddev was anecdotally high across runs.  I chose to present this 
output as a caveat and a discussion point.

If you scroll to the full output at the bottom, you will see some more granular 
allocation data.  Please note that the `norm` stats are normalized for the 
number of operations, which I find to be the most useful measure for validating 
a change.  Quoting the jmh sample link above:
```quote
It is often useful to look into non-normalized counters to see if the test is 
allocation/GC-bound (figure the allocation pressure "ceiling" for your 
configuration!), and normalized counters to see the more precise benchmark 
behavior.
```

Prior to this patch:
```console
Benchmark              (numTasks)          Score          Error    Units

                                1       1066.632 ±      266.924    ops/s
·gc.alloc.rate.norm             1     289227.205 ±         .051     B/op
·gc.count                       1         24.000                  counts
·gc.time                        1        103.000                      ms

                                5         84.444 ±       32.620    ops/s
·gc.alloc.rate.norm             5    3831210.967 ±   840844.713     B/op
·gc.count                       5         21.000                  counts
·gc.time                        5       1407.000                      ms

                               10         38.645 ±       20.557    ops/s
·gc.alloc.rate.norm            10   13555430.931 ±  6787344.701     B/op
·gc.count                      10         52.000                  counts
·gc.time                       10       3304.000                      ms
```

With this patch:
```console
Benchmark              (numTasks)          Score          Error    Units

                                1       2851.288 ±      481.472    ops/s
·gc.alloc.rate.norm             1     145281.908 ±     2223.621     B/op
·gc.count                       1         39.000                  counts
·gc.time                        1        130.000                      ms

                                5        297.380 ±       35.681    ops/s
·gc.alloc.rate.norm             5    1183791.866 ±    77487.278     B/op
·gc.count                       5         25.000                  counts
·gc.time                        5       1821.000                      ms

                               10        122.211 ±       81.618    ops/s
·gc.alloc.rate.norm            10    4364450.973 ±  2856586.882     B/op
·gc.count                      10         52.000                  counts
·gc.time                       10       3698.000                      ms
```


**Full benchmark output**

Prior to this patch:
```console
Benchmark                                                                     (numTasks)   Mode  Cnt         Score        Error   Units
TaskStoreBenchmarks.MemFetchTasksBenchmark.run                                         1  thrpt    5      1066.632 ±    266.924   ops/s
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate                          1  thrpt    5       286.647 ±     62.371  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.alloc.rate.norm                     1  thrpt    5    289227.205 ±       .051    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space                 1  thrpt    5       291.263 ±    159.266  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Eden_Space.norm            1  thrpt    5    294277.617 ± 166069.041    B/op
TaskStoreBenchmarks.MemFetchTasksBenchmark.run:·gc.churn.PS_Survivor_Space             1  thrpt    5         1.218 ±      1.029  MB/sec
TaskStoreBenchmarks.MemFetchTasksBenchmark.