> With the introduction of `toList()`, preserving the SIZED characteristics in
> more cases becomes more important. This patch preserves SIZED on `skip()` and
> `limit()` operations, so now every combination of
> `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves size, and
> `toList()`, `toArray()` and `count()` may benefit from this. E.g.,
> `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns the result
> instantly with this patch.
>
> Some microbenchmarks were added that confirm the reduced memory allocation in
> the `toList()` and `toArray()` cases.
>
> Before patch:
>
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm      10000  thrpt  10   40235,534 ± 0,984  B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm         10000  thrpt  10  106431,101 ± 0,198  B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm     10000  thrpt  10  106544,977 ± 1,983  B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm   10000  thrpt  10   40121,878 ± 0,247  B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm      10000  thrpt  10  106317,693 ± 1,083  B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm  10000  thrpt  10  106430,954 ± 0,136  B/op
>
> After patch:
>
> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm      10000  thrpt  10   40235,648 ± 1,354  B/op
> ref.SliceToList.seq_limit:·gc.alloc.rate.norm         10000  thrpt  10   40355,784 ± 1,288  B/op
> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm     10000  thrpt  10   40476,032 ± 2,855  B/op
> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm   10000  thrpt  10   40121,830 ± 0,308  B/op
> value.SliceToArray.seq_limit:·gc.alloc.rate.norm      10000  thrpt  10   40242,554 ± 0,443  B/op
> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm  10000  thrpt  10   40363,674 ± 1,576  B/op
>
> Time improvements are less exciting. It's likely that inlining and
> vectorizing dominate in these tests over array allocations and unnecessary
> copying. Still, I notice a significant improvement in the
> SliceToArray.seq_limit case (2x) and a mild improvement (+12..16%) in the
> other sequential slice tests. There is no significant change in parallel
> execution time, though its performance is much less stable and I didn't run
> enough tests.
>
> Before patch:
>
> Benchmark                         (size)   Mode  Cnt      Score      Error  Units
> ref.SliceToList.par_baseline       10000  thrpt   30  14876,723 ±   99,770  ops/s
> ref.SliceToList.par_limit          10000  thrpt   30  14856,841 ±  215,089  ops/s
> ref.SliceToList.par_skipLimit      10000  thrpt   30   9555,818 ±  991,335  ops/s
> ref.SliceToList.seq_baseline       10000  thrpt   30  23732,290 ±  444,162  ops/s
> ref.SliceToList.seq_limit          10000  thrpt   30  14894,040 ±  176,496  ops/s
> ref.SliceToList.seq_skipLimit      10000  thrpt   30  10646,929 ±   36,469  ops/s
> value.SliceToArray.par_baseline    10000  thrpt   30  25093,141 ±  376,402  ops/s
> value.SliceToArray.par_limit       10000  thrpt   30  24798,889 ±  760,762  ops/s
> value.SliceToArray.par_skipLimit   10000  thrpt   30  16456,310 ±  926,882  ops/s
> value.SliceToArray.seq_baseline    10000  thrpt   30  69669,787 ±  494,562  ops/s
> value.SliceToArray.seq_limit       10000  thrpt   30  21097,081 ±  117,338  ops/s
> value.SliceToArray.seq_skipLimit   10000  thrpt   30  15522,871 ±  112,557  ops/s
>
> After patch:
>
> Benchmark                         (size)   Mode  Cnt      Score      Error  Units
> ref.SliceToList.par_baseline       10000  thrpt   30  14793,373 ±   64,905  ops/s
> ref.SliceToList.par_limit          10000  thrpt   30  13301,024 ± 1300,431  ops/s
> ref.SliceToList.par_skipLimit      10000  thrpt   30  11131,698 ± 1769,932  ops/s
> ref.SliceToList.seq_baseline       10000  thrpt   30  24101,048 ±  263,528  ops/s
> ref.SliceToList.seq_limit          10000  thrpt   30  16872,168 ±   76,696  ops/s
> ref.SliceToList.seq_skipLimit      10000  thrpt   30  11953,253 ±  105,231  ops/s
> value.SliceToArray.par_baseline    10000  thrpt   30  25442,442 ±  455,554  ops/s
> value.SliceToArray.par_limit       10000  thrpt   30  23111,730 ± 2246,086  ops/s
> value.SliceToArray.par_skipLimit   10000  thrpt   30  17980,750 ± 2329,077  ops/s
> value.SliceToArray.seq_baseline    10000  thrpt   30  66512,898 ± 1001,042  ops/s
> value.SliceToArray.seq_limit       10000  thrpt   30  41792,549 ± 1085,547  ops/s
> value.SliceToArray.seq_skipLimit   10000  thrpt   30  18007,613 ±  141,716  ops/s
>
> I also modernized SliceOps a little bit, using a switch expression (with no
> explicit default!) and diamonds on anonymous classes.
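
For readers who want to observe the effect described in the quoted text, here is a minimal sketch. The class `SizedSliceDemo` and its helpers `isSized` and `countSlice` are hypothetical names made up for illustration; they are not part of the patch. It uses only public `java.util.stream` API: it checks whether a stream's spliterator reports `Spliterator.SIZED`, and it uses `peek()` to record how many elements `count()` actually pulls through a `skip()`/`limit()` slice. What it prints for the sliced streams depends on whether the running JDK includes this change, so no expected output is given for those lines.

```java
import java.util.Spliterator;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.LongStream;

public class SizedSliceDemo {

    /** Returns true if the stream's spliterator advertises the SIZED characteristic. */
    static boolean isSized(LongStream stream) {
        return stream.spliterator().hasCharacteristics(Spliterator.SIZED);
    }

    /**
     * Counts the slice range(0, size).skip(skip).limit(limit) and records how
     * many source elements passed through peek() while doing so.
     * Returns {count, elementsTraversed}.
     */
    static long[] countSlice(long size, long skip, long limit) {
        AtomicLong visited = new AtomicLong();
        long count = LongStream.range(0, size)
                .peek(x -> visited.incrementAndGet())
                .skip(skip)
                .limit(limit)
                .count();
        return new long[] {count, visited.get()};
    }

    public static void main(String[] args) {
        // The source range itself is SIZED on any JDK.
        System.out.println("range SIZED:      " + isSized(LongStream.range(0, 100)));

        // Whether the slices report SIZED depends on the JDK in use
        // (assumption: true only on builds that contain this patch).
        System.out.println("skip SIZED:       " + isSized(LongStream.range(0, 100).skip(1)));
        System.out.println("skip+limit SIZED: " + isSized(LongStream.range(0, 100).skip(1).limit(10)));

        // The count is the same either way; the number of traversed elements
        // shows whether count() had to evaluate the pipeline.
        long[] r = countSlice(1_000_000, 10, 1000);
        System.out.println("count=" + r[0] + ", elements traversed=" + r[1]);
    }
}
```

If the slice keeps SIZED, `count()` can be answered from the known size, and the traversed-element counter should stay at zero; otherwise the pipeline has to be evaluated and the counter is non-zero. The returned count is 1000 in both cases.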

Tagir F. Valeev has updated the pull request incrementally with one additional commit since the last revision:

  Fixes according to review:
  1. Comments in adjustSize
  2. repeating code extracted from testNoEvaluationForSizedStream

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3427/files
  - new: https://git.openjdk.java.net/jdk/pull/3427/files/036ea4f9..25755e14

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3427&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3427&range=00-01

  Stats: 73 lines in 2 files changed: 6 ins; 23 del; 44 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3427.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3427/head:pull/3427

PR: https://git.openjdk.java.net/jdk/pull/3427