On Sun, 11 Apr 2021 07:14:07 GMT, Tagir F. Valeev <[email protected]> wrote:
>> With the introduction of `toList()`, preserving the SIZED characteristic in
>> more cases becomes more important. This patch preserves SIZED on the `skip()`
>> and `limit()` operations, so now every combination of
>> `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves the size, and
>> `toList()`, `toArray()` and `count()` may benefit from this. E.g.,
>> `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns the result
>> instantly with this patch.
>>
>> Some microbenchmarks were added that confirm the reduced memory allocation
>> in the `toList()` and `toArray()` cases. Before patch:
>>
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm      10000  thrpt  10   40235,534 ± 0,984  B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm         10000  thrpt  10  106431,101 ± 0,198  B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm     10000  thrpt  10  106544,977 ± 1,983  B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm   10000  thrpt  10   40121,878 ± 0,247  B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm      10000  thrpt  10  106317,693 ± 1,083  B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm  10000  thrpt  10  106430,954 ± 0,136  B/op
>>
>>
>> After patch:
>>
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm      10000  thrpt  10  40235,648 ± 1,354  B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm         10000  thrpt  10  40355,784 ± 1,288  B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm     10000  thrpt  10  40476,032 ± 2,855  B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm   10000  thrpt  10  40121,830 ± 0,308  B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm      10000  thrpt  10  40242,554 ± 0,443  B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm  10000  thrpt  10  40363,674 ± 1,576  B/op
>>
>>
>> Time improvements are less exciting. It's likely that inlining and
>> vectorization dominate over array allocations and unnecessary copying in
>> these tests. Still, I notice a significant improvement in the
>> SliceToArray.seq_limit case (2x) and a mild improvement (+12..16%) in the
>> other slice tests. There is no significant change in parallel execution
>> time, though its performance is much less stable and I didn't run enough
>> tests.
>>
>> Before patch:
>>
>> Benchmark                          (size)   Mode  Cnt      Score      Error  Units
>> ref.SliceToList.par_baseline        10000  thrpt   30  14876,723 ±   99,770  ops/s
>> ref.SliceToList.par_limit           10000  thrpt   30  14856,841 ±  215,089  ops/s
>> ref.SliceToList.par_skipLimit       10000  thrpt   30   9555,818 ±  991,335  ops/s
>> ref.SliceToList.seq_baseline        10000  thrpt   30  23732,290 ±  444,162  ops/s
>> ref.SliceToList.seq_limit           10000  thrpt   30  14894,040 ±  176,496  ops/s
>> ref.SliceToList.seq_skipLimit       10000  thrpt   30  10646,929 ±   36,469  ops/s
>> value.SliceToArray.par_baseline     10000  thrpt   30  25093,141 ±  376,402  ops/s
>> value.SliceToArray.par_limit        10000  thrpt   30  24798,889 ±  760,762  ops/s
>> value.SliceToArray.par_skipLimit    10000  thrpt   30  16456,310 ±  926,882  ops/s
>> value.SliceToArray.seq_baseline     10000  thrpt   30  69669,787 ±  494,562  ops/s
>> value.SliceToArray.seq_limit        10000  thrpt   30  21097,081 ±  117,338  ops/s
>> value.SliceToArray.seq_skipLimit    10000  thrpt   30  15522,871 ±  112,557  ops/s
>>
>>
>> After patch:
>>
>> Benchmark                          (size)   Mode  Cnt      Score       Error  Units
>> ref.SliceToList.par_baseline        10000  thrpt   30  14793,373 ±    64,905  ops/s
>> ref.SliceToList.par_limit           10000  thrpt   30  13301,024 ±  1300,431  ops/s
>> ref.SliceToList.par_skipLimit       10000  thrpt   30  11131,698 ±  1769,932  ops/s
>> ref.SliceToList.seq_baseline        10000  thrpt   30  24101,048 ±   263,528  ops/s
>> ref.SliceToList.seq_limit           10000  thrpt   30  16872,168 ±    76,696  ops/s
>> ref.SliceToList.seq_skipLimit       10000  thrpt   30  11953,253 ±   105,231  ops/s
>> value.SliceToArray.par_baseline     10000  thrpt   30  25442,442 ±   455,554  ops/s
>> value.SliceToArray.par_limit        10000  thrpt   30  23111,730 ±  2246,086  ops/s
>> value.SliceToArray.par_skipLimit    10000  thrpt   30  17980,750 ±  2329,077  ops/s
>> value.SliceToArray.seq_baseline     10000  thrpt   30  66512,898 ±  1001,042  ops/s
>> value.SliceToArray.seq_limit        10000  thrpt   30  41792,549 ±  1085,547  ops/s
>> value.SliceToArray.seq_skipLimit    10000  thrpt   30  18007,613 ±   141,716  ops/s
>>
>>
>> I also modernized SliceOps a little, using a switch expression (with no
>> explicit default!) and diamonds on anonymous classes.
>
> Tagir F. Valeev has updated the pull request incrementally with one
> additional commit since the last revision:
>
> Fixes according to review:
>
> 1. Comments in adjustSize
> 2. repeating code extracted from testNoEvaluationForSizedStream
Even though there are not many changes, this cuts deep into how streams work.
I suspect there is some, possibly minor, impact on sized streams without
limit/skip because of the increased cost of computing the exact size. Further,
that cost is also incurred for `AbstractWrappingSpliterator`, where we might
need to cache the exact size result.
I made some suggestions, mostly around naming, but I admit to being a little
uncomfortable with the proposed change and need to think a little more about
it. I am wondering whether we need another stream flag indicating that the
size is known but has to be computed.
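For context, here is a minimal standalone sketch (my illustration, not part of
the patch) that makes the behavior under discussion observable: it checks
whether the SIZED characteristic survives `skip()`/`limit()` and runs the
`count()` example from the description.

    import java.util.Spliterator;
    import java.util.stream.LongStream;

    public class SizedCheck {
        public static void main(String[] args) {
            // With the patch, the SIZED characteristic should survive
            // skip()/limit() on a sized source such as LongStream.range().
            Spliterator.OfLong s = LongStream.range(0, 10_000_000_000L)
                                             .skip(1)
                                             .limit(100)
                                             .spliterator();
            System.out.println(s.hasCharacteristics(Spliterator.SIZED));

            // The example from the description: with a preserved exact size,
            // count() can be answered arithmetically instead of by traversal
            // (without the patch this line traverses ~10 billion elements).
            System.out.println(LongStream.range(0, 10_000_000_000L).skip(1).count());
        }
    }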
src/java.base/share/classes/java/util/stream/AbstractPipeline.java line 481:
> 479:     long adjustSize(long size) {
> 480:         return previousStage == null ? size : previousStage.adjustSize(size);
> 481:     }
Suggestion:

    /**
     * Returns the exact output size of the pipeline given the exact size
     * reported by the source spliterator.
     *
     * @param sourceSize the exact size reported by the source spliterator
     * @return the exact output size
     */
    long exactOutputSize(long sourceSize) {
        return previousStage == null
               ? sourceSize
               : previousStage.exactOutputSize(sourceSize);
    }
Sticking with the `exactOutput` naming, I think, complements the two methods.
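To make the intent concrete, here is a rough standalone sketch (illustrative
names only, not the JDK sources) of the folding that the recursion performs as
it unwinds: each size-adjusting stage maps the exact size produced by the
stages before it to its own exact output size.

    import java.util.List;
    import java.util.function.LongUnaryOperator;

    public class ExactOutputSizeSketch {
        public static void main(String[] args) {
            long sourceSize = 10_000_000_000L;
            // A skip(1) stage followed by a limit(100) stage, each expressed
            // purely as an adjustment of the exact size.
            List<LongUnaryOperator> sizeAdjustingStages = List.of(
                    size -> Math.max(0, size - 1),   // skip(1)
                    size -> Math.min(size, 100));    // limit(100)
            long exact = sourceSize;
            for (LongUnaryOperator stage : sizeAdjustingStages)
                exact = stage.applyAsLong(exact);
            System.out.println(exact); // 100
        }
    }

A stage that changes the size would override the method and apply its own
adjustment to the value returned by `previousStage.exactOutputSize(sourceSize)`.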
src/java.base/share/classes/java/util/stream/SliceOps.java line 232:
> 230:         if (skip < 0)
> 231:             throw new IllegalArgumentException("Skip must be non-negative: " + skip);
> 232:         long adjustedLimit = limit >= 0 ? limit : Long.MAX_VALUE;
Suggestion:

    long normalizedLimit = limit >= 0 ? limit : Long.MAX_VALUE;
The name `adjustedLimit` implies that the limit value itself is changed,
whereas it is really the no-limit sentinel that is being normalized.
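As a side note, normalizing the no-limit case to `Long.MAX_VALUE` also keeps
the exact-size arithmetic free of a special case. A hypothetical helper along
these lines (illustrative only, not the patch code) can simply treat
`Long.MAX_VALUE` as "no cap":

    public class SliceSizeSketch {
        // Maps an upstream exact size to the exact size after skip/limit.
        static long exactSliceSize(long upstreamSize, long skip, long normalizedLimit) {
            long afterSkip = Math.max(0, upstreamSize - skip);
            return Math.min(afterSkip, normalizedLimit); // MAX_VALUE acts as "no cap"
        }

        public static void main(String[] args) {
            System.out.println(exactSliceSize(10, 3, Long.MAX_VALUE)); // 7 (no limit)
            System.out.println(exactSliceSize(10, 3, 5));              // 5
        }
    }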
-------------
Changes requested by psandoz (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3427