On Sun, 11 Apr 2021 07:14:07 GMT, Tagir F. Valeev <[email protected]> wrote:
>> With the introduction of `toList()`, preserving the SIZED characteristic in
>> more cases becomes more important. This patch preserves SIZED on the `skip()`
>> and `limit()` operations, so now every combination of
>> `map/mapToX/boxed/asXyzStream/skip/limit/sorted` preserves the size, and
>> `toList()`, `toArray()` and `count()` may benefit from this. E.g.,
>> `LongStream.range(0, 10_000_000_000L).skip(1).count()` returns the result
>> instantly with this patch.
>>
>> Some microbenchmarks were added that confirm the reduced memory allocation
>> in the `toList()` and `toArray()` cases. Before patch:
>>
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm      10000  thrpt  10   40235,534 ± 0,984  B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm         10000  thrpt  10  106431,101 ± 0,198  B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm     10000  thrpt  10  106544,977 ± 1,983  B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm   10000  thrpt  10   40121,878 ± 0,247  B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm      10000  thrpt  10  106317,693 ± 1,083  B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm  10000  thrpt  10  106430,954 ± 0,136  B/op
>>
>>
>> After patch:
>>
>> ref.SliceToList.seq_baseline:·gc.alloc.rate.norm      10000  thrpt  10  40235,648 ± 1,354  B/op
>> ref.SliceToList.seq_limit:·gc.alloc.rate.norm         10000  thrpt  10  40355,784 ± 1,288  B/op
>> ref.SliceToList.seq_skipLimit:·gc.alloc.rate.norm     10000  thrpt  10  40476,032 ± 2,855  B/op
>> value.SliceToArray.seq_baseline:·gc.alloc.rate.norm   10000  thrpt  10  40121,830 ± 0,308  B/op
>> value.SliceToArray.seq_limit:·gc.alloc.rate.norm      10000  thrpt  10  40242,554 ± 0,443  B/op
>> value.SliceToArray.seq_skipLimit:·gc.alloc.rate.norm  10000  thrpt  10  40363,674 ± 1,576  B/op
>>
>>
>> Time improvements are less exciting. It's likely that inlining and
>> vectorization dominate over array allocations and unnecessary copying in
>> these tests. Still, I notice a significant improvement in the
>> SliceToArray.seq_limit case (2x) and a mild improvement (+12..16%) in the
>> other slice tests. There is no significant change in parallel execution
>> time, though its performance is much less stable and I didn't run enough
>> tests.
>>
>> Before patch:
>>
>> Benchmark                          (size)   Mode  Cnt      Score      Error  Units
>> ref.SliceToList.par_baseline        10000  thrpt   30  14876,723 ±   99,770  ops/s
>> ref.SliceToList.par_limit           10000  thrpt   30  14856,841 ±  215,089  ops/s
>> ref.SliceToList.par_skipLimit       10000  thrpt   30   9555,818 ±  991,335  ops/s
>> ref.SliceToList.seq_baseline        10000  thrpt   30  23732,290 ±  444,162  ops/s
>> ref.SliceToList.seq_limit           10000  thrpt   30  14894,040 ±  176,496  ops/s
>> ref.SliceToList.seq_skipLimit       10000  thrpt   30  10646,929 ±   36,469  ops/s
>> value.SliceToArray.par_baseline     10000  thrpt   30  25093,141 ±  376,402  ops/s
>> value.SliceToArray.par_limit        10000  thrpt   30  24798,889 ±  760,762  ops/s
>> value.SliceToArray.par_skipLimit    10000  thrpt   30  16456,310 ±  926,882  ops/s
>> value.SliceToArray.seq_baseline     10000  thrpt   30  69669,787 ±  494,562  ops/s
>> value.SliceToArray.seq_limit        10000  thrpt   30  21097,081 ±  117,338  ops/s
>> value.SliceToArray.seq_skipLimit    10000  thrpt   30  15522,871 ±  112,557  ops/s
>>
>>
>> After patch:
>>
>> Benchmark                          (size)   Mode  Cnt      Score       Error  Units
>> ref.SliceToList.par_baseline        10000  thrpt   30  14793,373 ±    64,905  ops/s
>> ref.SliceToList.par_limit           10000  thrpt   30  13301,024 ±  1300,431  ops/s
>> ref.SliceToList.par_skipLimit       10000  thrpt   30  11131,698 ±  1769,932  ops/s
>> ref.SliceToList.seq_baseline        10000  thrpt   30  24101,048 ±   263,528  ops/s
>> ref.SliceToList.seq_limit           10000  thrpt   30  16872,168 ±    76,696  ops/s
>> ref.SliceToList.seq_skipLimit       10000  thrpt   30  11953,253 ±   105,231  ops/s
>> value.SliceToArray.par_baseline     10000  thrpt   30  25442,442 ±   455,554  ops/s
>> value.SliceToArray.par_limit        10000  thrpt   30  23111,730 ±  2246,086  ops/s
>> value.SliceToArray.par_skipLimit    10000  thrpt   30  17980,750 ±  2329,077  ops/s
>> value.SliceToArray.seq_baseline     10000  thrpt   30  66512,898 ±  1001,042  ops/s
>> value.SliceToArray.seq_limit        10000  thrpt   30  41792,549 ±  1085,547  ops/s
>> value.SliceToArray.seq_skipLimit    10000  thrpt   30  18007,613 ±   141,716  ops/s
>>
>>
>> I also modernized SliceOps a little, using a switch expression (with no
>> explicit default!) and diamonds on anonymous classes.
>
> Tagir F. Valeev has updated the pull request incrementally with one
> additional commit since the last revision:
>
> Fixes according to review:
>
> 1. Comments in adjustSize
> 2. repeating code extracted from testNoEvaluationForSizedStream
Even though there are not many changes, this cuts deep into how streams work.
I suspect there is some, possibly minor, impact on sized streams without
limit/skip because of the increased cost of computing the exact size. Further,
that cost is also incurred for `AbstractWrappingSpliterator`, where we might
need to cache the exact size result.
I made some suggestions, mostly around naming, but I admit to being a little
uncomfortable with the proposed change and need to think a little more about
it. I am wondering whether we need another stream flag indicating that the
size is known but has to be computed.
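For context, here is a minimal standalone sketch (my illustration, not part of
the patch) that makes the behavior under discussion observable: it checks
whether the SIZED characteristic survives `skip()`/`limit()` and runs the
`count()` example from the description.

    import java.util.Spliterator;
    import java.util.stream.LongStream;

    public class SizedCheck {
        public static void main(String[] args) {
            // With the patch, the SIZED characteristic should survive
            // skip()/limit() on a sized source such as LongStream.range().
            Spliterator.OfLong s = LongStream.range(0, 10_000_000_000L)
                                             .skip(1)
                                             .limit(100)
                                             .spliterator();
            System.out.println(s.hasCharacteristics(Spliterator.SIZED));

            // The example from the description: with a preserved exact size,
            // count() can be answered arithmetically instead of by traversal
            // (without the patch this line traverses ~10 billion elements).
            System.out.println(LongStream.range(0, 10_000_000_000L).skip(1).count());
        }
    }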
src/java.base/share/classes/java/util/stream/AbstractPipeline.java line 481:
> 479:     long adjustSize(long size) {
> 480:         return previousStage == null ? size : previousStage.adjustSize(size);
> 481:     }
Suggestion:

    /**
     * Returns the exact output size of the pipeline given the exact size
     * reported by the source spliterator.
     *
     * @param sourceSize the exact size reported by the source spliterator
     * @return the exact output size
     */
    long exactOutputSize(long sourceSize) {
        return previousStage == null
               ? sourceSize
               : previousStage.exactOutputSize(sourceSize);
    }
Sticking with the `exactOutput` naming, I think, complements the two methods.
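To make the intent concrete, here is a rough standalone sketch (illustrative
names only, not the JDK sources) of the folding that the recursion performs as
it unwinds: each size-adjusting stage maps the exact size produced by the
stages before it to its own exact output size.

    import java.util.List;
    import java.util.function.LongUnaryOperator;

    public class ExactOutputSizeSketch {
        public static void main(String[] args) {
            long sourceSize = 10_000_000_000L;
            // A skip(1) stage followed by a limit(100) stage, each expressed
            // purely as an adjustment of the exact size.
            List<LongUnaryOperator> sizeAdjustingStages = List.of(
                    size -> Math.max(0, size - 1),   // skip(1)
                    size -> Math.min(size, 100));    // limit(100)
            long exact = sourceSize;
            for (LongUnaryOperator stage : sizeAdjustingStages)
                exact = stage.applyAsLong(exact);
            System.out.println(exact); // 100
        }
    }

A stage that changes the size would override the method and apply its own
adjustment to the value returned by `previousStage.exactOutputSize(sourceSize)`.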
src/java.base/share/classes/java/util/stream/SliceOps.java line 232:
> 230:         if (skip < 0)
> 231:             throw new IllegalArgumentException("Skip must be non-negative: " + skip);
> 232:         long adjustedLimit = limit >= 0 ? limit : Long.MAX_VALUE;
Suggestion:

    long normalizedLimit = limit >= 0 ? limit : Long.MAX_VALUE;
The name `adjustedLimit` implies that the limit value itself is changed,
whereas it is really the no-limit sentinel that is being normalized.
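As a side note, normalizing the no-limit case to `Long.MAX_VALUE` also keeps
the exact-size arithmetic free of a special case. A hypothetical helper along
these lines (illustrative only, not the patch code) can simply treat
`Long.MAX_VALUE` as "no cap":

    public class SliceSizeSketch {
        // Maps an upstream exact size to the exact size after skip/limit.
        static long exactSliceSize(long upstreamSize, long skip, long normalizedLimit) {
            long afterSkip = Math.max(0, upstreamSize - skip);
            return Math.min(afterSkip, normalizedLimit); // MAX_VALUE acts as "no cap"
        }

        public static void main(String[] args) {
            System.out.println(exactSliceSize(10, 3, Long.MAX_VALUE)); // 7 (no limit)
            System.out.println(exactSliceSize(10, 3, 5));              // 5
        }
    }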
-------------
Changes requested by psandoz (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/3427