On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong <xg...@openjdk.org> wrote:

>> Checking whether the indexes of masked lanes are inside the valid memory 
>> boundary is necessary for masked vector memory access. However, this check 
>> can be skipped if the given offset is inside the vector range, which 
>> guarantees that no IOOBE (IndexOutOfBoundsException) can happen. The masked 
>> load APIs already skip this check in the common case, and this patch applies 
>> the same optimization to the masked vector store.
>> 
>> The performance of the newly added masked store benchmarks improves by about 
>> `1.83x ~ 2.62x` on an x86 system:
>> 
>> Benchmark                                   Before    After     Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask     4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask    1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> A similar performance gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Wrap the offset check into a static method

However, we seem to lack the ability to do an unsigned comparison reliably. C2 
can transform `x + MIN_VALUE <=> y + MIN_VALUE` into `x u<=> y`, but this 
fails if `x` or `y` is itself an addition with a constant, because in that 
case the constants are merged together. As a result, I think we need an 
intrinsic for this. `Integer.compareUnsigned` may fit, but it materialises the 
result into an integer register, which may lead to suboptimal materialisation 
of flags. Another approach would be a separate method 
`Integer.lessThanUnsigned` that only returns `boolean`; C2 has an easier time 
splitting a boolean comparison through `IfNode`, which avoids materialising 
the `boolean` value. What do you two think?

I.e., after splitting the `if` through the merge point, the shape of 
`if (Integer.lessThanUnsigned(a, b))` would be transformed from

         a        b
          \      /
            CmpU
             |
            Bool
             |
            If
          /     \
    IfTrue        IfFalse
          \     /
            Region        1        0
                \         |       /
                         Phi         0
                          \         /
                              CmpI

into

         a        b
          \      /
            CmpU
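
For illustration only, here is a minimal Java sketch of the sign-bit trick 
mentioned above. `lessThanUnsigned` and `inRange` are hypothetical helpers 
written for this mail, not existing JDK methods, and the sketch says nothing 
about how C2 would actually intrinsify them. It only shows that the Java-level 
semantics agree with `Integer.compareUnsigned`.

```java
public class UnsignedCompareSketch {
    // Hypothetical helper (assumption: not an existing JDK method).
    // Adding MIN_VALUE flips the sign bit, so the signed order of the
    // shifted values matches the unsigned order of the original values.
    static boolean lessThanUnsigned(int x, int y) {
        return x + Integer.MIN_VALUE < y + Integer.MIN_VALUE;
    }

    // Hypothetical range check: for a non-negative length, one unsigned
    // comparison covers both "index >= 0" and "index < length".
    static boolean inRange(int index, int length) {
        return lessThanUnsigned(index, length);
    }

    public static void main(String[] args) {
        int[][] pairs = { {1, -1}, {-1, 1}, {3, 10}, {10, 10}, {-5, 10} };
        for (int[] p : pairs) {
            boolean viaTrick = lessThanUnsigned(p[0], p[1]);
            boolean viaJdk = Integer.compareUnsigned(p[0], p[1]) < 0;
            System.out.println(p[0] + " u< " + p[1] + " : " + viaTrick
                    + " (compareUnsigned agrees: " + (viaTrick == viaJdk) + ")");
        }
    }
}
```

If a caller passes something like `i + 1` as `x`, the expression becomes 
`i + 1 + MIN_VALUE`, which is the case where the two constants get merged and 
the pattern is no longer recognised.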

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8620
