Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-06-06 Thread Xiaohong Gong
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz  wrote:

>> @PaulSandoz, could you please help to check whether the current version is 
>> ok for you? Thanks so much!
>
> @XiaohongGong looks good, now the Vector API JEP has been integrated you will 
> get a merge conflict, but it should be easier to resolve.

Thanks for the review @PaulSandoz @merykitty !

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-06-01 Thread Xiaohong Gong
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz  wrote:

>> @PaulSandoz, could you please help to check whether the current version is 
>> ok for you? Thanks so much!
>
> @XiaohongGong looks good, now the Vector API JEP has been integrated you will 
> get a merge conflict, but it should be easier to resolve.

Hi @PaulSandoz , I updated the patch and resolved the conflicts. Could you 
please take a look at it again? Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-31 Thread Xiaohong Gong
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong  wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Wrap the offset check into a static method
>
> @PaulSandoz, could you please help to check whether the current version is ok 
> for you? Thanks so much!

> @XiaohongGong looks good, now the Vector API JEP has been integrated you will 
> get a merge conflict, but it should be easier to resolve.

Great! I will resolve the conflict and update the patch, thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-31 Thread Paul Sandoz
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong  wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Wrap the offset check into a static method
>
> @PaulSandoz, could you please help to check whether the current version is ok 
> for you? Thanks so much!

@XiaohongGong looks good, now the Vector API JEP has been integrated you will 
get a merge conflict, but it should be easier to resolve.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-29 Thread Xiaohong Gong
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Wrap the offset check into a static method

@PaulSandoz, could you please help to check whether the current version is ok 
for you? Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-13 Thread Xiaohong Gong
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Wrap the offset check into a static method

Thanks for the explanation! Yeah, the main problem is Java doesn't have the 
direct unsigned comparison. We need the function call. From the two ways you 
provided, I think the second `Integer.lessThanUnsigned` looks better. But I'm 
not sure whether this could improve the performance a lot, although the first 
check `a.length - vsp.length() > 0` can be hosited out side of the loop. And 
this might make the codes more complex for me. Maybe we can do a pre research 
to find a better implementation to the unsigned comparison first. Do you think 
so?

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Quan Anh Mai
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong  wrote:

>> Checking whether the indexes of masked lanes are inside of the valid memory 
>> boundary is necessary for masked vector memory access. However, this could 
>> be saved if the given offset is inside of the vector range that could make 
>> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs 
>> have saved this kind of check for common cases. And this patch did the 
>> similar optimization for the masked vector store.
>> 
>> The performance for the new added store masked benchmarks improves about 
>> `1.83x ~ 2.62x` on a x86 system:
>> 
>> Benchmark   BeforeAfter Gain Units
>> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
>> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
>> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
>> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
>> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
>> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
>> 
>> Similar performane gain can also be observed on ARM hardware.
>
> Xiaohong Gong has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Wrap the offset check into a static method

However, we seem to lack the ability to do an unsigned comparison reliably. C2 
can transform `x + MIN_VALUE <=> y + MIN_VALUE` into `x u<=> y` but it will 
fail if `x` or `y` is an addition with constant in such cases the constants 
will be merged together. As a result, I think we need an intrinsic for this. 
`Integer.compareUnsigned` may fit but it manifests the result into an integer 
register which may lead to suboptimal materialisation of flags, another 
approach would be to have a separate method `Integer.lessThanUnsigned` which 
only returns `boolean` and C2 can have better time splitting the boolean 
comparison through `IfNode`, which will prevent the materialisation of 
`boolean` values. What do you two think?

I.e, after splitting if through merge point, the shape of `if 
(Integer.lessThanUnsigned(a, b))` would be transformed from

 ab
  \  /
CmpU
 |
Bool
 |
If
  / \
IfTrueIfFalse
  \ /
Region10
\ |   /
 Phi 0
  \ /
  CmpI

into

 ab
  \  /
CmpU

Thanks.

-

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Xiaohong Gong
> Checking whether the indexes of masked lanes are inside of the valid memory 
> boundary is necessary for masked vector memory access. However, this could be 
> saved if the given offset is inside of the vector range that could make sure 
> no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have 
> saved this kind of check for common cases. And this patch did the similar 
> optimization for the masked vector store.
> 
> The performance for the new added store masked benchmarks improves about 
> `1.83x ~ 2.62x` on a x86 system:
> 
> Benchmark   BeforeAfter Gain Units
> StoreMaskedBenchmark.byteStoreArrayMask   12757.936 23291.118  1.826 ops/ms
> StoreMaskedBenchmark.doubleStoreArrayMask  1520.932  3921.616  2.578 ops/ms
> StoreMaskedBenchmark.floatStoreArrayMask   2713.031  7122.535  2.625 ops/ms
> StoreMaskedBenchmark.intStoreArrayMask 4113.772  8220.206  1.998 ops/ms
> StoreMaskedBenchmark.longStoreArrayMask1993.986  4874.148  2.444 ops/ms
> StoreMaskedBenchmark.shortStoreArrayMask   8543.593 17821.086  2.086 ops/ms
> 
> Similar performane gain can also be observed on ARM hardware.

Xiaohong Gong has updated the pull request incrementally with one additional 
commit since the last revision:

  Wrap the offset check into a static method

-

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/8620/files
  - new: https://git.openjdk.java.net/jdk/pull/8620/files/eb67f681..2229dd24

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8620&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8620&range=00-01

  Stats: 56 lines in 8 files changed: 5 ins; 0 del; 51 mod
  Patch: https://git.openjdk.java.net/jdk/pull/8620.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/8620/head:pull/8620

PR: https://git.openjdk.java.net/jdk/pull/8620


Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]

2022-05-12 Thread Xiaohong Gong
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz  wrote:

>> Thanks for the review @PaulSandoz ! For the 
>> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be 
>> used here because the `outOfBounds` exception will be thrown if the offset 
>> is not inside of the valid array boundary. And  for the masked operations, 
>> this is not needed since we only need to check the masked lanes. Please 
>> correct me if I didn't understand correctly. Thanks!
>
> Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My 
> intent was that we could use a method rather than repeating the expression in 
> numerous places (including masked loads IIRC).

Hi @PaulSandoz , I'v updated the offset check for masked load/store. Could you 
please help to check whether it is ok? Thanks so much!

-

PR: https://git.openjdk.java.net/jdk/pull/8620