Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz wrote: >> @PaulSandoz, could you please help to check whether the current version is >> ok for you? Thanks so much! > > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to resolve. Thanks for the review @PaulSandoz @merykitty ! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Tue, 31 May 2022 16:48:27 GMT, Paul Sandoz wrote: >> @PaulSandoz, could you please help to check whether the current version is >> ok for you? Thanks so much! > > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to resolve. Hi @PaulSandoz , I updated the patch and resolved the conflicts. Could you please take a look at it again? Thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Wrap the offset check into a static method > > @PaulSandoz, could you please help to check whether the current version is ok > for you? Thanks so much! > @XiaohongGong looks good, now the Vector API JEP has been integrated you will > get a merge conflict, but it should be easier to resolve. Great! I will resolve the conflict and update the patch, thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Mon, 30 May 2022 01:17:00 GMT, Xiaohong Gong wrote: >> Xiaohong Gong has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Wrap the offset check into a static method > > @PaulSandoz, could you please help to check whether the current version is ok > for you? Thanks so much! @XiaohongGong looks good, now the Vector API JEP has been integrated you will get a merge conflict, but it should be easier to resolve. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Wrap the offset check into a static method @PaulSandoz, could you please help to check whether the current version is ok for you? Thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Wrap the offset check into a static method Thanks for the explanation! Yeah, the main problem is Java doesn't have the direct unsigned comparison. We need the function call. From the two ways you provided, I think the second `Integer.lessThanUnsigned` looks better. But I'm not sure whether this could improve the performance a lot, although the first check `a.length - vsp.length() > 0` can be hosited out side of the loop. And this might make the codes more complex for me. Maybe we can do a pre research to find a better implementation to the unsigned comparison first. Do you think so? - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Fri, 13 May 2022 01:35:40 GMT, Xiaohong Gong wrote: >> Checking whether the indexes of masked lanes are inside of the valid memory >> boundary is necessary for masked vector memory access. However, this could >> be saved if the given offset is inside of the vector range that could make >> sure no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs >> have saved this kind of check for common cases. And this patch did the >> similar optimization for the masked vector store. >> >> The performance for the new added store masked benchmarks improves about >> `1.83x ~ 2.62x` on a x86 system: >> >> Benchmark BeforeAfter Gain Units >> StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms >> StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms >> StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms >> StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms >> StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms >> StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms >> >> Similar performane gain can also be observed on ARM hardware. > > Xiaohong Gong has updated the pull request incrementally with one additional > commit since the last revision: > > Wrap the offset check into a static method However, we seem to lack the ability to do an unsigned comparison reliably. C2 can transform `x + MIN_VALUE <=> y + MIN_VALUE` into `x u<=> y` but it will fail if `x` or `y` is an addition with constant in such cases the constants will be merged together. As a result, I think we need an intrinsic for this. `Integer.compareUnsigned` may fit but it manifests the result into an integer register which may lead to suboptimal materialisation of flags, another approach would be to have a separate method `Integer.lessThanUnsigned` which only returns `boolean` and C2 can have better time splitting the boolean comparison through `IfNode`, which will prevent the materialisation of `boolean` values. What do you two think? I.e, after splitting if through merge point, the shape of `if (Integer.lessThanUnsigned(a, b))` would be transformed from ab \ / CmpU | Bool | If / \ IfTrueIfFalse \ / Region10 \ | / Phi 0 \ / CmpI into ab \ / CmpU Thanks. - PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
> Checking whether the indexes of masked lanes are inside of the valid memory > boundary is necessary for masked vector memory access. However, this could be > saved if the given offset is inside of the vector range that could make sure > no IOOBE (IndexOutOfBoundaryException) happens. The masked load APIs have > saved this kind of check for common cases. And this patch did the similar > optimization for the masked vector store. > > The performance for the new added store masked benchmarks improves about > `1.83x ~ 2.62x` on a x86 system: > > Benchmark BeforeAfter Gain Units > StoreMaskedBenchmark.byteStoreArrayMask 12757.936 23291.118 1.826 ops/ms > StoreMaskedBenchmark.doubleStoreArrayMask 1520.932 3921.616 2.578 ops/ms > StoreMaskedBenchmark.floatStoreArrayMask 2713.031 7122.535 2.625 ops/ms > StoreMaskedBenchmark.intStoreArrayMask 4113.772 8220.206 1.998 ops/ms > StoreMaskedBenchmark.longStoreArrayMask1993.986 4874.148 2.444 ops/ms > StoreMaskedBenchmark.shortStoreArrayMask 8543.593 17821.086 2.086 ops/ms > > Similar performane gain can also be observed on ARM hardware. Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision: Wrap the offset check into a static method - Changes: - all: https://git.openjdk.java.net/jdk/pull/8620/files - new: https://git.openjdk.java.net/jdk/pull/8620/files/eb67f681..2229dd24 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=8620&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=8620&range=00-01 Stats: 56 lines in 8 files changed: 5 ins; 0 del; 51 mod Patch: https://git.openjdk.java.net/jdk/pull/8620.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/8620/head:pull/8620 PR: https://git.openjdk.java.net/jdk/pull/8620
Re: RFR: 8286279: [vectorapi] Only check index of masked lanes if offset is out of array boundary for masked store [v2]
On Thu, 12 May 2022 03:36:31 GMT, Paul Sandoz wrote: >> Thanks for the review @PaulSandoz ! For the >> `VectorIntrinsics.checkFromIndexSize`, I'm afraid it's not suitable to be >> used here because the `outOfBounds` exception will be thrown if the offset >> is not inside of the valid array boundary. And for the masked operations, >> this is not needed since we only need to check the masked lanes. Please >> correct me if I didn't understand correctly. Thanks! > > Silly me! i commented too quickly, `checkFromIndexSize` cannot be used. My > intent was that we could use a method rather than repeating the expression in > numerous places (including masked loads IIRC). Hi @PaulSandoz , I'v updated the offset check for masked load/store. Could you please help to check whether it is ok? Thanks so much! - PR: https://git.openjdk.java.net/jdk/pull/8620