On Sat, 1 Apr 2023 07:44:25 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:

>> `Vector::slice` is a method at the top-level class of the Vector API that 
>> concatenates the 2 inputs into an intermediate composite and extracts a 
>> window equal to the size of the inputs into the result. It is used in vector 
>> conversion methods where the part number is not 0 to slice the parts to the 
>> correct positions. Slicing is also used in text processing such as utf8 and 
>> utf16 validation. x86 starting from SSSE3 has `palignr` which does vector 
>> slicing very efficiently. As a result, I think it is beneficial to add a C2 
>> node for this operation as well as intrinsify `Vector::slice` method.
>> 
>> A slice is currently implemented as 
>> `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires 
>> preparation of the index vector and the blending mask. Even with the 
>> preparations being hoisted out of the loops, microbenchmarks show 
>> improvement using the slice intrinsics. Some have tremendous increases in 
>> throughput due to the limitation that a mask of length 2 cannot currently be 
>> intrinsified, leading to falling back to the Java implementations.
>> 
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains nine commits:
> 
>  - instruction asserts
>  - Merge branch 'master' into sliceIntrinsics
>  - add comments explaining anonymous classes
>  - address reviews
>  - sse2, increase warmup
>  - aesthetic
>  - optimise 64B
>  - add jmh
>  - vector slice intrinsics

With the latest version of the PR, I am observing failures in debug builds for
the test compiler/vectorapi/TestVectorSlice.java on both AVX512 and aarch64
machines.

On AVX512 machines, the test fails with the JVM args `-XX:UseAVX=3` and 
`-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting`, producing a 
test assertion failure, e.g.:

Caused by: java.lang.RuntimeException: assertEquals: expected 70 to equal 0
        at jdk.test.lib.Asserts.fail(Asserts.java:594)
        at jdk.test.lib.Asserts.assertEquals(Asserts.java:205)
        at jdk.test.lib.Asserts.assertEquals(Asserts.java:189)
        at compiler.vectorapi.TestVectorSlice.lambda$testInts$2(TestVectorSlice.java:163)
        at compiler.vectorapi.TestVectorSlice.testInts(TestVectorSlice.java:181)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        ... 7 more


CPU flags are:

fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon
rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16
pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single
ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad
fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed
adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt
xsavec xgetbv1 xsaves nt_good wbnoinvd arat avx512vbmi umip pku ospke
avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq
la57 rdpid md_clear arch_capabilities


On aarch64 there is an IR rule failure.
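
For reference when triaging the assertion failure, here is a minimal scalar
sketch of the slice semantics described in the quoted PR text: concatenate the
two inputs and extract a window of the input length starting at the origin
lane. The class and method names (`SliceReference`, `slice`) are hypothetical
and are not taken from TestVectorSlice.java or from the PR; the sketch assumes
int lanes and an origin in the range [0, length].

```java
import java.util.Arrays;

// Hypothetical scalar model of a vector slice; not code from the PR or the test.
final class SliceReference {
    // Returns lanes [origin, origin + n) of the concatenation v1 ++ v2,
    // where n is the common length of v1 and v2.
    static int[] slice(int[] v1, int[] v2, int origin) {
        int n = v1.length;
        if (v2.length != n || origin < 0 || origin > n) {
            throw new IllegalArgumentException("mismatched lengths or origin out of range");
        }
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            int j = i + origin;
            // Lanes below n come from v1, the remaining lanes come from v2.
            result[i] = (j < n) ? v1[j] : v2[j - n];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3};
        int[] b = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(a, b, 1))); // prints [1, 2, 3, 4]
    }
}
```

A lane-by-lane comparison of the intrinsified result against a model like this
should show which lanes and which origins diverge under `-XX:UseAVX=3`.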

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12909#issuecomment-1494641261
