On Sat, 1 Apr 2023 07:44:25 GMT, Quan Anh Mai <[email protected]> wrote:
>> `Vector::slice` is a method at the top-level class of the Vector API that
>> concatenates the 2 inputs into an intermediate composite and extracts a
>> window equal to the size of the inputs into the result. It is used in vector
>> conversion methods where the part number is not 0 to slice the parts to the
>> correct positions. Slicing is also used in text processing such as UTF-8 and
>> UTF-16 validation. x86 starting from SSSE3 has `palignr` which does vector
>> slicing very efficiently. As a result, I think it is beneficial to add a C2
>> node for this operation as well as intrinsify `Vector::slice` method.
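For reference, the slice semantics described above (concatenate the two inputs, then extract a window of VLENGTH lanes) can be modelled on plain `int` arrays. This is a scalar sketch only, not the Vector API implementation or the proposed intrinsic. `SliceModel` and `slice` are hypothetical names, and the bounds check simply assumes `origin` must lie in `[0, VLENGTH]`.

```java
import java.util.Arrays;

public class SliceModel {
    // Scalar model of v1.slice(origin, v2): conceptually concatenate v1
    // followed by v2, then take v1.length lanes starting at `origin`.
    static int[] slice(int[] v1, int origin, int[] v2) {
        int vlen = v1.length;
        if (v2.length != vlen) throw new IllegalArgumentException("length mismatch");
        // Assumption of this sketch: origin is in [0, vlen].
        if (origin < 0 || origin > vlen) throw new IndexOutOfBoundsException(origin);
        int[] r = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = origin + i;                       // index into v1 ++ v2
            r[i] = (j < vlen) ? v1[j] : v2[j - vlen];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] a = {0, 1, 2, 3};
        int[] b = {4, 5, 6, 7};
        System.out.println(Arrays.toString(slice(a, 1, b))); // [1, 2, 3, 4]
    }
}
```

With `origin == 0` the result is `v1` unchanged, and with `origin == VLENGTH` it is `v2`.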
>>
>> A slice is currently implemented as
>> `v2.rearrange(iota).blend(v1.rearrange(iota), blendMask)` which requires
>> preparation of the index vector and the blending mask. Even with the
>> preparations being hoisted out of the loops, microbenchmarks show
>> improvement using the slice intrinsics. Some have tremendous increases in
>> throughput due to the limitation that a mask of length 2 cannot currently be
>> intrinsified, which forces a fallback to the Java implementation.
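The rearrange-and-blend formulation quoted above can be checked against the direct definition with a similar scalar model. This sketch assumes the iota shuffle maps lane `i` to `(origin + i) % VLENGTH` and that `blendMask` selects lanes with `i < VLENGTH - origin`; those construction details and all helper names are assumptions for illustration, not a copy of the JDK source. `blend(a, b, m)` follows the Vector API convention of taking `b`'s lane wherever the mask is set.

```java
import java.util.Arrays;

public class SliceViaBlend {
    // r[i] = v[idx[i]], a scalar stand-in for Vector::rearrange.
    static int[] rearrange(int[] v, int[] idx) {
        int[] r = new int[v.length];
        for (int i = 0; i < v.length; i++) r[i] = v[idx[i]];
        return r;
    }

    // r[i] = m[i] ? b[i] : a[i], mirroring a.blend(b, m).
    static int[] blend(int[] a, int[] b, boolean[] m) {
        int[] r = new int[a.length];
        for (int i = 0; i < a.length; i++) r[i] = m[i] ? b[i] : a[i];
        return r;
    }

    // Models v2.rearrange(iota).blend(v1.rearrange(iota), blendMask).
    static int[] sliceViaBlend(int[] v1, int origin, int[] v2) {
        int vlen = v1.length;
        int[] iota = new int[vlen];
        boolean[] blendMask = new boolean[vlen];
        for (int i = 0; i < vlen; i++) {
            iota[i] = (origin + i) % vlen;      // assumed wrapping iota shuffle
            blendMask[i] = i < vlen - origin;   // lanes still taken from v1
        }
        return blend(rearrange(v2, iota), rearrange(v1, iota), blendMask);
    }

    // Direct definition: window of vlen lanes from the concatenation v1 ++ v2.
    static int[] sliceDirect(int[] v1, int origin, int[] v2) {
        int vlen = v1.length;
        int[] r = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int j = origin + i;
            r[i] = j < vlen ? v1[j] : v2[j - vlen];
        }
        return r;
    }

    public static void main(String[] args) {
        int[] v1 = {0, 1, 2, 3};
        int[] v2 = {4, 5, 6, 7};
        for (int origin = 0; origin <= v1.length; origin++) {
            int[] a = sliceViaBlend(v1, origin, v2);
            int[] b = sliceDirect(v1, origin, v2);
            System.out.println(origin + ": " + Arrays.toString(a) + " matches=" + Arrays.equals(a, b));
        }
    }
}
```

The model makes the cost visible: two full rearranges, a compare to build the mask and a blend, where a single `palignr`-style instruction computes the same window.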
>>
>> Please take a look and have some reviews. Thank you very much.
>
> Quan Anh Mai has updated the pull request with a new target base due to a
> merge or a rebase. The pull request now contains ten commits:
>
> - instruction asserts
> - Merge branch 'master' into sliceIntrinsics
> - add comments explaining anonymous classes
> - address reviews
> - sse2, increase warmup
> - aesthetic
> - optimise 64B
> - add jmh
> - vector slice intrinsics

With the latest PR I am observing failures with debug builds for test
compiler/vectorapi/TestVectorSlice.java on both AVX512 machines and aarch64
machines.

On AVX512 machines the test fails with JVM args `-XX:UseAVX=3` and
`-XX:UseAVX=3 -XX:+UnlockDiagnosticVMOptions -XX:+UseKNLSetting` and results in
a test assertion failure, e.g.,

Caused by: java.lang.RuntimeException: assertEquals: expected 70 to equal 0
    at jdk.test.lib.Asserts.fail(Asserts.java:594)
    at jdk.test.lib.Asserts.assertEquals(Asserts.java:205)
    at jdk.test.lib.Asserts.assertEquals(Asserts.java:189)
    at compiler.vectorapi.TestVectorSlice.lambda$testInts$2(TestVectorSlice.java:163)
    at compiler.vectorapi.TestVectorSlice.testInts(TestVectorSlice.java:181)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    ... 7 more

CPU flags are:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush
mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon
rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16
pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx
f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single
ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad
fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed
adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt
xsavec xgetbv1 xsaves nt_good wbnoinvd arat avx512vbmi umip pku ospke
avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq
la57 rdpid md_clear arch_capabilities

On aarch64 machines, the test reports an IR rule failure.

-------------
PR Comment: https://git.openjdk.org/jdk/pull/12909#issuecomment-1494641261