On Sun, 19 Mar 2023 13:04:19 GMT, Quan Anh Mai <qa...@openjdk.org> wrote:
> Hi,
>
> This patch reimplements `VectorShuffle` implementations to be a vector of the
> bit type. Currently, VectorShuffle is stored as a byte array, and would be
> expanded upon usage. This poses several drawbacks:
>
> 1. Inefficient conversions between a shuffle and its corresponding vector.
> This hinders the performance when the shuffle indices are not constant and
> are loaded or computed dynamically.
> 2. Redundant expansions in `rearrange` operations. On all platforms, it seems
> that a shuffle index vector is always expanded to the correct type before
> executing the `rearrange` operations.
> 3. Some redundant intrinsics are needed to support this handling as well as
> special considerations in the C2 compiler.
> 4. Range checks are performed using `VectorShuffle::toVector`, which is
> inefficient for FP types since both FP conversions and FP comparisons are
> more expensive than the integral ones.
>
> Upon these changes, a `rearrange` can emit more efficient code:
>
>     var species = IntVector.SPECIES_128;
>     var v1 = IntVector.fromArray(species, SRC1, 0);
>     var v2 = IntVector.fromArray(species, SRC2, 0);
>     v1.rearrange(v2.toShuffle()).intoArray(DST, 0);
>
> Before:
>     movabs $0x751589fa8,%r10            ; {oop([I{0x0000000751589fa8})}
>     vmovdqu 0x10(%r10),%xmm2
>     movabs $0x7515a0d08,%r10            ; {oop([I{0x00000007515a0d08})}
>     vmovdqu 0x10(%r10),%xmm1
>     movabs $0x75158afb8,%r10            ; {oop([I{0x000000075158afb8})}
>     vmovdqu 0x10(%r10),%xmm0
>     vpand  -0xddc12(%rip),%xmm0,%xmm0   # Stub::vector_int_to_byte_mask
>                                         ; {external_word}
>     vpackusdw %xmm0,%xmm0,%xmm0
>     vpackuswb %xmm0,%xmm0,%xmm0
>     vpmovsxbd %xmm0,%xmm3
>     vpcmpgtd %xmm3,%xmm1,%xmm3
>     vtestps %xmm3,%xmm3
>     jne    0x00007fc2acb4e0d8
>     vpmovzxbd %xmm0,%xmm0
>     vpermd %ymm2,%ymm0,%ymm0
>     movabs $0x751588f98,%r10            ; {oop([I{0x0000000751588f98})}
>     vmovdqu %xmm0,0x10(%r10)
>
> After:
>     movabs $0x751589c78,%r10            ; {oop([I{0x0000000751589c78})}
>     vmovdqu 0x10(%r10),%xmm1
>     movabs $0x75158ac88,%r10            ; {oop([I{0x000000075158ac88})}
>     vmovdqu 0x10(%r10),%xmm2
>     vpxor  %xmm0,%xmm0,%xmm0
>     vpcmpgtd %xmm2,%xmm0,%xmm3
>     vtestps %xmm3,%xmm3
>     jne    0x00007fa818b27cb1
>     vpermd %ymm1,%ymm2,%ymm0
>     movabs $0x751588c68,%r10            ; {oop([I{0x0000000751588c68})}
>     vmovdqu %xmm0,0x10(%r10)
>
> Please take a look and leave reviews. Thanks a lot.

This pull request has now been integrated.

Changeset: e846a1d7
Author:    Quan Anh Mai <qa...@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/e846a1d70043f7b57ae76847e85e5426c86539a5
Stats:     3690 lines in 64 files changed: 1615 ins; 1169 del; 906 mod

8304450: [vectorapi] Refactor VectorShuffle implementation

Reviewed-by: psandoz, xgong, jbhateja, vlivanov

-------------

PR: https://git.openjdk.org/jdk/pull/13093
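
For readers unfamiliar with `rearrange`, the following is a rough scalar model of what the four-line example in the quoted description computes: lane `i` of the result takes the lane of the source selected by index `i`. It is only a sketch in plain Java, not the Vector API and not the code from the patch. The class name `RearrangeSketch`, the sample arrays, and the simplified bounds check (any index outside `[0, length)` throws) are assumptions made for illustration; the real `toShuffle`/`rearrange` handling of invalid indices is more involved.

```java
import java.util.Arrays;

public class RearrangeSketch {
    // Scalar model (assumption, for illustration only) of
    // v1.rearrange(v2.toShuffle()): result lane i = src[idx[i]].
    static int[] rearrange(int[] src, int[] idx) {
        int[] dst = new int[src.length];
        for (int i = 0; i < dst.length; i++) {
            int j = idx[i];
            // Simplified range check: reject any index outside the vector length.
            if (j < 0 || j >= src.length) {
                throw new IndexOutOfBoundsException(
                        "shuffle index " + j + " at lane " + i);
            }
            dst[i] = src[j];
        }
        return dst;
    }

    public static void main(String[] args) {
        int[] src1 = {10, 20, 30, 40}; // plays the role of SRC1 (four int lanes)
        int[] src2 = {3, 0, 2, 1};     // plays the role of SRC2, used as indices
        System.out.println(Arrays.toString(rearrange(src1, src2)));
        // prints [40, 10, 30, 20]
    }
}
```

In the vectorized "After" listing, the loop body roughly corresponds to the single `vpermd`, and the `vpxor`/`vpcmpgtd`/`vtestps`/`jne` sequence tests whether any index lane is negative before taking the fast path.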