> JDK-8318650 introduced hotspot intrinsification of subword gather load APIs > for X86 platforms [1]. However, the current implementation is not optimal for > AArch64 SVE platform, which natively supports vector instructions for subword > gather load operations using an int vector for indices (see [2][3]). > > Two key areas require improvement: > 1. At the Java level, vector indices generated for range validation could be > reused for the subsequent gather load operation on architectures with native > vector instructions like AArch64 SVE. However, the current implementation > prevents compiler reuse of these index vectors due to divergent control flow, > potentially impacting performance. > 2. At the compiler IR level, the additional `offset` input for > `LoadVectorGather`/`LoadVectorGatherMasked` with subword types increases IR > complexity and complicates backend implementation. Furthermore, generating > `add` instructions before each memory access negatively impacts performance. > > This patch refactors the implementation at both the Java level and compiler > mid-end to improve efficiency and maintainability across different > architectures. > > Main changes: > 1. Java-side API refactoring: > - Explicitly passes generated index vectors to hotspot, eliminating > duplicate index vectors for gather load instructions on > architectures like AArch64. > 2. C2 compiler IR refactoring: > - Refactors `LoadVectorGather`/`LoadVectorGatherMasked` IR for subword > types by removing the memory offset input and incorporating it into the > memory base `addr` at the IR level. This simplifies backend implementation, > reduces add operations, and unifies the IR across all types. > 3. Backend changes: > - Streamlines X86 implementation of subword gather operations following > the removal of the offset input from the IR level. > > Performance: > The performance of the relative JMH improves up to 27% on a X86 AVX512 > system. Please see the data below: > > Benchmark Mode Cnt Unit > SIZE Before After Gain > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms > 64 53682.012 52650.325 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms > 256 14484.252 14255.156 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms > 1024 3664.900 3595.615 0.98 > GatherOperationsBenchmark.microByteGather128 thrpt 30 ops/ms > 4096 908.312 935.269 1.02 > GatherOperationsBenchmark.micr...
Xiaohong Gong has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Address review comments - Merge 'jdk:master' into JDK-8355563 - 8355563: VectorAPI: Refactor current implementation of subword gather load API ------------- Changes: https://git.openjdk.org/jdk/pull/25138/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=25138&range=01 Stats: 450 lines in 15 files changed: 109 ins; 176 del; 165 mod Patch: https://git.openjdk.org/jdk/pull/25138.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/25138/head:pull/25138 PR: https://git.openjdk.org/jdk/pull/25138