On Mon, 5 Jan 2026 11:31:26 GMT, Yi Wu wrote:

>> This patch adds mid-end support for vectorized min/max reduction operations
>> for half floats. It also includes backend AArch64 support for these
>> operations.
>> Both floating point min/max reductions don’t require strict order, because
>> they are associative.
>>
>> It will generate NEON fminv/fmaxv reduction instructions when max vector
>> length is 8B or 16B. On SVE supporting machines with vector lengths > 16B,
>> it will generate the SVE fminv/fmaxv instructions.
>> The patch also adds support for partial min/max reductions on SVE machines
>> using fminv/fmaxv.
>>
>> Ratio of throughput(ops/ms) > 1 indicates the performance with this patch is
>> better than the mainline.
>>
>> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>>
>> Benchmark         vectorDim  Mode   Cnt    8B    16B
>> ReductionMaxFP16        256  thrpt    9  3.69   6.44
>> ReductionMaxFP16        512  thrpt    9  3.71   7.62
>> ReductionMaxFP16       1024  thrpt    9  4.16   8.64
>> ReductionMaxFP16       2048  thrpt    9  4.44   9.12
>> ReductionMinFP16        256  thrpt    9  3.69   6.43
>> ReductionMinFP16        512  thrpt    9  3.70   7.62
>> ReductionMinFP16       1024  thrpt    9  4.16   8.64
>> ReductionMinFP16       2048  thrpt    9  4.44   9.10
>>
>>
>> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>>
>> Benchmark         vectorDim  Mode   Cnt    8B    16B    32B
>> ReductionMaxFP16        256  thrpt    9  3.96   8.62   8.02
>> ReductionMaxFP16        512  thrpt    9  3.54   9.25  11.71
>> ReductionMaxFP16       1024  thrpt    9  3.77   8.71  14.07
>> ReductionMaxFP16       2048  thrpt    9  3.88   8.44  14.69
>> ReductionMinFP16        256  thrpt    9  3.96   8.61   8.03
>> ReductionMinFP16        512  thrpt    9  3.54   9.28  11.69
>> ReductionMinFP16       1024  thrpt    9  3.76   8.70  14.12
>> ReductionMinFP16       2048  thrpt    9  3.87   8.45  14.70
>>
>>
>> Neoverse V2 (UseSVE = 2, max vector length = 16B):
>>
>> Benchmark         vectorDim  Mode   Cnt    8B    16B
>> ReductionMaxFP16        256  thrpt    9  4.78  10.00
>> ReductionMaxFP16        512  thrpt    9  3.74  11.33
>> ReductionMaxFP16       1024  thrpt    9  3.86   9.59
>> ReductionMaxFP16       2048  thrpt    9  3.94   8.71
>> ReductionMinFP16        256  thrpt    9  4.78  10.00
>> ReductionMinFP16        512  thrpt    9  3.74  11.29
>> ReductionMinFP16       1024  thrpt    9  3.86   9.58
>> ReductionMinFP16       2048  thrpt    9  3.94   8.71
>>
>>
>> Testing:
>> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass ...
>
> Yi Wu has updated the pull request with a new target base due to a merge or a
> rebase. The incremental webrev excludes the unrelated changes brought in by
> the merge/rebase. The pull request contains four additional commits since the
> last revision:
>
>  - Replace assert with verify
>  - Add IRNode constant and code refactor
>  - Merge remote-tracking branch 'origin/master' into yiwu-8373344
>  - 8373344: Add support for FP16 min/max reduction operations

src/hotspot/cpu/aarch64/aarch64_vector.ad line 381:

> 379:       case Op_XorReductionV:
> 380:       case Op_MinReductionVHF:
> 381:       case Op_MaxReductionVHF:

For the partial cases, we can also use the NEON instructions when the vector size is <= 16B. Did you test the performance with NEON instead of the predicated SVE instructions?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2663933727
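
To make the NEON versus predicated SVE question concrete, here is a minimal scalar model of the two ways a partial min reduction could be computed. It is a sketch only, not HotSpot code: the names `predicated_min`, `full_min`, `choose_path` and `ReductionPath` are hypothetical, `float` stands in for FP16, and NaN and signed-zero handling of the real fminv/fmaxv instructions is ignored. It shows why the two paths should agree on the result, so the choice between them would be a performance question.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical scalar model; float stands in for FP16 lanes.

// Models a predicated (SVE-style) min reduction over a wide register:
// only lanes whose predicate bit is set contribute to the result.
float predicated_min(const std::vector<float>& reg, const std::vector<bool>& pred) {
    assert(reg.size() == pred.size());
    float acc = std::numeric_limits<float>::infinity();  // identity for min
    for (std::size_t i = 0; i < reg.size(); ++i) {
        if (pred[i]) {
            acc = std::min(acc, reg[i]);
        }
    }
    return acc;
}

// Models an unpredicated (NEON-style) min reduction over a register that
// holds exactly `lanes` elements, so no predicate is needed.
float full_min(const float* data, std::size_t lanes) {
    float acc = std::numeric_limits<float>::infinity();
    for (std::size_t i = 0; i < lanes; ++i) {
        acc = std::min(acc, data[i]);
    }
    return acc;
}

enum class ReductionPath { Neon, SvePredicated };

// The selection rule suggested in the review comment (an assumption about
// how it might be expressed): vectors of at most 16 bytes fit a NEON
// register and could take the NEON path even on SVE machines.
constexpr ReductionPath choose_path(std::size_t length_in_bytes) {
    return length_in_bytes <= 16 ? ReductionPath::Neon
                                 : ReductionPath::SvePredicated;
}
```

With an 8-lane register whose first four lanes are active, `predicated_min` and `full_min(reg.data(), 4)` return the same value. Whether the NEON path is actually faster for these sizes is what the benchmark would need to show.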
