On Mon, 15 Dec 2025 15:51:32 GMT, Yi Wu <[email protected]> wrote:
> This patch adds mid-end support for vectorized min/max reduction operations
> for half floats. It also includes backend AArch64 support for these
> operations.
> Floating-point min/max reductions do not require strict ordering, because min
> and max are associative.
>
> The patch generates NEON fminv/fmaxv reduction instructions when the maximum
> vector length is 8B or 16B. On SVE-capable machines with vector lengths greater
> than 16B, it generates the SVE fminv/fmaxv instructions.
> The patch also adds support for partial min/max reductions on SVE machines
> using fminv/fmaxv.
>
> A throughput (ops/ms) ratio greater than 1 indicates that performance with
> this patch is better than mainline.
>
> Neoverse N1 (UseSVE = 0, max vector length = 16B):
>
> Benchmark          vectorDim  Mode   Cnt    8B    16B
> ReductionMaxFP16         256  thrpt    9  3.69   6.44
> ReductionMaxFP16         512  thrpt    9  3.71   7.62
> ReductionMaxFP16        1024  thrpt    9  4.16   8.64
> ReductionMaxFP16        2048  thrpt    9  4.44   9.12
> ReductionMinFP16         256  thrpt    9  3.69   6.43
> ReductionMinFP16         512  thrpt    9  3.70   7.62
> ReductionMinFP16        1024  thrpt    9  4.16   8.64
> ReductionMinFP16        2048  thrpt    9  4.44   9.10
>
>
> Neoverse V1 (UseSVE = 1, max vector length = 32B):
>
> Benchmark          vectorDim  Mode   Cnt    8B    16B    32B
> ReductionMaxFP16         256  thrpt    9  3.96   8.62   8.02
> ReductionMaxFP16         512  thrpt    9  3.54   9.25  11.71
> ReductionMaxFP16        1024  thrpt    9  3.77   8.71  14.07
> ReductionMaxFP16        2048  thrpt    9  3.88   8.44  14.69
> ReductionMinFP16         256  thrpt    9  3.96   8.61   8.03
> ReductionMinFP16         512  thrpt    9  3.54   9.28  11.69
> ReductionMinFP16        1024  thrpt    9  3.76   8.70  14.12
> ReductionMinFP16        2048  thrpt    9  3.87   8.45  14.70
>
>
> Neoverse V2 (UseSVE = 2, max vector length = 16B):
>
> Benchmark          vectorDim  Mode   Cnt    8B    16B
> ReductionMaxFP16         256  thrpt    9  4.78  10.00
> ReductionMaxFP16         512  thrpt    9  3.74  11.33
> ReductionMaxFP16        1024  thrpt    9  3.86   9.59
> ReductionMaxFP16        2048  thrpt    9  3.94   8.71
> ReductionMinFP16         256  thrpt    9  4.78  10.00
> ReductionMinFP16         512  thrpt    9  3.74  11.29
> ReductionMinFP16        1024  thrpt    9  3.86   9.58
> ReductionMinFP16        2048  thrpt    9  3.94   8.71
>
>
> Testing:
> hotspot_all, jdk (tier1-3) and langtools (tier1) all pass on Neoverse
> N1/V1/V2.
Thanks @yiwu0b11, some superficial comments.
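
For readers skimming the description, a minimal sketch of the kind of scalar
Float16 reduction loop this patch is meant to auto-vectorize, assuming the
incubating `jdk.incubator.vector.Float16` API; the class and method names here
are illustrative and this is not the benchmark's exact code:

    import jdk.incubator.vector.Float16;

    public class Fp16ReductionSketch {
        // Scalar FP16 min reduction over a short-backed array; this is the
        // loop shape the auto-vectorizer can turn into a vector min reduction,
        // which on AArch64 would map to the NEON/SVE fminv described above.
        static short reductionMinFP16(short[] input) {
            Float16 min = Float16.shortBitsToFloat16(input[0]);
            for (int i = 1; i < input.length; i++) {
                min = Float16.min(min, Float16.shortBitsToFloat16(input[i]));
            }
            return Float16.float16ToRawShortBits(min);
        }
    }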
test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line
486:
> 484: @Test
> 485: @Warmup(500)
> 486: @IR(counts = {"reduce_minHF_masked", " >0 "},
Could you add IRNode constants for `reduce_minHF_masked`? Also for the max
version below.
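For reference, something along these lines in `IRNode.java` is what I had in
mind; the constant names are only a suggestion, and I am assuming the existing
`machOnlyNameRegex` helper is the right fit since these are mach-level node
names:

    public static final String REDUCE_MIN_HF_MASKED = PREFIX + "REDUCE_MIN_HF_MASKED" + POSTFIX;
    static {
        machOnlyNameRegex(REDUCE_MIN_HF_MASKED, "reduce_minHF_masked");
    }

    public static final String REDUCE_MAX_HF_MASKED = PREFIX + "REDUCE_MAX_HF_MASKED" + POSTFIX;
    static {
        machOnlyNameRegex(REDUCE_MAX_HF_MASKED, "reduce_maxHF_masked");
    }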
test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java
line 319:
> 317:
> 318: @Benchmark
> 319: public short ReductionMinFP16() {
Suggestion:
public short reductionMinFP16() {
test/micro/org/openjdk/bench/jdk/incubator/vector/Float16OperationsBenchmark.java
line 328:
> 326:
> 327: @Benchmark
> 328: public short ReductionMaxFP16() {
Suggestion:
public short reductionMaxFP16() {
-------------
Changes requested by galder (Author).
PR Review: https://git.openjdk.org/jdk/pull/28828#pullrequestreview-3603354237
PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639273162
PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639270984
PR Review Comment: https://git.openjdk.org/jdk/pull/28828#discussion_r2639271426