On Mon, 8 Dec 2025 03:29:03 GMT, Eric Fang <[email protected]> wrote:

> This patch adds intrinsic support for UMIN and UMAX reduction operations in 
> the Vector API on AArch64, enabling direct hardware instruction mapping for 
> better performance.
> 
> Changes:
> --------
> 
> 1. C2 mid-end:
>    - Added UMinReductionVNode and UMaxReductionVNode
> 
> 2. AArch64 Backend:
>    - Added uminp/umaxp/sve_uminv/sve_umaxv instructions
>    - Updated match rules for all vector sizes and element types
>    - Both NEON and SVE implementations are supported
> 
> 3. Test:
>    - Added UMIN_REDUCTION_V and UMAX_REDUCTION_V to IRNode.java
>    - Added assembly tests in aarch64-asmtest.py for new instructions
>    - Added a JTReg test file VectorUMinMaxReductionTest.java
> 
> Different configurations were tested on aarch64 and x86 machines, and all 
> tests passed.
> 
> Test results of JMH benchmarks from the panama-vector project:
> --------
> 
> On an NVIDIA Grace machine with 128-bit SVE:
> 
> Benchmark                       Unit    Before  Error   After     Error   Uplift
> Byte128Vector.UMAXLanes         ops/ms  411.60  42.18   25226.51  33.92   61.29
> Byte128Vector.UMAXMaskedLanes   ops/ms  558.56  85.12   25182.90  28.74   45.09
> Byte128Vector.UMINLanes         ops/ms  645.58  780.76  28396.29  103.11  43.99
> Byte128Vector.UMINMaskedLanes   ops/ms  621.09  718.27  26122.62  42.68   42.06
> Byte64Vector.UMAXLanes          ops/ms  296.33  34.44   14357.74  15.95   48.45
> Byte64Vector.UMAXMaskedLanes    ops/ms  376.54  44.01   14269.24  21.41   37.90
> Byte64Vector.UMINLanes          ops/ms  373.45  426.51  15425.36  66.20   41.31
> Byte64Vector.UMINMaskedLanes    ops/ms  353.32  346.87  14201.37  13.79   40.19
> Int128Vector.UMAXLanes          ops/ms  174.79  192.51  9906.07   286.93  56.67
> Int128Vector.UMAXMaskedLanes    ops/ms  157.23  206.68  10246.77  11.44   65.17
> Int64Vector.UMAXLanes           ops/ms  95.30   126.49  4719.30   98.57   49.52
> Int64Vector.UMAXMaskedLanes     ops/ms  88.19   87.44   4693.18   19.76   53.22
> Long128Vector.UMAXLanes         ops/ms  80.62   97.82   5064.01   35.52   62.82
> Long128Vector.UMAXMaskedLanes   ops/ms  78.15   102.91  5028.24   8.74    64.34
> Long64Vector.UMAXLanes          ops/ms  47.56   62.01   46.76     52.28   0.98
> Long64Vector.UMAXMaskedLanes    ops/ms  45.44   46.76   45.79     42.91   1.01
> Short128Vector.UMAXLanes        ops/ms  316.65  410.30  14814.82  23.65   46.79
> Short128Vector.UMAXMaskedLanes  ops/ms  308.90  351.78  15155.26  31.03   49.06
> Sh...

This pull request has now been integrated.

Changeset: d0e97307
Author:    Eric Fang <[email protected]>
Committer: Xiaohong Gong <[email protected]>
URL:       https://git.openjdk.org/jdk/commit/d0e97307836c49291f24ae7cb1c2e9319b986f8c
Stats:     1372 lines in 14 files changed: 954 ins; 14 del; 404 mod

8372980: [VectorAPI] AArch64: Add intrinsic support for unsigned min/max 
reduction operations

Co-authored-by: Andrew Haley <[email protected]>
Reviewed-by: aph, xgong

-------------

PR: https://git.openjdk.org/jdk/pull/28693
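
For readers unfamiliar with the operations named in the change title, the
following is a minimal scalar sketch of what an unsigned min/max reduction
computes over the lanes of an int vector. It is not code from the patch: the
class and method names are hypothetical, and it uses only
Integer.compareUnsigned from the standard library rather than the Vector API
or the new intrinsics. In the Vector API the corresponding call is presumably
a reduceLanes over the UMIN/UMAX operators, which is what the *Lanes
benchmarks appear to exercise.

```java
public class UnsignedReductionSketch {
    // Unsigned-max reduction: treat every lane as an unsigned 32-bit value
    // and return the largest one. 0 is the identity (smallest unsigned value).
    public static int umaxReduce(int[] lanes) {
        int acc = 0;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) > 0) {
                acc = v;
            }
        }
        return acc;
    }

    // Unsigned-min reduction: -1 (0xFFFFFFFF) is the identity because it is
    // the largest unsigned 32-bit value.
    public static int uminReduce(int[] lanes) {
        int acc = -1;
        for (int v : lanes) {
            if (Integer.compareUnsigned(v, acc) < 0) {
                acc = v;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        // -1 is 4294967295 when read as unsigned, so it wins the max.
        int[] lanes = {3, -1, 7, 0};
        System.out.println(Integer.toUnsignedString(umaxReduce(lanes))); // 4294967295
        System.out.println(Integer.toUnsignedString(uminReduce(lanes))); // 0
    }
}
```

The point of the example is the difference from a signed reduction: a signed
max over the same lanes would return 7, because -1 compares as the smallest
value rather than the largest.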
