On Wed, 15 Nov 2023 07:48:28 GMT, Eric Liu <e...@openjdk.org> wrote:

> Vector API defines zero-extend operations [1], which are going to be
> intrinsified and generated to `VectorUCastNode` by C2. This patch adds
> the backend implementation for `VectorUCastNode` on AArch64.
>
> The micro benchmark shows a significant performance improvement. On my test
> machine (SVE, 256-bit), the results are shown below:
>
>   Benchmark                     Before      After        Units   Gain
>   VectorZeroExtend.byte2Int     3168.251    243012.399   ops/ms  75.70
>   VectorZeroExtend.byte2Long    3212.201    216291.588   ops/ms  66.33
>   VectorZeroExtend.byte2Short   3391.968    182655.365   ops/ms  52.85
>   VectorZeroExtend.int2Long     1012.197     80448.553   ops/ms  78.48
>   VectorZeroExtend.short2Int    1812.471    153416.828   ops/ms  83.65
>   VectorZeroExtend.short2Long   1788.382    129794.814   ops/ms  71.58
>
> On other Neon systems, we can get a similar performance boost as a result of
> successful intrinsification.
>
> Since `VectorUCastNode` is currently only used in Vector API's zero extension,
> this patch also adds assertions to the nodes' definitions to clarify their usage.
>
> [TEST]
> compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.
>
> [1]
> https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726
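
For readers unfamiliar with the term, the six benchmark names above correspond to zero-extending (unsigned) widening casts. The following is only a scalar sketch of those semantics, not code from the patch: the class `ZeroExtendSketch` and its method names are hypothetical, and it deliberately avoids the incubating Vector API so that it runs on any JDK 8 or later. In the Vector API itself, the equivalent lane-wise operations are, as far as I know, the `ZERO_EXTEND_*` conversions in `VectorOperators` referenced by [1].

```java
// Scalar reference for zero-extending casts (illustrative only).
// ZeroExtendSketch and all method names here are hypothetical.
final class ZeroExtendSketch {
    // Each cast keeps the source bit pattern and fills the upper bits with 0,
    // unlike a plain Java widening cast, which sign-extends.
    static short byte2Short(byte b)  { return (short) (b & 0xFF); }
    static int   byte2Int(byte b)    { return Byte.toUnsignedInt(b); }
    static long  byte2Long(byte b)   { return Byte.toUnsignedLong(b); }
    static int   short2Int(short s)  { return Short.toUnsignedInt(s); }
    static long  short2Long(short s) { return Short.toUnsignedLong(s); }
    static long  int2Long(int i)     { return Integer.toUnsignedLong(i); }

    // Lane-wise form: apply the byte -> int zero extension to every element.
    static int[] byte2Int(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = byte2Int(src[i]);
        }
        return dst;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xF0;            // bit pattern 1111_0000
        System.out.println((int) b);     // sign-extending cast: -16
        System.out.println(byte2Int(b)); // zero-extending cast: 240
    }
}
```

The scalar loop in `byte2Int(byte[])` is the shape of computation that a vectorized zero-extend replaces with a single widening instruction per vector.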

This pull request has now been integrated.

Changeset: 9b8eaa2f
Author:    Eric Liu <e...@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/9b8eaa2fc3c5127bc7828471916f5d881bf71228
Stats:     381 lines in 8 files changed: 299 ins; 23 del; 59 mod

8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts

Reviewed-by: aph, xgong

-------------

PR: https://git.openjdk.org/jdk/pull/16670