On Wed, 15 Nov 2023 07:48:28 GMT, Eric Liu <e...@openjdk.org> wrote:

> Vector API defines zero-extend operations [1], which are going to be 
> intrinsified and generated to `VectorUCastNode` by C2. This patch adds 
> backend implementation for `VectorUCastNode` on AArch64.
> 
> The microbenchmark shows a significant performance improvement. On my test 
> machine (SVE, 256-bit), the results are shown below:
> 
> 
> 
>   Benchmark                     Before     After       Units   Gain
>   VectorZeroExtend.byte2Int     3168.251   243012.399  ops/ms  75.70
>   VectorZeroExtend.byte2Long    3212.201   216291.588  ops/ms  66.33
>   VectorZeroExtend.byte2Short   3391.968   182655.365  ops/ms  52.85
>   VectorZeroExtend.int2Long     1012.197    80448.553  ops/ms  78.48
>   VectorZeroExtend.short2Int    1812.471   153416.828  ops/ms  83.65
>   VectorZeroExtend.short2Long   1788.382   129794.814  ops/ms  71.58
> 
> 
> On other NEON systems, we get a similar performance boost because the 
> operations are now successfully intrinsified.
> 
> Since `VectorUCastNode` is currently only used for the Vector API's zero 
> extension, this patch also adds assertions to the node definitions to 
> clarify their usage.
> 
> [TEST]
> compiler/vectorapi and jdk/incubator/vector passed on NEON and SVE machines.
> 
> [1] https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java#L726
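
For readers unfamiliar with the term, the following is a minimal scalar sketch of what a zero-extending byte-to-int cast computes per lane, next to the ordinary sign-extending cast. It is not code from the patch: the class and method names (`ZeroExtendSketch`, `zeroExtendB2I`, `signExtendB2I`) are hypothetical, and a plain loop stands in for the lane-wise vector operation (for example `VectorOperators.ZERO_EXTEND_B2I` in the Vector API) that the patch intrinsifies.

```java
import java.util.Arrays;

// Hypothetical illustration only; a scalar model of the lane-wise semantics.
public class ZeroExtendSketch {

    // Zero-extend: the upper 24 bits of each result are cleared,
    // so the byte is treated as an unsigned value in [0, 255].
    static int[] zeroExtendB2I(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i] & 0xFF; // same result as Byte.toUnsignedInt(src[i])
        }
        return dst;
    }

    // Sign-extend: Java's default widening conversion copies the sign bit
    // into the upper bits, so negative bytes stay negative.
    static int[] signExtendB2I(byte[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] in = {1, -1, (byte) 0x80, 127};
        System.out.println(Arrays.toString(zeroExtendB2I(in))); // [1, 255, 128, 127]
        System.out.println(Arrays.toString(signExtendB2I(in))); // [1, -1, -128, 127]
    }
}
```

The two results differ only for inputs with the top bit set, which is the case the unsigned cast exists to handle.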

This pull request has now been integrated.

Changeset: 9b8eaa2f
Author:    Eric Liu <e...@openjdk.org>
URL:       https://git.openjdk.org/jdk/commit/9b8eaa2fc3c5127bc7828471916f5d881bf71228
Stats:     381 lines in 8 files changed: 299 ins; 23 del; 59 mod

8319872: AArch64: [vectorapi] Implementation of unsigned (zero extended) casts

Reviewed-by: aph, xgong

-------------

PR: https://git.openjdk.org/jdk/pull/16670
