On Mon, 4 Jul 2022 12:51:22 GMT, Andrew Haley <a...@openjdk.org> wrote:

> However, just putting aside for a moment the lack of useful abstraction 
> mechanisms, I note that there's a lot of code like this:
> 
> ```
>     if (length_in_bytes <= 16) {
>       // ... Neon
>     } else {
>       assert(UseSVE > 0, "must be sve");
>       // ... SVE
>     }
> ```
> 
> which is to say, there's an implicit assumption that if an operation can be 
> done with Neon it will be, and SVE will only be used if not. What is the 
> justification for that assumption?

Not exactly.
It applies only to common **64/128-bit unpredicated** vector operations, where 
NEON has instructions equivalent to the SVE ones.

Recall the **Drawback-1** and **Update-2 (part 2)** in the commit message.

Besides the code pattern you mentioned, there are many pairs of rules with 
"**_le128b**" and "**_gt128b**" suffixes, e.g., vmulI_le128b() and 
vmulI_gt128b(). We use two rules mainly because they take different numbers of 
arguments. Otherwise, we tend to put them into one rule, which is the pattern 
you mentioned, e.g., vadd().

The main motivation for this change is that, according to the Neoverse V1 and 
N2 optimization guides, when the vector size fits, common NEON instructions are 
no slower than the equivalent SVE instructions in either latency or throughput.

Note-1: In the current aarch64_sve.ad file, several rules already follow this 
convention, e.g., loadV16_vreg(), vroundFtoI(), insertI_le128bits(). There is 
also an ongoing patch at [link](https://github.com/openjdk/jdk/pull/7999). This 
patch makes them clearer.
Note-2: As mentioned in part 4 of the **TESTING** section, we ran JMH testing 
on one SVE machine and observed no regression; we will do more measurements on 
different systems.

-------------

PR: https://git.openjdk.org/jdk/pull/9346