Hi all,

This patch enables the recently added tweak that splits some SVE VL-based
scalar operations [1] in the generic tuning used for SVE, that is, the tuning
selected by adding +sve to the -march flag, for example -march=armv8.2-a+sve.

The recommendation for best performance on a particular CPU remains unchanged:
use the -mcpu option for that CPU, where possible. -mcpu=native makes this
straightforward for native compilation.

The tweak to split out SVE VL-based scalar operations is a consistent win for
the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX. A run of
SPEC2017 on A64FX with this tweak enabled showed no differences beyond noise.
It is also expected to be neutral on SVE2 implementations.

Therefore, the patch enables the tweak for generic +sve tuning, e.g.
-march=armv8.2-a+sve. No SVE2 CPUs are expected to benefit from it, so the
tweak is disabled for generic tuning when +sve2 is in -march, e.g.
-march=armv8.2-a+sve2.

The implementation of this approach requires a bit of custom logic in
aarch64_override_options_internal to handle these kinds of
architecture-dependent decisions, but we do believe the user-facing principle
here is important to implement.
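
To illustrate the kind of architecture-dependent adjustment described above,
here is a minimal standalone sketch. It is not the code from the attached
patch: the SKETCH_* macros and sketch_adjust_generic_arch_tuning are
hypothetical stand-ins for GCC's ISA feature flags, for
AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS and for
aarch64_adjust_generic_arch_tuning, and the bit values are made up. It assumes
the generic tuning carries the flag by default and that the adjustment clears
it when +sve2 is present.

```c
#include <stdbool.h>

/* Hypothetical feature bits; GCC's real ISA flag names and values differ.  */
#define SKETCH_ISA_SVE   (1u << 0)
#define SKETCH_ISA_SVE2  (1u << 1)

/* Hypothetical stand-in for AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS.  */
#define SKETCH_TUNE_CSE_SVE_VL_CONSTANTS  (1u << 0)

/* Given the ISA features implied by -march and the extra tuning flags of
   the generic tuning, return the flags to use.  The tweak stays on for
   +sve and is cleared when +sve2 is present, since no SVE2 CPU is
   expected to benefit from it.  All other flags pass through unchanged.  */
unsigned int
sketch_adjust_generic_arch_tuning (unsigned int isa_flags,
                                   unsigned int tune_flags)
{
  bool has_sve2 = (isa_flags & SKETCH_ISA_SVE2) != 0;

  if (has_sve2)
    tune_flags &= ~SKETCH_TUNE_CSE_SVE_VL_CONSTANTS;

  return tune_flags;
}
```

In this sketch the adjustment would only run when generic tuning is in use;
an explicit -mcpu or -mtune choice would keep its own flags untouched.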

Qian, as you've contributed the A64FX support to GCC, I would be grateful for
your feedback on this approach and in particular on the performance evaluation
of this change.

In general, for the generic target we're using a decision framework that looks
like:

* If all cores that are known to benefit from an optimisation
are of architecture X, and all other cores that implement X or above
are not impacted, or have a very slight impact, we will consider it for
generic tuning for architecture X.
* We will not enable that optimisation for generic tuning for architecture X+1
if no known cores of architecture X+1 or above will benefit.

This framework allows us to improve generic tuning for CPUs of generation X
while avoiding accumulating tweaks for future CPUs of generation X+1, X+2...
that do not need them, and thus avoid even the slight negative effects of
these optimisations if the user is willing to tell us the desired architecture
accurately.

X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a etc)
or optional architecture extensions (like SVE, SVE2).

We think that this patch fits that framework, so would like to propose it for
the trunk default tunings for SVE.

Bootstrapped and tested on aarch64-none-linux-gnu.

Thanks,
Kyrill

[1] http://gcc.gnu.org/g:a65b9ad863c5fc0aea12db58557f4d286a1974d7

gcc/ChangeLog:

        * config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning): Define.
        (aarch64_override_options_internal): Use it.
        (generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to
        tune_flags.

gcc/testsuite/ChangeLog:

        * g++.target/aarch64/sve/aarch64-sve.exp: Add -moverride=tune=none to
        sve_flags.
        * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
        * g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
        * gcc.target/aarch64/sve/aarch64-sve.exp: Likewise.
        * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
        * gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.

Attachment: generic-sve.patch