Hello Kyrill,

As you said, this patch only affects generic SVE tuning,
so I will evaluate performance on A64FX without the -mcpu option.
I will report the results next week.

Regards,
Qian

> -----Original Message-----
> From: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> Sent: Wednesday, March 10, 2021 10:56 PM
> To: gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <richard.sandif...@arm.com>; Qian, Jianhua/钱 建华
> <qia...@fujitsu.com>
> Subject: [PATCH] aarch64: Improve generic SVE tuning defaults
> 
> Hi all,
> 
> This patch adds the recently-added tweak to split some SVE VL-based scalar
> operations [1] to the generic tuning used for SVE, as enabled by adding +sve
> to the -march flag, for example -march=armv8.2-a+sve.
> 
> The recommendation for best performance on a particular CPU remains
> unchanged: use the -mcpu option for that CPU, where possible. -mcpu=native
> makes this straightforward for native compilation.
> 
> The tweak to split out SVE VL-based scalar operations is a consistent win for
> the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX. A run of
> SPEC2017 on A64FX with this tweak on didn't show any non-noise differences.
> It is also expected to be neutral on SVE2 implementations.
> 
> Therefore, the patch enables the tweak for generic +sve tuning, e.g.
> -march=armv8.2-a+sve. No SVE2 CPUs are expected to benefit from it, so the
> tweak is disabled for generic tuning when +sve2 is in -march, e.g.
> -march=armv8.2-a+sve2.
> 
> The implementation of this approach requires a bit of custom logic in
> aarch64_override_options_internal to handle these kinds of
> architecture-dependent decisions, but we do believe the user-facing principle
> here is important to implement.
> 
> Qian, as you've contributed the A64FX support to GCC, I would be grateful for
> your feedback on this approach and in particular on the performance evaluation
> of this change.
> 
> In general, for the generic target we're using a decision framework that looks
> like:
> 
> * If all cores that are known to benefit from an optimization are of
>   architecture X, and all other cores that implement X or above are not
>   impacted, or have a very slight impact, we will consider it for generic
>   tuning for architecture X.
> * We will not enable that optimisation for generic tuning for architecture
>   X+1 if no known cores of architecture X+1 or above will benefit.
> 
> This framework allows us to improve generic tuning for CPUs of generation X
> while avoiding accumulating tweaks for future CPUs of generation X+1, X+2...
> that do not need them, and thus avoid even the slight negative effects of
> these optimisations if the user is willing to tell us the desired
> architecture accurately.
> 
> X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a
> etc) or optional architecture extensions (like SVE, SVE2).
> 
> We think that this patch fits that framework, so would like to propose it
> for the trunk default tunings for SVE.
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Thanks,
> Kyrill
> 
> [1] http://gcc.gnu.org/g:a65b9ad863c5fc0aea12db58557f4d286a1974d7
> 
> gcc/ChangeLog:
> 
>       * config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning): Define.
>       (aarch64_override_options_internal): Use it.
>       (generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to
>       tune_flags.
> 
> gcc/testsuite/ChangeLog:
> 
>       * g++.target/aarch64/sve/aarch64-sve.exp: Add -moverride=tune=none to
>       sve_flags.
>       * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
>       * g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
>       * gcc.target/aarch64/sve/aarch64-sve.exp: Likewise.
>       * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
>       * gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
> 
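
For readers following the thread, the user-facing rule described in the quoted
message (tweak on for generic +sve tuning, off again once +sve2 is in -march,
and CPU-specific tuning left alone) can be sketched roughly as below. This is
only an illustration under stated assumptions: the macro names, the function
name adjust_generic_tuning and the explicit_cpu parameter are hypothetical and
are not the definitions used in GCC's aarch64.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature and tuning bits for illustration only;
   these are not GCC's actual definitions.  */
#define FEATURE_SVE               (1u << 0)
#define FEATURE_SVE2              (1u << 1)
#define TUNE_CSE_SVE_VL_CONSTANTS (1u << 0)

/* Return the tuning flags to use for the given -march features.
   EXPLICIT_CPU is true when the user selected a CPU (assumed here to
   mean -mcpu or -mtune was given); that CPU's tuning is left as-is.  */
uint32_t
adjust_generic_tuning (uint32_t arch_features, uint32_t tune_flags,
                       bool explicit_cpu)
{
  if (explicit_cpu)
    return tune_flags;

  /* Architecture X (+sve without +sve2): known cores benefit or are
     neutral, so enable the tweak.  */
  if ((arch_features & FEATURE_SVE) && !(arch_features & FEATURE_SVE2))
    return tune_flags | TUNE_CSE_SVE_VL_CONSTANTS;

  /* Architecture X+1 (+sve2) or no SVE at all: no known core benefits,
     so make sure the tweak is off.  */
  return tune_flags & ~TUNE_CSE_SVE_VL_CONSTANTS;
}
```

In this sketch +sve2 is modelled as FEATURE_SVE | FEATURE_SVE2, so the second
condition is what keeps the tweak from accumulating on later architectures.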
