> -----Original Message-----
> From: Kyrylo Tkachov
> Sent: 18 March 2021 09:37
> To: 'qia...@fujitsu.com' <qia...@fujitsu.com>; gcc-patches@gcc.gnu.org
> Cc: Richard Sandiford <richard.sandif...@arm.com>
> Subject: RE: [PATCH] aarch64: Improve generic SVE tuning defaults
>
> > -----Original Message-----
> > From: qia...@fujitsu.com <qia...@fujitsu.com>
> > Sent: 18 March 2021 01:52
> > To: Kyrylo Tkachov <kyrylo.tkac...@arm.com>; gcc-patc...@gcc.gnu.org
> > Cc: Richard Sandiford <richard.sandif...@arm.com>
> > Subject: RE: [PATCH] aarch64: Improve generic SVE tuning defaults
> >
> > Hello Kyrill,
> >
> > Sorry for the slow response.
> > The performance on a64fx is not impacted with this patch.
>
> Thank you very much for testing Qian.
> Glad to see there is no impact on A64FX. I will push the patch to master then.

I should add that I intend to backport this to GCC 10 as well, since it has the same effect on that branch (it helps Neoverse V1 and has no effect on anything else). I will do so after a bit more testing; the patch applies cleanly.

Thanks,
Kyrill

> Kyrill
>
> >
> > Regards,
> > Qian
> >
> > > -----Original Message-----
> > > From: Kyrylo Tkachov <kyrylo.tkac...@arm.com>
> > > Sent: Wednesday, March 10, 2021 10:56 PM
> > > To: gcc-patches@gcc.gnu.org
> > > Cc: Richard Sandiford <richard.sandif...@arm.com>; Qian, Jianhua/钱建华
> > > <qia...@fujitsu.com>
> > > Subject: [PATCH] aarch64: Improve generic SVE tuning defaults
> > >
> > > Hi all,
> > >
> > > This patch adds the recently-added tweak to split some SVE VL-based scalar
> > > operations [1] to the generic tuning used for SVE, as enabled by adding +sve
> > > to the -march flag, for example -march=armv8.2-a+sve.
> > >
> > > The recommendation for best performance on a particular CPU remains
> > > unchanged: use the -mcpu option for that CPU, where possible. -mcpu=native
> > > makes this straightforward for native compilation.
> > >
> > > The tweak to split out SVE VL-based scalar operations is a consistent win for
> > > the Neoverse V1 CPU and should be neutral for the Fujitsu A64FX. A run of
> > > SPEC2017 on A64FX with this tweak on didn't show any non-noise differences.
> > > It is also expected to be neutral on SVE2 implementations.
> > >
> > > Therefore, the patch enables the tweak for generic +sve tuning e.g.
> > > -march=armv8.2-a+sve. No SVE2 CPUs are expected to benefit from it,
> > > therefore the tweak is disabled for generic tuning when +sve2 is in -march
> > > e.g. -march=armv8.2-a+sve2.
> > >
> > > The implementation of this approach requires a bit of custom logic in
> > > aarch64_override_options_internal to handle these kinds of
> > > architecture-dependent decisions, but we do believe the user-facing principle
> > > here is important to implement.
> > >
> > > Qian, as you've contributed the A64FX support to GCC, I would be grateful for
> > > your feedback on this approach and in particular on the performance evaluation
> > > of this change.
> > >
> > > In general, for the generic target we're using a decision framework that looks
> > > like:
> > >
> > > * If all cores that are known to benefit from an optimization are of
> > >   architecture X, and all other cores that implement X or above are not
> > >   impacted, or have a very slight impact, we will consider it for generic
> > >   tuning for architecture X.
> > > * We will not enable that optimisation for generic tuning for architecture
> > >   X+1 if no known cores of architecture X+1 or above will benefit.
> > >
> > > This framework allows us to improve generic tuning for CPUs of generation X
> > > while avoiding accumulating tweaks for future CPUs of generation X+1, X+2...
> > > that do not need them, and thus avoid even the slight negative effects of these
> > > optimisations if the user is willing to tell us the desired architecture
> > > accurately.
> > >
> > > X above can mean either annual architecture updates (Armv8.2-a, Armv8.3-a
> > > etc) or optional architecture extensions (like SVE, SVE2).
> > >
> > > We think that this patch fits that framework, so would like to propose it for
> > > the trunk default tunings for SVE.
> > >
> > > Bootstrapped and tested on aarch64-none-linux-gnu.
> > >
> > > Thanks,
> > > Kyrill
> > >
> > > [1] http://gcc.gnu.org/g:a65b9ad863c5fc0aea12db58557f4d286a1974d7
> > >
> > > gcc/ChangeLog:
> > >
> > >     * config/aarch64/aarch64.c (aarch64_adjust_generic_arch_tuning):
> > >     Define.
> > >     (aarch64_override_options_internal): Use it.
> > >     (generic_tunings): Add AARCH64_EXTRA_TUNE_CSE_SVE_VL_CONSTANTS to
> > >     tune_flags.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >     * g++.target/aarch64/sve/aarch64-sve.exp: Add -moverride=tune=none to
> > >     sve_flags.
> > >     * g++.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
> > >     * g++.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
> > >     * gcc.target/aarch64/sve/aarch64-sve.exp: Likewise.
> > >     * gcc.target/aarch64/sve/acle/aarch64-sve-acle-asm.exp: Likewise.
> > >     * gcc.target/aarch64/sve/acle/aarch64-sve-acle.exp: Likewise.
> > >
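
For readers following the thread, the architecture-dependent rule described in the original post (the tweak is on for generic +sve tuning and off once +sve2 is in -march) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in the patch: the enum names, GENERIC_TUNE_FLAGS, and the functions adjust_generic_arch_tuning, splits_sve_vl_constants and check_examples are hypothetical stand-ins, and the real logic lives in aarch64_adjust_generic_arch_tuning and aarch64_override_options_internal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these are NOT GCC's definitions.  */
enum isa_feature
{
  ISA_SVE = 1 << 0,   /* +sve is present in -march.  */
  ISA_SVE2 = 1 << 1   /* +sve2 is present in -march (set together with ISA_SVE).  */
};

enum extra_tune_flag
{
  TUNE_CSE_SVE_VL_CONSTANTS = 1 << 0
};

/* Assumption: generic tuning carries the tweak by default, mirroring the
   ChangeLog entry that adds the flag to generic_tunings.  */
static const unsigned GENERIC_TUNE_FLAGS = TUNE_CSE_SVE_VL_CONSTANTS;

/* Return the generic tuning flags adjusted for the requested architecture:
   no known SVE2 core benefits from the tweak, so drop it for +sve2.  */
unsigned
adjust_generic_arch_tuning (unsigned isa_flags, unsigned tune_flags)
{
  if (isa_flags & ISA_SVE2)
    tune_flags &= ~(unsigned) TUNE_CSE_SVE_VL_CONSTANTS;
  return tune_flags;
}

/* True if generic tuning would apply the tweak for this -march selection.
   The tweak only matters when SVE is enabled at all.  */
bool
splits_sve_vl_constants (unsigned isa_flags)
{
  unsigned flags = adjust_generic_arch_tuning (isa_flags, GENERIC_TUNE_FLAGS);
  return (isa_flags & ISA_SVE) != 0
	 && (flags & TUNE_CSE_SVE_VL_CONSTANTS) != 0;
}

/* Usage example covering the cases discussed in the thread.  */
void
check_examples (void)
{
  assert (splits_sve_vl_constants (ISA_SVE));              /* e.g. armv8.2-a+sve  */
  assert (!splits_sve_vl_constants (ISA_SVE | ISA_SVE2));  /* e.g. armv8.2-a+sve2 */
  assert (!splits_sve_vl_constants (0));                   /* no SVE at all */
}
```

The sketch keeps the default in one place and expresses the +sve2 exception as a subtraction. That matches the framework in the original post: a tweak is considered for architecture X and is not carried forward to X+1 unless a known core there benefits.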