On Sun, Nov 08, 2020 at 12:40:14PM -0500, Arvind Sankar wrote: > On Fri, Nov 06, 2020 at 07:14:36AM +0200, Adrian Ratiu wrote: > > Due to a Clang bug [1] neon autoloop vectorization does not happen or > > happens badly with no gains and considering previous GCC experiences > > which generated unoptimized code which was worse than the default asm > > implementation, it is safer to default clang builds to the known good > > generic implementation. > > > > The kernel currently supports a minimum Clang version of v10.0.1, see > > commit 1f7a44f63e6c ("compiler-clang: add build check for clang 10.0.1"). > > > > When the bug gets eventually fixed, this commit could be reverted or, > > if the minimum clang version bump takes a long time, a warning could > > be added for users to upgrade their compilers like was done for GCC. > > > > [1] https://bugs.llvm.org/show_bug.cgi?id=40976 > > > > Signed-off-by: Adrian Ratiu <adrian.ra...@collabora.com> > > --- > > arch/arm/include/asm/xor.h | 3 ++- > > arch/arm/lib/Makefile | 3 +++ > > arch/arm/lib/xor-neon.c | 4 ++++ > > 3 files changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/arch/arm/include/asm/xor.h b/arch/arm/include/asm/xor.h > > index aefddec79286..49937dafaa71 100644 > > --- a/arch/arm/include/asm/xor.h > > +++ b/arch/arm/include/asm/xor.h > > @@ -141,7 +141,8 @@ static struct xor_block_template xor_block_arm4regs = { > > NEON_TEMPLATES; \ > > } while (0) > > > > -#ifdef CONFIG_KERNEL_MODE_NEON > > +/* disabled on clang/arm due to > > https://bugs.llvm.org/show_bug.cgi?id=40976 */ > > +#if defined(CONFIG_KERNEL_MODE_NEON) && !defined(CONFIG_CC_IS_CLANG) > > > > extern struct xor_block_template const xor_block_neon_inner; > > > > diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile > > index 6d2ba454f25b..53f9e7dd9714 100644 > > --- a/arch/arm/lib/Makefile > > +++ b/arch/arm/lib/Makefile > > @@ -43,8 +43,11 @@ endif > > $(obj)/csumpartialcopy.o: $(obj)/csumpartialcopygeneric.S > > $(obj)/csumpartialcopyuser.o: $(obj)/csumpartialcopygeneric.S > > > > +# disabled on clang/arm due to https://bugs.llvm.org/show_bug.cgi?id=40976 > > +ifndef CONFIG_CC_IS_CLANG > > ifeq ($(CONFIG_KERNEL_MODE_NEON),y) > > NEON_FLAGS := -march=armv7-a -mfloat-abi=softfp > > -mfpu=neon > > CFLAGS_xor-neon.o += $(NEON_FLAGS) > > obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o > > endif > > +endif > > diff --git a/arch/arm/lib/xor-neon.c b/arch/arm/lib/xor-neon.c > > index e1e76186ec23..84c91c48dfa2 100644 > > --- a/arch/arm/lib/xor-neon.c > > +++ b/arch/arm/lib/xor-neon.c > > @@ -18,6 +18,10 @@ MODULE_LICENSE("GPL"); > > * Pull in the reference implementations while instructing GCC (through > > * -ftree-vectorize) to attempt to exploit implicit parallelism and emit > > * NEON instructions. > > + > > + * On Clang the loop vectorizer is enabled by default, but due to a bug > > + * (https://bugs.llvm.org/show_bug.cgi?id=40976) vectorization is broke > > + * so xor-neon is disabled in favor of the default reg implementations. > > */ > > #ifdef CONFIG_CC_IS_GCC > > #pragma GCC optimize "tree-vectorize" > > -- > > 2.29.0 > > > > It's actually a bad idea to use #pragma GCC optimize. This is basically > the same as tagging all the functions with __attribute__((optimize)), > which GCC does not recommend for production use, as it _replaces_ > optimization options rather than appending to them, and has been > observed to result in dropping important compiler flags. > > There've been a few discussions recently around other such cases: > https://lore.kernel.org/lkml/20201028171506.15682-1-a...@kernel.org/ > https://lore.kernel.org/lkml/20201028081123.gt2...@hirez.programming.kicks-ass.net/ > > For this file, given that it is supposed to use -ftree-vectorize for the > whole file anyway, is there any reason it's not just added to CFLAGS via > the Makefile? This seems to be the only use of pragma optimize in the > kernel.
Eg, this shows that the pragma results in dropping -fno-strict-aliasing. https://godbolt.org/z/1nfrKT The first function does not use vectorization because s and s->a might alias.