On Wed, 27 May 2020 at 09:22, Will Deacon <w...@kernel.org> wrote:
>
> On Wed, May 27, 2020 at 01:10:00AM +0200, Arnd Bergmann wrote:
> > On Tue, May 26, 2020 at 9:00 PM Arnd Bergmann <a...@arndb.de> wrote:
> > >
> > > On Tue, May 26, 2020 at 7:33 PM 'Marco Elver' via Clang Built Linux
> > > <clang-built-li...@googlegroups.com> wrote:
> > > > On Tue, 26 May 2020, Marco Elver wrote:
> > > > > On Tue, 26 May 2020 at 14:19, Arnd Bergmann <a...@arndb.de> wrote:
> > > > >
> > > > > Note that an 'allyesconfig' selects KASAN and not KCSAN by default.
> > > > > But I think that's not relevant, since KCSAN-specific code was removed
> > > > > from the ONCEs. In general, though, it is entirely expected that compile
> > > > > times are a bit longer when the instrumentation passes are enabled.
> > > > >
> > > > > But as you pointed out, that's irrelevant, and the significant
> > > > > overhead is from parsing and pre-processing. FWIW, we can probably
> > > > > optimize Clang itself a bit:
> > > > > https://github.com/ClangBuiltLinux/linux/issues/1032#issuecomment-633712667
> > > >
> > > > I found that optimizing __unqual_scalar_typeof makes a noticeable
> > > > difference. We could use C11's _Generic if the compiler supports it (and
> > > > all supported versions of Clang certainly do).
> > > >
> > > > Could you verify whether the below patch improves compile times for you?
> > > > E.g. on fs/ocfs2/journal.c I was able to get a ~40% compile-time speedup.
> > >
> > > Yes, that brings both the preprocessed size and the time to preprocess it
> > > with clang-11 back to where it is in mainline, and close to the speed with
> > > gcc-10 for this particular file.
> > >
> > > I also cross-checked with gcc-4.9 and gcc-10 and found that they do see
> > > the same increase in preprocessor output, but it makes little difference
> > > to preprocessing performance on gcc.
> >
> > Just for reference, I've tested this against a patch I made that completely
> > shortcuts READ_ONCE() on anything but alpha (which needs the
> > read_barrier_depends()):
> >
> > --- a/include/linux/compiler.h
> > +++ b/include/linux/compiler.h
> > @@ -224,18 +224,21 @@ void ftrace_likely_update(struct ftrace_likely_data *f, int val,
> >   * atomicity or dependency ordering guarantees. Note that this may result
> >   * in tears!
> >   */
> > -#define __READ_ONCE(x)	(*(const volatile __unqual_scalar_typeof(x) *)&(x))
> > +#define __READ_ONCE(x)	(*(const volatile typeof(x) *)&(x))
> >
> > +#ifdef CONFIG_ALPHA /* smp_read_barrier_depends is a NOP otherwise */
> >  #define __READ_ONCE_SCALAR(x)					\
> >  ({								\
> >  	__unqual_scalar_typeof(x) __x = __READ_ONCE(x);		\
> >  	smp_read_barrier_depends();				\
> > -	(typeof(x))__x;						\
> > +	__x;							\
> >  })
> > +#else
> > +#define __READ_ONCE_SCALAR(x) __READ_ONCE(x)
> > +#endif
>
> Nice! FWIW, I'm planning to have Alpha override __READ_ONCE_SCALAR()
> eventually, so that smp_read_barrier_depends() can disappear forever. I
> just bit off more than I can chew for 5.8 :(
>
> However, '__unqual_scalar_typeof()' is still useful for
> load-acquire/store-release on arm64, so we still need a better solution
> to the build-time regression, IMO. I'm not fond of picking random C11
> features to accomplish that, but I also don't have any better ideas...
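For concreteness, the _Generic version amounts to roughly the following
(a sketch to show the shape of it, not necessarily the exact patch; the
case list and helper name may differ):

/*
 * Sketch: map each (possibly qualified) scalar type to an unqualified
 * rvalue of the same type via _Generic, then take typeof() of the
 * result. The controlling expression of _Generic undergoes lvalue
 * conversion on GCC/Clang, which drops the top-level qualifiers --
 * that's what makes this work. 'char' needs its own case because it
 * is a distinct type from both 'signed char' and 'unsigned char'.
 * Non-scalar types hit the default case and are left unchanged.
 */
#define __scalar_type_to_expr_cases(type)		\
		unsigned type:	(unsigned type)0,	\
		signed type:	(signed type)0

#define __unqual_scalar_typeof(x) typeof(			\
		_Generic((x),					\
			 char:	(char)0,			\
			 __scalar_type_to_expr_cases(char),	\
			 __scalar_type_to_expr_cases(short),	\
			 __scalar_type_to_expr_cases(int),	\
			 __scalar_type_to_expr_cases(long),	\
			 __scalar_type_to_expr_cases(long long),\
			 default: (x)))

Because this is a single _Generic expression rather than a chain of
__builtin_choose_expr()s, the preprocessed output stays small, which is
where the compile-time win comes from.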
We already use _Static_assert() in the kernel, so this wouldn't be the
first C11 feature we rely on.

> Is there any mileage in the clever trick from Rasmus?
>
> https://lore.kernel.org/r/6cbc8ae1-8eb1-a5a0-a584-2081fca1c...@rasmusvillemoes.dk

Apparently that one only works with GCC 7 or newer, is only properly
defined behaviour since C11, and also relies on multiple _Pragma()s.
I'd probably take the arguably much cleaner _Generic solution over
that. ;-)

Given that Peter and Arnd have already done some testing and it works
as intended, I'll send a patch for the _Generic version, if you don't
mind. At the very least that gives us a more optimized
__unqual_scalar_typeof(), and any further optimizations to READ_ONCE()
like the one above then become a little less urgent.

Thanks,
-- Marco
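P.S.: If it helps review, the intended property is easy to pin down with
a compile-time check along these lines (illustrative only;
__builtin_types_compatible_p() is a GCC/Clang extension, not C11):

/*
 * Sketch: qualifiers should be stripped from scalar lvalues. Neither
 * _Generic's controlling expression nor typeof() is evaluated, so the
 * null pointer here is never dereferenced.
 */
_Static_assert(__builtin_types_compatible_p(
			__unqual_scalar_typeof(*(volatile int *)0), int),
	       "__unqual_scalar_typeof() should strip 'volatile'");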