[PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

It's a strong "no" from me.

1) Aside from rare Gentoo users, no one has extensively tested -O3 with
   the kernel - even Gentoo defaults to -O2 for kernel compilation
2) -O3 _always_ bloats the code by a large amount, which means both
   vmlinux/bzImage and modules will become bigger and slower to load
   from the disk
3) -O3 does _not_ necessarily make the code run faster
4) If GCC 10 has removed certain options from the -O2 optimization
   level, you could just re-add them as compilation flags without
   forcing -O3 by default on everyone
5) If you still insist on -O3, I guess everyone would be happy if you
   just made two Kconfig options:

   OPTIMIZE_O2 (-O2)
   OPTIMIZE_O3_EVEN_MOAR (-O3)

Best regards,
Artem
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Sun, May 10, 2020 at 9:47 PM David Laight wrote:
>
> From: Joe Perches
> > Sent: 08 May 2020 16:06
> > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > Personally, I'm more interested in improving compile speed of the kernel
> >
> > Any opinion on precompiled header support?
>
> Whenever I've been anywhere near it it is always a disaster.
> It may make sense for C++ where there is lots of complicated
> code to parse in .h files. Parsing C headers is usually easier.
>
> One thing I have done that significantly speeds up .h file
> processing is to take the long list of '-I directory' parameters
> that are passed to the compiler and copy the first version
> of each file into a separate 'object headers' directory.
> This saves the compiler doing lots of 'failed opens'.
>
> If each fragment makefile lists its 'public' headers make
> can generate dependency rules that do the copies.
>
> FWIW make is much faster if you delete all the builtin and
> suffix rules and rely on explicit rules for each file.

Kbuild disables Make's builtin rules at least.

    # Do not use make's built-in rules and variables
    # (this increases performance and avoids hard-to-debug behaviour)
    MAKEFLAGS += -rR

--
Best Regards
Masahiro Yamada
RE: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
From: Joe Perches
> Sent: 10 May 2020 18:45
>
> On Sun, 2020-05-10 at 12:47 +, David Laight wrote:
> > From: Joe Perches
> > > Sent: 08 May 2020 16:06
> > > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > > Personally, I'm more interested in improving compile speed of the kernel
> > >
> > > Any opinion on precompiled header support?
> >
> > Whenever I've been anywhere near it it is always a disaster.
>
> A disaster? Why?

The only time I've had systems that used them they always got out
of step with the headers - probably due to #define changes.
If auto-generated by the compiler then parallel makes also give problems.

> For a large commercial c only project, it worked well
> by reducing a combined multi-include file, similar to
> kernel.h here, to a single file.

Certainly reducing the number of directories searched can
make a big difference.
I've also compiled .so by merging all the sources into a single file.

> That was before SSDs though and the file open times
> might have been rather larger then.

The real killer is lots of directory names in the -I, especially over NFS.

I've also looked at system call stats during a kernel compile.
open() dominated and my 'gut feeling' was that most were failing opens.

I also suspect that modern compilers remember that an include file
contained an include guard - and don't even bother looking for it
a second time.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
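The 'gut feeling' about failing opens could be checked with a small script along the following lines. This is only a sketch, not something from the thread: it assumes the build was traced with something like `strace -f -e trace=open,openat -o build.trace make`, and the function name `summarize_opens` and the counter keys are made up for illustration. Calls that strace splits across `<unfinished ...>` lines are simply skipped.

```python
import re
from collections import Counter

# Matches strace lines such as:
#   1234 openat(AT_FDCWD, "a/b.h", O_RDONLY|O_NOCTTY) = -1 ENOENT (No such file or directory)
#   1234 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
OPEN_RE = re.compile(
    r'\b(?:open|openat)\(.*?"(?P<path>[^"]*)".*\)\s+=\s+'
    r'(?P<ret>-?\d+)(?:\s+(?P<errno>E[A-Z]+))?'
)


def summarize_opens(lines):
    """Count open()/openat() calls, failures, and headers that were not found."""
    stats = Counter()
    for line in lines:
        m = OPEN_RE.search(line)
        if not m:
            continue  # not an open call, or an unfinished/resumed fragment
        stats["total"] += 1
        if int(m.group("ret")) < 0:
            stats["failed"] += 1
            # A header probed in an -I directory that does not contain it.
            if m.group("errno") == "ENOENT" and m.group("path").endswith(".h"):
                stats["missing_headers"] += 1
    return stats
```

Feeding it the trace file, e.g. `summarize_opens(open("build.trace"))`, gives the ratio of failed to total opens, which is the number the claim above depends on.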
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Sun, 2020-05-10 at 12:47 +, David Laight wrote:
> From: Joe Perches
> > Sent: 08 May 2020 16:06
> > On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > > Personally, I'm more interested in improving compile speed of the kernel
> >
> > Any opinion on precompiled header support?
>
> Whenever I've been anywhere near it it is always a disaster.

A disaster? Why?

For a large commercial c only project, it worked well
by reducing a combined multi-include file, similar to
kernel.h here, to a single file.

That was before SSDs though and the file open times
might have been rather larger then.
RE: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
From: Joe Perches
> Sent: 08 May 2020 16:06
> On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > Personally, I'm more interested in improving compile speed of the kernel
>
> Any opinion on precompiled header support?

Whenever I've been anywhere near it it is always a disaster.
It may make sense for C++ where there is lots of complicated
code to parse in .h files. Parsing C headers is usually easier.

One thing I have done that significantly speeds up .h file
processing is to take the long list of '-I directory' parameters
that are passed to the compiler and copy the first version
of each file into a separate 'object headers' directory.
This saves the compiler doing lots of 'failed opens'.

If each fragment makefile lists its 'public' headers make
can generate dependency rules that do the copies.

FWIW make is much faster if you delete all the builtin and
suffix rules and rely on explicit rules for each file.

	David
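The 'object headers' idea above can be sketched roughly as follows. This is an illustration under assumptions, not the scheme actually used: the function name `flatten_includes` and its arguments are hypothetical, it copies eagerly instead of through make dependency rules, and it ignores cases where a merged directory changes behaviour (for example `#include_next`, or quoted includes resolved relative to the including file).

```python
import os
import shutil


def flatten_includes(include_dirs, dest):
    """Copy the first match for each header, in -I search order, into dest.

    Returns a dict mapping each header's relative path to the source file
    that won the lookup.
    """
    copied = {}
    for inc in include_dirs:  # same order as the -I options
        for root, _dirs, files in os.walk(inc):
            for name in sorted(files):
                if not name.endswith(".h"):
                    continue
                src = os.path.join(root, name)
                rel = os.path.relpath(src, inc)
                if rel in copied:
                    continue  # an earlier -I directory already provides it
                target = os.path.join(dest, rel)
                os.makedirs(os.path.dirname(target), exist_ok=True)
                shutil.copy2(src, target)
                copied[rel] = src
    return copied
```

The compiler would then be given a single `-I` pointing at `dest`, so each header lookup succeeds on the first attempt instead of failing in every directory searched before the right one.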
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Fri, May 8, 2020 at 5:06 PM Joe Perches wrote:
>
> On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> > Personally, I'm more interested in improving compile speed of the kernel
>
> Any opinion on precompiled header support?

I have not tried it. IIRC precompiled headers usually work best for
projects that have a large header with all the global declarations that
gets included everywhere, while Linux has always tried (with different
amounts of success) to minimize the number of headers that get included
per file.

      Arnd
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Fri, 2020-05-08 at 13:49 +0200, Arnd Bergmann wrote:
> Personally, I'm more interested in improving compile speed of the kernel

Any opinion on precompiled header support?
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Fri, May 8, 2020 at 2:07 PM Jason A. Donenfeld wrote:
> On Fri, May 8, 2020 at 5:56 AM Arnd Bergmann wrote:
>
> The other significant thing -- and what prompted this patchset -- is
> it looks like gcc 10 has lowered the inlining degree for -O2, and put
> gcc 9's inlining parameters from -O2 into gcc-10's -O3.

I suspect it is more complicated than that, as there are a number of
parameters that determine inlining decisions. It's also not clear whether
the ones for -O3 are generally better than the ones with -O2, or if it's
just that whatever changed caused a few surprises but is otherwise
preferable.

Did you see regressions in specific modules, or just a general slowdown
or growth in object size as the result?

      Arnd
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Fri, May 8, 2020 at 5:56 AM Arnd Bergmann wrote:
>
> On Fri, May 8, 2020 at 1:33 PM Oleksandr Natalenko
> wrote:
> >
> > On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > > > Should we untangle -O3 from depending on ARC first maybe?
> > >
> > > Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> > > day for feedback first.
> >
> > Just keep in mind that my previous attempt [1] failed because of too
> > many false positive warnings despite -O3 really uncovering a couple of
> > bugs in the codebase.
>
> I think my warning fixes were mostly picked up in the meantime, but
> if there are any remaining, they would be mixed in with random other
> fixes in my testing tree, so it's hard to know for sure.
>
> I also want to hear the feedback from the gcc developers about what
> the general recommendations are between O2 and O3, and how
> they may have changed over time. According to the gcc-10 documentation,
> the difference between -O2 and -O3 is exactly this set of options:
>
>     -fgcse-after-reload
>     -fipa-cp-clone
>     -floop-interchange
>     -floop-unroll-and-jam
>     -fpeel-loops
>     -fpredictive-commoning
>     -fsplit-loops
>     -fsplit-paths
>     -ftree-loop-distribution
>     -ftree-loop-vectorize
>     -ftree-partial-pre
>     -ftree-slp-vectorize
>     -funswitch-loops
>     -fvect-cost-model
>     -fvect-cost-model=dynamic
>     -fversion-loops-for-strides

The other significant thing -- and what prompted this patchset -- is
it looks like gcc 10 has lowered the inlining degree for -O2, and put
gcc 9's inlining parameters from -O2 into gcc-10's -O3.
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Fri, May 8, 2020 at 1:33 PM Oleksandr Natalenko wrote:
>
> On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > > Should we untangle -O3 from depending on ARC first maybe?
> >
> > Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> > day for feedback first.
>
> Just keep in mind that my previous attempt [1] failed because of too
> many false positive warnings despite -O3 really uncovering a couple of
> bugs in the codebase.

I think my warning fixes were mostly picked up in the meantime, but
if there are any remaining, they would be mixed in with random other
fixes in my testing tree, so it's hard to know for sure.

I also want to hear the feedback from the gcc developers about what
the general recommendations are between O2 and O3, and how
they may have changed over time. According to the gcc-10 documentation,
the difference between -O2 and -O3 is exactly this set of options:

    -fgcse-after-reload
    -fipa-cp-clone
    -floop-interchange
    -floop-unroll-and-jam
    -fpeel-loops
    -fpredictive-commoning
    -fsplit-loops
    -fsplit-paths
    -ftree-loop-distribution
    -ftree-loop-vectorize
    -ftree-partial-pre
    -ftree-slp-vectorize
    -funswitch-loops
    -fvect-cost-model
    -fvect-cost-model=dynamic
    -fversion-loops-for-strides

It's a relatively short list, so someone familiar with the options
could perhaps look into whether we want to change the default for
all of them, or if it makes sense to be more selective.

Personally, I'm more interested in improving compile speed of the kernel
and eventually supporting -Og or some variant of it for my own build
testing, but of course I also want to make sure that the other optimization
levels do not produce warnings, and -Og leads to more problems than
-O3 at the moment.

      Arnd
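The flag list above can be cross-checked against an installed compiler by asking GCC which optimizers it enables at each level (`-Q --help=optimizers`) and diffing the two reports. The following is a minimal sketch of that idea; the helper names `parse_optimizers`, `query` and `diff_levels` are made up for illustration, and the parser assumes each report line has the flag first and its state (`[enabled]`, `[disabled]` or a value) last.

```python
import subprocess


def parse_optimizers(text):
    """Map each -f flag in a `gcc -Q --help=optimizers` report to its state."""
    flags = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip headings and flags printed without a state column.
        if len(parts) >= 2 and parts[0].startswith("-f"):
            flags[parts[0]] = parts[-1]
    return flags


def query(level, cc="gcc"):
    """Run the compiler and return the parsed report for e.g. "-O2"."""
    result = subprocess.run(
        [cc, "-Q", level, "--help=optimizers"],
        capture_output=True, text=True, check=True,
    )
    return parse_optimizers(result.stdout)


def diff_levels(a, b):
    """Return {flag: (state_in_a, state_in_b)} for flags whose state differs."""
    return {f: (a.get(f), b[f]) for f in sorted(b) if a.get(f) != b[f]}
```

Calling `diff_levels(query("-O2"), query("-O3"))` with gcc-9 and again with gcc-10 as `cc` would show whether the set of differing flags changed between releases. It only covers `-f` options, so changes to inlining parameters would not show up in this report.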
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Fri, May 08, 2020 at 05:21:47AM -0600, Jason A. Donenfeld wrote:
> > Should we untangle -O3 from depending on ARC first maybe?
>
> Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
> day for feedback first.

Just keep in mind that my previous attempt [1] failed because of too
many false positive warnings despite -O3 really uncovering a couple of
bugs in the codebase.

Let's hope your attempt will be more successful. I'll happily offer my
review tag ;).

Also Cc'ing Andrew who (IIRC) tried to take my submission and Arnd who
tried to clean up the mess afterwards.

[1] https://lore.kernel.org/lkml/20191211104619.114557-1-oleksa...@redhat.com/

--
Best regards,
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Fri, May 8, 2020 at 3:08 AM Oleksandr Natalenko wrote:
>
> Should we untangle -O3 from depending on ARC first maybe?

Oh, hah, good point. Yes, I'll do that for a v2, but will wait another
day for feedback first.

Jason
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Thu, May 07, 2020 at 04:45:30PM -0600, Jason A. Donenfeld wrote:
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.
>
> Signed-off-by: Jason A. Donenfeld
> ---
>  init/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 9e22ee8fbd75..fab3f810a68d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1245,7 +1245,8 @@ config BOOT_CONFIG
>
>  choice
>  	prompt "Compiler optimization level"
> -	default CC_OPTIMIZE_FOR_PERFORMANCE
> +	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
> +	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 ||
> CC_IS_CLANG)
>
>  config CC_OPTIMIZE_FOR_PERFORMANCE
>  	bool "Optimize for performance (-O2)"
> --
> 2.26.2
>

Should we untangle -O3 from depending on ARC first maybe?

--
Best regards,
Oleksandr Natalenko (post-factum)
Principal Software Maintenance Engineer
Re: [PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
On Thu, May 07, 2020 at 04:45:30PM -0600, Jason A. Donenfeld wrote:
> GCC 10 appears to have changed -O2 in order to make compilation time
> faster when using -flto, seemingly at the expense of performance, in
> particular with regards to how the inliner works. Since -O3 these days
> shouldn't have the same set of bugs as 10 years ago, this commit
> defaults new kernel compiles to -O3 when using gcc >= 10.

Would be nice to get some GCC person's feedback on this. But in general,
I think you're right in that O3 isn't the code-gen disaster it used to be.

> Signed-off-by: Jason A. Donenfeld
> ---
>  init/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 9e22ee8fbd75..fab3f810a68d 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1245,7 +1245,8 @@ config BOOT_CONFIG
>
>  choice
>  	prompt "Compiler optimization level"
> -	default CC_OPTIMIZE_FOR_PERFORMANCE
> +	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
> +	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 ||
> CC_IS_CLANG)
>
>  config CC_OPTIMIZE_FOR_PERFORMANCE
>  	bool "Optimize for performance (-O2)"
> --
> 2.26.2
>
[PATCH] Kconfig: default to CC_OPTIMIZE_FOR_PERFORMANCE_O3 for gcc >= 10
GCC 10 appears to have changed -O2 in order to make compilation time
faster when using -flto, seemingly at the expense of performance, in
particular with regards to how the inliner works. Since -O3 these days
shouldn't have the same set of bugs as 10 years ago, this commit
defaults new kernel compiles to -O3 when using gcc >= 10.

Signed-off-by: Jason A. Donenfeld
---
 init/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/init/Kconfig b/init/Kconfig
index 9e22ee8fbd75..fab3f810a68d 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1245,7 +1245,8 @@ config BOOT_CONFIG

 choice
 	prompt "Compiler optimization level"
-	default CC_OPTIMIZE_FOR_PERFORMANCE
+	default CC_OPTIMIZE_FOR_PERFORMANCE_O3 if GCC_VERSION >= 10
+	default CC_OPTIMIZE_FOR_PERFORMANCE if (GCC_VERSION < 10 || CC_IS_CLANG)

 config CC_OPTIMIZE_FOR_PERFORMANCE
 	bool "Optimize for performance (-O2)"
--
2.26.2