Re: RFC: Improving GCC8 default option settings
Hi,

On Thu, Sep 14, 2017 at 11:55:21AM +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen wrote:
> > On Wednesday, 13 September 2017 15:46:09 CEST Jakub Jelinek wrote:
> >> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> >> > On its own -O3 doesn't add much (some loop opts and slightly more
> >> > aggressive inlining/unrolling), so whatever it does we
> >> > should consider doing at -O2 eventually.
> >>
> >> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
> >
> > Would it be possible to enable basic block vectorization on -O2? I assume
> > that doesn't increase binary size since it doesn't unroll loops.
>
> Somebody needs to provide benchmarking looking at the compile-time cost
> vs. the runtime benefit and the code size effect. There's also room to tune
> aggressiveness of BB vectorization as it currently allows for cases where
> the scalar computation is not fully replaced by vector code.

A good candidate to look at might be 525.x264_r from the SPEC2017 CPU
suite. With just -O2, GCC is about 70% slower than LLVM (which I think
must be doing some vectorization at -O2). When I give -O2 -ftree-vectorize
to gcc, the difference drops to 20%, so vectorization is not the whole
story either. There is no real difference in the run time of executables
generated by the two compilers at -Ofast.

(But no, I'm not volunteering to analyze it further in the foreseeable
future.)

Martin
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 3:08 PM, Markus Trippelsdorf wrote:
> On 2017.09.14 at 14:48 +0200, Richard Biener wrote:
>> On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška wrote:
>> > [...]
>> > Ok. I just wanted to express that from a user's perspective I would
>> > not recommend using it. Even if it improves some interesting loop
>> > nests (ones that are hard for classical loop optimization), it can
>> > still blow up on a quite simple data dependence between loops.
>> > That said, it's quite risky to use.
>>
>> We only have a single wrong-code bug in bugzilla with a testcase and I
>> just fixed it (well, patch in testing). We do have plenty of ICEs, yes.
>
> Even tramp3d-v4, which is cited in several graphite papers, gets
> miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823.

But unfortunately there isn't a self-contained testcase for that.
The comments hint that something like

  int a[][];
  p = &a[1][0];
  for(;;)
    a[i][j] = ...
    p[i] = ...

would get at it, that is, accessing memory via both a two-dimensional
array and a pointer.

Richard.

> --
> Markus
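The access pattern hinted at above can be sketched as a complete C function. This is only an illustration of the aliasing, not a reproducer for PR 68823: the array sizes, the stored values, and the name `alias_demo` are assumptions made up for the example.

```c
enum { ROWS = 4, COLS = 4 };   /* hypothetical sizes, chosen for the sketch */

/* Stores go through the two-dimensional array and through a flat pointer
   into its second row. Both kinds of store can hit the same memory, so
   they must stay in program order. */
int alias_demo(void)
{
    int a[ROWS][COLS] = {{0}};
    int *p = &a[1][0];               /* p[k] aliases a[1][k] */

    for (int i = 0; i < ROWS; i++) {
        for (int j = 0; j < COLS; j++)
            a[i][j] = i + j;         /* store via the 2-D array */
        p[i] = 100 + i;              /* store via the pointer, lands in row 1 */
    }

    /* Row 1 ends up as {1, 101, 102, 103}: the array store in iteration
       i == 1 overwrites the pointer store made in iteration i == 0. */
    return a[1][0] + a[1][1] + a[1][2] + a[1][3];
}
```

In program order the function returns 307. A transformation that treated the two kinds of store as independent and ran all array stores before all pointer stores would leave row 1 as {100, 101, 102, 103} and return 406 instead.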
Re: RFC: Improving GCC8 default option settings
On 2017.09.14 at 14:48 +0200, Richard Biener wrote:
> On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška wrote:
> > [...]
> > Ok. I just wanted to express that from a user's perspective I would
> > not recommend using it. Even if it improves some interesting loop
> > nests (ones that are hard for classical loop optimization), it can
> > still blow up on a quite simple data dependence between loops.
> > That said, it's quite risky to use.
>
> We only have a single wrong-code bug in bugzilla with a testcase and I
> just fixed it (well, patch in testing). We do have plenty of ICEs, yes.

Even tramp3d-v4, which is cited in several graphite papers, gets
miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823.

--
Markus
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška wrote:
> On 09/14/2017 12:37 PM, Bin.Cheng wrote:
>> [...]
>> IMHO, we should not be in a hurry to remove graphite, even though we
>> are introducing some traditional transformations. It's a fairly
>> standalone part of GCC and supports more transformations. Also, as it
>> gets more attention, you never know whether somebody will find time
>> to work on it.
>
> Ok. I just wanted to express that from a user's perspective I would
> not recommend using it. Even if it improves some interesting loop
> nests (ones that are hard for classical loop optimization), it can
> still blow up on a quite simple data dependence between loops.
> That said, it's quite risky to use.

We only have a single wrong-code bug in bugzilla with a testcase and I
just fixed it (well, patch in testing). We do have plenty of ICEs, yes.

Richard.
Re: RFC: Improving GCC8 default option settings
On 09/14/2017 12:37 PM, Bin.Cheng wrote:
> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener wrote:
>> [...]
>> I don't think removing graphite is warranted; I still think it is the
>> approach to use when handling non-perfect nests.
>
> Hi,
> IMHO, we should not be in a hurry to remove graphite, even though we
> are introducing some traditional transformations. It's a fairly
> standalone part of GCC and supports more transformations. Also, as it
> gets more attention, you never know whether somebody will find time
> to work on it.

Ok. I just wanted to express that from a user's perspective I would not
recommend using it. Even if it improves some interesting loop nests
(ones that are hard for classical loop optimization), it can still blow
up on a quite simple data dependence between loops. That said, it's
quite risky to use.

Thanks,
Martin
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener wrote:
> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote:
>> [...]
>> Given the wrong-code bugs we have, which I recently went through, I
>> fully agree with this approach and would do it for GCC 8. There are
>> PRs where the order of two simple loops is changed, causing wrong
>> code because there is a data dependence between them.
>>
>> Moreover, I know that Bin was thinking about how to select between
>> the classical loop optimizations and Graphite (depending on the
>> options provided). This would simplify it ;)
>
> I don't think removing graphite is warranted; I still think it is the
> approach to use when handling non-perfect nests.

Hi,

IMHO, we should not be in a hurry to remove graphite, even though we are
introducing some traditional transformations. It's a fairly standalone
part of GCC and supports more transformations. Also, as it gets more
attention, you never know whether somebody will find time to work on it.

Thanks,
bin
Re: RFC: Improving GCC8 default option settings
On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška wrote:
> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>> [...]
>> Not to mention the numerous wrong-code bugs. IMHO graphite should be
>> deprecated as soon as possible.
>
> Given the wrong-code bugs we have, which I recently went through, I
> fully agree with this approach and would do it for GCC 8. There are
> PRs where the order of two simple loops is changed, causing wrong
> code because there is a data dependence between them.
>
> Moreover, I know that Bin was thinking about how to select between
> the classical loop optimizations and Graphite (depending on the
> options provided). This would simplify it ;)

I don't think removing graphite is warranted; I still think it is the
approach to use when handling non-perfect nests.

Richard.
Re: RFC: Improving GCC8 default option settings
On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>> [...]
>> It's not on by default at any optimization level. The main issue is
>> the lack of maintenance and a set of known common internal compiler
>> errors we hit. The other issue is that there's no benefit of turning
>> those on for SPEC CPU benchmarking as far as I remember but quite a
>> bit of extra compile-time cost.
>
> Not to mention the numerous wrong-code bugs. IMHO graphite should be
> deprecated as soon as possible.

Given the wrong-code bugs we have, which I recently went through, I fully
agree with this approach and would do it for GCC 8. There are PRs where
the order of two simple loops is changed, causing wrong code because
there is a data dependence between them.

Moreover, I know that Bin was thinking about how to select between the
classical loop optimizations and Graphite (depending on the options
provided). This would simplify it ;)

Martin
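To make the failure mode concrete, here is a small sketch of two simple loops with a data dependence between them. It is not taken from any of the PRs mentioned; the array sizes and the function names `in_order` and `swapped` are invented for the illustration.

```c
enum { N = 8 };   /* hypothetical trip count */

/* Loop 1 produces b[], loop 2 consumes it: a flow dependence that forces
   loop 1 to run first. */
int in_order(void)
{
    int a[N];
    int b[N + 1] = {0};
    int sum = 0;

    for (int i = 0; i < N; i++)
        b[i + 1] = 2 * i;            /* loop 1: writes b[1..N] */
    for (int i = 0; i < N; i++)
        a[i] = b[i + 1] + 1;         /* loop 2: reads what loop 1 wrote */
    for (int i = 0; i < N; i++)
        sum += a[i];
    return sum;
}

/* The same two loops with their order exchanged, which is what an invalid
   transformation would effectively produce. Loop 2 now reads zeros. */
int swapped(void)
{
    int a[N];
    int b[N + 1] = {0};
    int sum = 0;

    for (int i = 0; i < N; i++)
        a[i] = b[i + 1] + 1;         /* reads b before it is written */
    for (int i = 0; i < N; i++)
        b[i + 1] = 2 * i;
    for (int i = 0; i < N; i++)
        sum += a[i];
    return sum;
}
```

`in_order()` returns 64 while `swapped()` returns 8, so any pass that reorders the loops must first prove there is no such dependence.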
Re: RFC: Improving GCC8 default option settings
> On 14 Sep 2017, at 3:06 AM, Allan Sandfeld Jensen wrote:
>
> On Tuesday, 12 September 2017 23:27:22 CEST Michael Clark wrote:
>>> On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra wrote:
>>>
>>> Hi all,
>>>
>>> At the GNU Cauldron I was inspired by several interesting talks about
>>> improving GCC in various ways. While GCC has many great optimizations, a
>>> common theme is that its default settings are rather conservative. As a
>>> result users are required to enable several additional optimizations by
>>> hand to get good code. Other compilers enable more optimizations at -O2
>>> (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should
>>> do as well.
>>
>> There are some nuances to -O2. Please consider -O2 users who wish to use
>> it like Clang/LLVM's -Os (-O2 without loop vectorisation IIRC).
>>
>> Clang/LLVM has an -Os that is like -O2, so optimisations that increase
>> code size can be left out of -Os without drastically affecting
>> performance.
>>
>> This is not the case with GCC, where -Os is a size-at-all-costs
>> optimisation mode. The GCC user's option for size not at the expense of
>> speed is to use -O2.
>>
>> Clang    GCC
>> -Oz  ~=  -Os
>> -Os  ~=  -O2
>
> No. Clang's -Os is somewhat limited compared to gcc's, just like the clang
> -Og is just -O1. AFAIK -Oz is a proprietary Apple clang parameter, and not
> in clang proper.

It appears to be in mainline clang.

mclark@anarch128:~$ clang -Oz -c a.c -o a.o
mclark@anarch128:~$ clang -Ox -c a.c -o a.o
error: invalid integral value 'x' in '-Ox'
error: invalid integral value 'x' in '-Ox'
mclark@anarch128:~$ uname -a
Linux anarch128.org 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) x86_64 GNU/Linux
mclark@anarch128:~$ clang --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I still think it would be unfortunate to lose the size/speed sweet spot
of -O2 by adding optimisations that increase code size, unless there were
a size optimisation option derived from -O2 at the point -O2 is souped
up, i.e. create an -O2s (or rename -Os to -Oz and derive the new -Os from
the current -O2).

I'm going to start looking at this to see what's involved in making a
patch. Distros that want a balance of size and speed might even pick it
up, even if it is not accepted in mainline.
Re: RFC: Improving GCC8 default option settings
On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras wrote:
> > On 12/09/17 16:57, Wilco Dijkstra wrote:
> >> [...]
> >> I'd welcome discussion and other proposals for similar improvements.
> >
> > What's the status of graphite? It's been around for years. Isn't it
> > mature enough to enable these:
> >
> > -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
> >
> > by default for -O2? (And I'm not even sure those are the complete set of
> > graphite optimization flags, or just the "useful" ones.)
>
> It's not on by default at any optimization level. The main issue is the
> lack of maintenance and a set of known common internal compiler errors
> we hit. The other issue is that there's no benefit of turning those on for
> SPEC CPU benchmarking as far as I remember but quite a bit of extra
> compile-time cost.

Not to mention the numerous wrong-code bugs. IMHO graphite should be
deprecated as soon as possible.

--
Markus
Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras wrote:
> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>
>> [...] As a result users are
>> required to enable several additional optimizations by hand to get good
>> code.
>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>> was mentioned repeatedly) which GCC could/should do as well.
>> [...]
>>
>> I'd welcome discussion and other proposals for similar improvements.
>
> What's the status of graphite? It's been around for years. Isn't it mature
> enough to enable these:
>
> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>
> by default for -O2? (And I'm not even sure those are the complete set of
> graphite optimization flags, or just the "useful" ones.)

It's not on by default at any optimization level. The main issue is the
lack of maintenance and a set of known common internal compiler errors
we hit. The other issue is that, as far as I remember, there's no benefit
from turning those on for SPEC CPU benchmarking but quite a bit of extra
compile-time cost.

Richard.
Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen wrote:
> On Wednesday, 13 September 2017 15:46:09 CEST Jakub Jelinek wrote:
>> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
>> > On its own -O3 doesn't add much (some loop opts and slightly more
>> > aggressive inlining/unrolling), so whatever it does we
>> > should consider doing at -O2 eventually.
>>
>> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
>
> Would it be possible to enable basic block vectorization on -O2? I assume
> that doesn't increase binary size since it doesn't unroll loops.

Somebody needs to provide benchmarking looking at the compile-time cost
vs. the runtime benefit and the code size effect. There's also room to
tune the aggressiveness of BB vectorization, as it currently allows for
cases where the scalar computation is not fully replaced by vector code.

Richard.

> 'Allan
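For readers unfamiliar with the term, the kind of straight-line code that basic-block (SLP) vectorization targets looks roughly like the sketch below. The function name `add4` is made up for the example. In GCC the pass is controlled by `-ftree-slp-vectorize`; whether a particular GCC version and target actually turn this function into vector code depends on its cost model, so nothing here should be read as a statement about what GCC emits.

```c
/* Four independent, adjacent additions in a single basic block and no loop.
   An SLP vectorizer can in principle combine them into one vector add
   without unrolling anything. The restrict qualifiers promise the compiler
   that the three arrays do not overlap. */
void add4(float *restrict out, const float *restrict x,
          const float *restrict y)
{
    out[0] = x[0] + y[0];
    out[1] = x[1] + y[1];
    out[2] = x[2] + y[2];
    out[3] = x[3] + y[3];
}
```

Comparing the output of `gcc -O2 -S` with and without `-ftree-slp-vectorize` on code like this is one way to start collecting the code-size and run-time data asked for above.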
Re: RFC: Improving GCC8 default option settings
On September 13, 2017 6:24:21 PM GMT+02:00, Jan Hubicka wrote:
>> > I don't see static profile prediction being very useful here for
>> > finding "really hot code", either in the current implementation or
>> > in the future. The problem with -O2 is that we kind of know that
>> > only about 10% of the code matters for performance, but we have no
>> > way to reliably identify it.
>>
>> It's hard to do better than statically looking at (IPA) loop depth.
>> But shouldn't that be good enough?
>
> Only if you assume that you have the whole program and understand
> indirect calls. There are some stats on this here:
> http://ieeexplore.ieee.org/document/717399/
>
> It shows that propagating a static profile across the whole program
> (which is just a tiny bit fancier than counting loop depth) sort of
> works statistically. I really do not have very high hopes of this
> working reliably in a production compiler. We already have PRs for
> single-function benchmarks where a deep loop nest is used in
> initialization or so, and the actual hard-working part has a small
> loop nest and gets identified as cold.
>
> As soon as you start propagating in whole-program context, such local
> mistakes will become more common.

Heh, I would just make loop nests hot without globally making anything
cold because of that. Basically something like optimistic IPA profile
propagation.

Richard.

>> > It would make sense to have less aggressive vectorization at -O2
>> > and more at -Ofast/-O3.
>>
>> We tried that but the runtime effects were not offsetting the compile
>> time cost.
>
> Yep, I remember that.
>
> Honza
Re: RFC: Improving GCC8 default option settings
> > I don't see static profile prediction being very useful here for
> > finding "really hot code", either in the current implementation or
> > in the future. The problem with -O2 is that we kind of know that only
> > about 10% of the code matters for performance, but we have no way to
> > reliably identify it.
>
> It's hard to do better than statically looking at (IPA) loop depth. But
> shouldn't that be good enough?

Only if you assume that you have the whole program and understand
indirect calls. There are some stats on this here:
http://ieeexplore.ieee.org/document/717399/

It shows that propagating a static profile across the whole program
(which is just a tiny bit fancier than counting loop depth) sort of works
statistically. I really do not have very high hopes of this working
reliably in a production compiler. We already have PRs for
single-function benchmarks where a deep loop nest is used in
initialization or so, and the actual hard-working part has a small loop
nest and gets identified as cold.

As soon as you start propagating in whole-program context, such local
mistakes will become more common.

> > It would make sense to have less aggressive vectorization at -O2 and
> > more at -Ofast/-O3.
>
> We tried that but the runtime effects were not offsetting the compile
> time cost.

Yep, I remember that.

Honza
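The loop-depth pitfall described above can be sketched in a few lines of C. The example is invented for illustration (the names `init_grid` and `hot_loop` and the size `D` are assumptions, not code from any PR): a purely depth-based heuristic would rank the three-deep initialization nest above the single loop that does the real work.

```c
enum { D = 8 };                      /* hypothetical grid size */

static double grid[D][D][D];

/* Loop depth 3, but it runs once and performs only D*D*D = 512 stores. */
void init_grid(void)
{
    for (int i = 0; i < D; i++)
        for (int j = 0; j < D; j++)
            for (int k = 0; k < D; k++)
                grid[i][j][k] = i + j + k;
}

/* Loop depth 1, yet the trip count is only known at run time and may be
   many orders of magnitude larger than the initialization. */
double hot_loop(long steps)
{
    double acc = 0.0;
    for (long s = 0; s < steps; s++)
        acc += grid[s % D][0][0];
    return acc;
}
```

With `steps` in the millions, `hot_loop` dominates the run time even though a static look at nesting depth would mark `init_grid` as the hotter function.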
Re: RFC: Improving GCC8 default option settings
On September 13, 2017 5:35:11 PM GMT+02:00, Jan Hubicka wrote:
>> On Wed, Sep 13, 2017 at 3:46 PM, Jakub Jelinek wrote:
>> > On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
>> >> On its own -O3 doesn't add much (some loop opts and slightly more
>> >> aggressive inlining/unrolling), so whatever it does we
>> >> should consider doing at -O2 eventually.
>> >
>> > Well, -O3 adds vectorization, which we don't enable at -O2 by default.
>>
>> As said, -fprofile-use enables it so -O2 should eventually do the same
>> for "really hot code".
>
> I don't see static profile prediction being very useful here for
> finding "really hot code", either in the current implementation or in
> the future. The problem with -O2 is that we kind of know that only
> about 10% of the code matters for performance, but we have no way to
> reliably identify it.

It's hard to do better than statically looking at (IPA) loop depth. But
shouldn't that be good enough?

> It would make sense to have less aggressive vectorization at -O2 and
> more at -Ofast/-O3.

We tried that but the runtime effects were not offsetting the compile
time cost.

> Adding -Os and -Oz would make sense to me: even with hot/cold info it
> is not desirable to optimize for size as aggressively as we do, because
> mistakes happen and one does not want to make code paths 1000 times
> slower to save one byte of binary.
>
> We could handle this gracefully internally by having logic for "known
> to be cold" and "guessed to be cold". The new profile code can make a
> difference here.
>
> Honza
Re: RFC: Improving GCC8 default option settings
On 12/09/17 16:57, Wilco Dijkstra wrote:
> [...] As a result users are
> required to enable several additional optimizations by hand to get good
> code.
> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
> was mentioned repeatedly) which GCC could/should do as well.
> [...]
>
> I'd welcome discussion and other proposals for similar improvements.

What's the status of graphite? It's been around for years. Isn't it
mature enough to enable these:

-floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block

by default for -O2? (And I'm not even sure those are the complete set of
graphite optimization flags, or just the "useful" ones.)
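As background for the first flag in that list, loop interchange swaps the two loops of a nest so that memory is walked in storage order. The sketch below shows the before and after forms by hand; the function names and the size `N` are assumptions for the example, and it makes no claim about when GCC or Graphite would perform the transformation automatically.

```c
enum { N = 64 };                     /* hypothetical matrix size */

/* Fill the matrix with distinct values so the sums can be checked. */
void fill(int m[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            m[i][j] = i * N + j;
}

/* Column-first traversal of a row-major array: consecutive iterations of
   the inner loop are N ints apart in memory. */
long sum_column_order(int m[N][N])
{
    long s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += m[i][j];
    return s;
}

/* The same computation after interchanging the loops: the inner loop now
   touches adjacent elements, which is friendlier to caches and to
   vectorization. */
long sum_row_order(int m[N][N])
{
    long s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += m[i][j];
    return s;
}
```

Both functions compute the same sum; the interchange is legal here because the iterations carry no dependence other than the reduction into `s`.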
Re: RFC: Improving GCC8 default option settings
> On Wed, Sep 13, 2017 at 3:21 AM, Michael Clark wrote:
> >
> >> On 13 Sep 2017, at 1:15 PM, Michael Clark wrote:
> >>
> >> - https://rv8.io/bench#optimisation
> >> - https://rv8.io/bench#executable-file-sizes
> >>
> >> -O2 is 98% perf of -O3 on x86-64
> >> -Os is 81% perf of -O3 on x86-64
> >>
> >> -O2 saves 5% space on -O3 on x86-64
> >> -Os saves 8% space on -Os on x86-64
> >>
> >> 17% drop in performance for 3% saving in space is not a good trade for a
> >> “general” size optimisation. It’s more like executable compression.
> >
> > Sorry fixed typo:
> >
> > -O2 is 98% perf of -O3 on x86-64
> > -Os is 81% perf of -O3 on x86-64
> >
> > -O2 saves 5% space on -O3 on x86-64
> > -Os saves 8% space on -O3 on x86-64

I am a bit surprised that you see only an 8% code size difference between
-Os and -O3. I look into these numbers occasionally and the difference is
usually well into double digits.

http://hubicka.blogspot.cz/2014/04/linktime-optimization-in-gcc-2-firefox.html
http://hubicka.blogspot.cz/2014/09/linktime-optimization-in-gcc-part-3.html

show a 42% code segment size reduction for Firefox and 19% for
LibreOffice.

Honza
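The "17% for 3%" trade quoted above follows from the four rv8 figures by simple subtraction; both numbers are differences in percentage points relative to -O3. A tiny sketch of that arithmetic, with a struct layout and names that are purely illustrative:

```c
/* Figures as quoted in the thread for x86-64, all relative to -O3. */
struct opt_level {
    const char *name;
    int perf_pct_of_O3;      /* performance as a percentage of -O3 */
    int size_saving_pct;     /* space saved compared with -O3, in percent */
};

static const struct opt_level O2 = { "-O2", 98, 5 };
static const struct opt_level Os = { "-Os", 81, 8 };

/* Performance given up when moving from one level to another, in points. */
int perf_drop_points(struct opt_level from, struct opt_level to)
{
    return from.perf_pct_of_O3 - to.perf_pct_of_O3;
}

/* Additional space saved by the same move, in points. */
int extra_saving_points(struct opt_level from, struct opt_level to)
{
    return to.size_saving_pct - from.size_saving_pct;
}
```

Going from -O2 to -Os gives `perf_drop_points(O2, Os) == 17` and `extra_saving_points(O2, Os) == 3`, which is the trade being criticised. The Firefox and LibreOffice figures cited in the reply suggest the size side of that trade can be much larger on other code bases.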
Re: RFC: Improving GCC8 default option settings
> On Wed, Sep 13, 2017 at 3:46 PM, Jakub Jelinek wrote:
> > On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> >> On its own -O3 doesn't add much (some loop opts and slightly more
> >> aggressive inlining/unrolling), so whatever it does we
> >> should consider doing at -O2 eventually.
> >
> > Well, -O3 adds vectorization, which we don't enable at -O2 by default.
>
> As said, -fprofile-use enables it so -O2 should eventually do the same
> for "really hot code".

I don't see static profile prediction to be very useful here to find
"really hot code" - neither in the current implementation nor in the
future. The problem of -O2 is that we kind of know that only 10% of code
somewhere matters for performance but we have no way to reliably
identify it.

It would make sense to have less aggressive vectorization at -O2 and
more at -Ofast/-O3.

Adding -Os and -Oz would make sense to me - even with hot/cold info it
is not desirable to optimize as aggressively for size as we do because
mistakes happen and one does not want to make code paths 1000 times
slower to save one byte of binary.

We could handle this gracefully internally by having logic for "known to
be cold" and "guessed to be cold". New profile code can make a
difference in this.

Honza
>
> Richard.
>
> > Jakub

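The "known to be cold" versus "guessed to be cold" distinction has a user-visible analogue in annotations GCC already accepts. The following is only a sketch, assuming GCC or Clang: the `cold` attribute and `__builtin_expect` are existing GNU extensions, while the names `report_failure` and `checked_div` are made up for illustration and say nothing about how the internal profile code would represent the two cases.

```c
/* "Known to be cold": the programmer states that this path is rarely
   executed, so the compiler is free to optimize it for size. */
__attribute__((cold, noinline))
static int report_failure(int code)
{
    return -code;
}

/* Without annotations the compiler can only guess which branch is cold.
   __builtin_expect hints that b == 0 is the unlikely case. */
int checked_div(int a, int b, int *out)
{
    if (__builtin_expect(b == 0, 0))
        return report_failure(1);
    *out = a / b;
    return 0;
}
```

A wrong guess about a path like `report_failure` is exactly the case where optimizing too aggressively for size could hurt, which is why the two kinds of information might deserve different treatment.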
Re: RFC: Improving GCC8 default option settings
On Wednesday, 13 September 2017 15:46:09 CEST Jakub Jelinek wrote:
> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> > On its own -O3 doesn't add much (some loop opts and slightly more
> > aggressive inlining/unrolling), so whatever it does we
> > should consider doing at -O2 eventually.
>
> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
>
Would it be possible to enable basic block vectorization at -O2? I
assume that doesn't increase binary size since it doesn't unroll loops.

'Allan

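For readers unfamiliar with the term, basic block (SLP) vectorization works on straight-line code rather than loops. A minimal sketch of the kind of code it targets, assuming a target with 4-wide float vectors (the function name `add4` is illustrative, and whether GCC actually vectorizes it depends on the target and the cost model):

```c
/* Four independent, adjacent operations with no loop: a candidate for
   basic-block (SLP) vectorization, which can replace the four scalar
   additions with one vector addition without unrolling anything. */
void add4(float *restrict dst, const float *restrict src)
{
    dst[0] += src[0];
    dst[1] += src[1];
    dst[2] += src[2];
    dst[3] += src[3];
}
```

With the GCC releases discussed here one would try this with `-O2 -ftree-slp-vectorize` and compare the generated assembly and object size against plain `-O2`.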
Re: RFC: Improving GCC8 default option settings
On Tuesday, 12 September 2017 23:27:22 CEST Michael Clark wrote:
> > On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra wrote:
> >
> > Hi all,
> >
> > At the GNU Cauldron I was inspired by several interesting talks about
> > improving GCC in various ways. While GCC has many great optimizations,
> > a common theme is that its default settings are rather conservative.
> > As a result users are required to enable several additional
> > optimizations by hand to get good code. Other compilers enable more
> > optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly)
> > which GCC could/should do as well.
>
> There are some nuances to -O2. Please consider -O2 users who wish use it
> like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>
> Clang/LLVM has an -Os that is like -O2 so adding optimisations that
> increase code size can be skipped from -Os without drastically effecting
> performance.
>
> This is not the case with GCC where -Os is a size at all costs
> optimisation mode. GCC users option for size not at the expense of speed
> is to use -O2.
>
> Clang   GCC
> -Oz  ~= -Os
> -Os  ~= -O2
>
No. Clang's -Os is somewhat limited compared to gcc's, just like the
clang -Og is just -O1. AFAIK -Oz is a proprietary Apple clang parameter,
and not in clang proper.

'Allan

Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 3:46 PM, Jakub Jelinek wrote:
> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
>> On its own -O3 doesn't add much (some loop opts and slightly more
>> aggressive inlining/unrolling), so whatever it does we
>> should consider doing at -O2 eventually.
>
> Well, -O3 adds vectorization, which we don't enable at -O2 by default.

As said, -fprofile-use enables it so -O2 should eventually do the same
for "really hot code".

Richard.

> Jakub

Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> On its own -O3 doesn't add much (some loop opts and slightly more
> aggressive inlining/unrolling), so whatever it does we
> should consider doing at -O2 eventually.

Well, -O3 adds vectorization, which we don't enable at -O2 by default.

Jakub

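As an illustration of what that difference means in practice, here is a minimal sketch of a loop that the loop vectorizer can typically handle. The function name `scale_add` is made up for this example; `restrict` tells the compiler the arrays do not overlap, which removes one common obstacle to vectorization.

```c
#include <stddef.h>

/* y[i] += a * x[i] for i in [0, n). In the GCC releases discussed in
   this thread, the loop is a vectorization candidate at -O3 (or with
   -O2 -ftree-vectorize) but stays scalar at plain -O2. */
void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

Compiling this once with `-O2` and once with `-O2 -ftree-vectorize` and diffing the assembly shows the code size and structure change being debated.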
Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 9:43 AM, Janne Blomqvist wrote:
> On Tue, Sep 12, 2017 at 4:57 PM, Wilco Dijkstra wrote:
>> These are just a few ideas to start. What do people think? I'd welcome
>> discussion and other proposals for similar improvements.
>
> What about the default behavior if no options are given? I think a
> more reasonable default would be something roughly like
>
> -O2 -Wall
>
> or if debuggability is considered more important than speed & size,
> maybe
>
> -Og -g -Wall

Enabling (some) warnings by default seems reasonable to me. Not sure
about the rest though.

This is something people can't seem to agree on. Some like warnings by
default, some like optimizations by default. Some are against warnings
by default, arguing that people like distro-builders have no need for
warnings (for example). Some are against optimizations by default,
because it would make compilation slower when they just want to check
whether some piece of code compiles successfully (for example).

I think the only way to decide what options to enable by default is to
first decide who your target audience is going to be. Who do you expect
to run gcc with no options specified? If you ask me, that will mostly be
students or beginners. Those who are experienced are more likely to use
an automated build system where they specify build options only once and
then forget about it. An inexperienced person would need warnings and
debug info by default, and maybe a few simple optimizations that do not
interfere with debugging.

There can only be one set of default options, and which one you pick
will depend on who your target audience is. You cannot please everyone.

--
Kevin

Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 3:21 AM, Michael Clark wrote:
>
>> On 13 Sep 2017, at 1:15 PM, Michael Clark wrote:
>>
>> - https://rv8.io/bench#optimisation
>> - https://rv8.io/bench#executable-file-sizes
>>
>> -O2 is 98% perf of -O3 on x86-64
>> -Os is 81% perf of -O3 on x86-64
>>
>> -O2 saves 5% space on -O3 on x86-64
>> -Os saves 8% space on -Os on x86-64
>>
>> 17% drop in performance for 3% saving in space is not a good trade for
>> a “general” size optimisation. It’s more like executable compression.
>
> Sorry fixed typo:
>
> -O2 is 98% perf of -O3 on x86-64
> -Os is 81% perf of -O3 on x86-64
>
> -O2 saves 5% space on -O3 on x86-64
> -Os saves 8% space on -O3 on x86-64
>
> The extra ~3% space saving for ~17% drop in performance doesn’t seem
> like a good general option for size based on the cost in performance.
>
> Again. I really like GCC’s -O2 and hope that its binaries don’t grow in
> size nor slow down.

I think with GCC -Os and -O2 are essentially the same, with the
difference that -Os assumes regions are cold and thus to be optimized
for size, and -O2 assumes they are hot and thus to be optimized for
speed, in cases where there is no heuristic proving otherwise.

I know this doesn't 100% reflect implementation reality but it should be
close.

IMHO we should turn on the flags we turn on with -fprofile-use and have
some more nuances in optimize_*_for_{speed,size} as we now track profile
quality more closely.

I see -O1 as mostly worthless unless you are compiling machine-generated
code that makes -O2+ go OOM/time. Apart from avoiding quadratic or worse
algorithms -O1 sees no love.

On its own -O3 doesn't add much (some loop opts and slightly more
aggressive inlining/unrolling), so whatever it does we should consider
doing at -O2 eventually.

Richard.

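The speed-versus-size assumption described above is also visible to user code through GCC's predefined macros. A small sketch, assuming GCC or a compatible compiler: `__OPTIMIZE__` and `__OPTIMIZE_SIZE__` are documented predefined macros, while the function name `optimization_mode` is made up here.

```c
/* Reports which global assumption the translation unit was compiled
   under: __OPTIMIZE_SIZE__ is defined for -Os, __OPTIMIZE__ for any
   optimizing level (including -Os), and neither at -O0. */
const char *optimization_mode(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}
```

This only reflects the command-line default; per-region decisions of the kind made by optimize_*_for_{speed,size} happen inside the compiler and are not exposed this way.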
Re: RFC: Improving GCC8 default option settings
On Tue, Sep 12, 2017 at 4:57 PM, Wilco Dijkstra wrote:
> Hi all,
>
> At the GNU Cauldron I was inspired by several interesting talks about
> improving GCC in various ways. While GCC has many great optimizations,
> a common theme is that its default settings are rather conservative.
> As a result users are required to enable several additional
> optimizations by hand to get good code. Other compilers enable more
> optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly)
> which GCC could/should do as well.
>
> Here are a few concrete proposals to improve GCC's option settings
> which will enable better code generation for most targets:
>
> * Make -fno-math-errno the default - this mostly affects the code
>   generated for sqrt, which should be treated just like floating point
>   division and not set errno by default (unless you explicitly select
>   C89 mode).

+1. Math functions setting errno is a blast from the past that needs to
die. That being said, this does to some extent depend on libm so perhaps
the default needs to be target-dependent.

> * Make -fno-trapping-math the default - another obvious one. From the
>   docs:
>   "Compile code assuming that floating-point operations cannot generate
>    user-visible traps."
>   There isn't a lot of code that actually uses user-visible traps (if
>   any - many CPUs don't even support user traps as it's an optional
>   IEEE feature). So assuming trapping math by default is way too
>   conservative since there is no obvious benefit to users.

As Mr. Myers explains, this is probably going a bit too far. I think by
default whatever fp optimizations are allowed with FENV_ACCESS off is
reasonable.

> * Make -fomit-frame-pointer the default - various targets already do
>   this at higher optimization levels, but this could easily be done for
>   all targets. Frame pointers haven't been needed for debugging for
>   decades, however if there are still good reasons to keep it enabled
>   with -O0 or -O1 (I can't think of any unless it is for last-resort
>   backtrace when there is no unwind info at a crash), we could just
>   disable the frame pointer from -O2 onwards.

Sounds reasonable.

> These are just a few ideas to start. What do people think? I'd welcome
> discussion and other proposals for similar improvements.

What about the default behavior if no options are given? I think a more
reasonable default would be something roughly like

-O2 -Wall

or if debuggability is considered more important than speed & size,
maybe

-Og -g -Wall

--
Janne Blomqvist

Re: RFC: Improving GCC8 default option settings
> * Make -fomit-frame-pointer the default - various targets already do
>   this at higher optimization levels, but this could easily be done for
>   all targets. Frame pointers haven't been needed for debugging for
>   decades, however if there are still good reasons to keep it enabled
>   with -O0 or -O1 (I can't think of any unless it is for last-resort
>   backtrace when there is no unwind info at a crash), we could just
>   disable the frame pointer from -O2 onwards.

Given there's an -Og now, maybe frame pointers could be enabled for -O0
and -Og, and off by default otherwise.

I like to use -O1 to kick in the analysis engine and start catching
warnings. It seems like -O1 should be closer to -O2/-O3 with respect to
frame pointers, since that could help find issues and tickle problems
with hand-crafted ASM.

Jeff

Re: RFC: Improving GCC8 default option settings
> On 13 Sep 2017, at 1:15 PM, Michael Clark wrote:
>
> - https://rv8.io/bench#optimisation
> - https://rv8.io/bench#executable-file-sizes
>
> -O2 is 98% perf of -O3 on x86-64
> -Os is 81% perf of -O3 on x86-64
>
> -O2 saves 5% space on -O3 on x86-64
> -Os saves 8% space on -Os on x86-64
>
> 17% drop in performance for 3% saving in space is not a good trade for
> a “general” size optimisation. It’s more like executable compression.

Sorry fixed typo:

-O2 is 98% perf of -O3 on x86-64
-Os is 81% perf of -O3 on x86-64

-O2 saves 5% space on -O3 on x86-64
-Os saves 8% space on -O3 on x86-64

The extra ~3% space saving for ~17% drop in performance doesn’t seem
like a good general option for size based on the cost in performance.

Again. I really like GCC’s -O2 and hope that its binaries don’t grow in
size nor slow down.

Re: RFC: Improving GCC8 default option settings
> On 13 Sep 2017, at 12:47 PM, Segher Boessenkool wrote:
>
> On Wed, Sep 13, 2017 at 09:27:22AM +1200, Michael Clark wrote:
>>> Other compilers enable more optimizations at -O2 (loop unrolling in
>>> LLVM was mentioned repeatedly) which GCC could/should do as well.
>>
>> There are some nuances to -O2. Please consider -O2 users who wish use
>> it like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>>
>> Clang/LLVM has an -Os that is like -O2 so adding optimisations that
>> increase code size can be skipped from -Os without drastically
>> effecting performance.
>>
>> This is not the case with GCC where -Os is a size at all costs
>> optimisation mode. GCC users option for size not at the expense of
>> speed is to use -O2.
>
> "Size not at the expense of speed" exists in neither compiler. Just the
> tradeoffs are different between GCC and LLVM. It would be a silly
> optimisation target -- it's exactly the same as just "speed"! Unless
> "speed" means "let's make it faster, and bigger just because" ;-)

I would like to be able to quantify stats on a well known benchmark
suite, say SPECint 2006 or SPECint 2017, but in my own small benchmark
suite I saw only a small difference in size between -O2 and -Os, but a
significant drop in performance with -Os compared to -O2.

- https://rv8.io/bench#optimisation
- https://rv8.io/bench#executable-file-sizes

-O2 is 98% perf of -O3 on x86-64
-Os is 81% perf of -O3 on x86-64

-O2 saves 5% space on -O3 on x86-64
-Os saves 8% space on -Os on x86-64

17% drop in performance for 3% saving in space is not a good trade for a
“general” size optimisation. It’s more like executable compression.

-O2 seems to be a sweet spot for size versus speed.

I could only recommend GCC’s -Os if the user is trying to squeeze
something down to fit the last few bytes of a ROM, and -Oz seems like a
more appropriate name.

-O2 is the current sweet spot in GCC and is likely closest in semantics
to LLVM/Clang -Os, and I’d like -O2 binaries to stay lean.

I don’t think -O2 should slow down nor should the binaries get larger.

Turning up knobs that affect code size should be reserved for -O3 until
GCC makes a distinction between -O2/-O2s and -Os/-Oz.

On RISC-V I believe we could shrink binaries at -O2 further with no
sacrifice in performance, perhaps with a performance improvement by
reducing icache bandwidth…

BTW -O2 gets great compression and performance improvements compared to
-O0 ;-D it’s the points after -O2 where the trade-offs don’t correlate.

I like -O2

My 2c.

> GCC's -Os is not "size at all costs" either; there are many options
> (mostly --params) that can decrease code size significantly. To tune
> code size down for your particular program you have to play with
> options a bit. This shouldn't be news to anyone.
>
> '-Os'
>     Optimize for size. '-Os' enables all '-O2' optimizations that do
>     not typically increase code size. It also performs further
>     optimizations designed to reduce code size.
>
>> So if adding optimisations to -O2 that increase code size, please
>> considering adding an -O2s that maintains the compact code size of
>> -O2. -O2 generates pretty compact code as many performance
>> optimisations tend to reduce code size, or otherwise add optimisations
>> that increase code size to -O3. Adding loop unrolling on makes sense
>> in the Clang/LLVM context where they have a compact code model with
>> good performance i.e. -Os. In GCC this is -O2.
>>
>> So if you want to enable more optimisations at -O2, please copy -O2
>> optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation
>> defaults to a new -Os.
>
> '-O2'
>     Optimize even more. GCC performs nearly all supported
>     optimizations that do not involve a space-speed tradeoff. As
>     compared to '-O', this option increases both compilation time and
>     the performance of the generated code.
>
>> The present reality is that any project that wishes to optimize for
>> size at all costs will need to run a configure test for -Oz, and then
>> fall back to -Os, given the current disparity between Clang/LLVM and
>> GCC flags here.
>
> The present reality is that any project that wishes to support both GCC
> and LLVM needs to do configure tests, because LLVM chose to do many
> things differently (sometimes unavoidably). If GCC would change some
> options to be more like LLVM, all users only ever using GCC would be
> affected, while all other incompatibilities would remain. Not a good
> tradeoff at all.
>
>
> Segher

Re: RFC: Improving GCC8 default option settings
On Wed, Sep 13, 2017 at 09:27:22AM +1200, Michael Clark wrote:
> > Other compilers enable more optimizations at -O2 (loop unrolling in
> > LLVM was mentioned repeatedly) which GCC could/should do as well.
>
> There are some nuances to -O2. Please consider -O2 users who wish use
> it like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>
> Clang/LLVM has an -Os that is like -O2 so adding optimisations that
> increase code size can be skipped from -Os without drastically
> effecting performance.
>
> This is not the case with GCC where -Os is a size at all costs
> optimisation mode. GCC users option for size not at the expense of
> speed is to use -O2.

"Size not at the expense of speed" exists in neither compiler. Just the
tradeoffs are different between GCC and LLVM. It would be a silly
optimisation target -- it's exactly the same as just "speed"! Unless
"speed" means "let's make it faster, and bigger just because" ;-)

GCC's -Os is not "size at all costs" either; there are many options
(mostly --params) that can decrease code size significantly. To tune
code size down for your particular program you have to play with options
a bit. This shouldn't be news to anyone.

'-Os'
    Optimize for size. '-Os' enables all '-O2' optimizations that do
    not typically increase code size. It also performs further
    optimizations designed to reduce code size.

> So if adding optimisations to -O2 that increase code size, please
> considering adding an -O2s that maintains the compact code size of -O2.
> -O2 generates pretty compact code as many performance optimisations
> tend to reduce code size, or otherwise add optimisations that increase
> code size to -O3. Adding loop unrolling on makes sense in the
> Clang/LLVM context where they have a compact code model with good
> performance i.e. -Os. In GCC this is -O2.
>
> So if you want to enable more optimisations at -O2, please copy -O2
> optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation
> defaults to a new -Os.

'-O2'
    Optimize even more. GCC performs nearly all supported
    optimizations that do not involve a space-speed tradeoff. As
    compared to '-O', this option increases both compilation time and
    the performance of the generated code.

> The present reality is that any project that wishes to optimize for
> size at all costs will need to run a configure test for -Oz, and then
> fall back to -Os, given the current disparity between Clang/LLVM and
> GCC flags here.

The present reality is that any project that wishes to support both GCC
and LLVM needs to do configure tests, because LLVM chose to do many
things differently (sometimes unavoidably). If GCC would change some
options to be more like LLVM, all users only ever using GCC would be
affected, while all other incompatibilities would remain. Not a good
tradeoff at all.


Segher

Re: RFC: Improving GCC8 default option settings
> On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra wrote:
>
> Hi all,
>
> At the GNU Cauldron I was inspired by several interesting talks about
> improving GCC in various ways. While GCC has many great optimizations,
> a common theme is that its default settings are rather conservative.
> As a result users are required to enable several additional
> optimizations by hand to get good code. Other compilers enable more
> optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly)
> which GCC could/should do as well.

There are some nuances to -O2. Please consider -O2 users who wish to use
it like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).

Clang/LLVM has an -Os that is like -O2, so optimisations that increase
code size can be skipped from -Os without drastically affecting
performance.

This is not the case with GCC, where -Os is a size-at-all-costs
optimisation mode. The GCC user's option for size not at the expense of
speed is to use -O2.

Clang   GCC
-Oz  ~= -Os
-Os  ~= -O2

So if adding optimisations to -O2 that increase code size, please
consider adding an -O2s that maintains the compact code size of -O2. -O2
generates pretty compact code as many performance optimisations tend to
reduce code size, or otherwise add optimisations that increase code size
to -O3. Turning loop unrolling on makes sense in the Clang/LLVM context
where they have a compact code model with good performance, i.e. -Os. In
GCC this is -O2.

So if you want to enable more optimisations at -O2, please copy -O2
optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation
defaults to a new -Os.

The present reality is that any project that wishes to optimize for size
at all costs will need to run a configure test for -Oz, and then fall
back to -Os, given the current disparity between Clang/LLVM and GCC
flags here.

> Here are a few concrete proposals to improve GCC's option settings
> which will enable better code generation for most targets:
>
> * Make -fno-math-errno the default - this mostly affects the code
>   generated for sqrt, which should be treated just like floating point
>   division and not set errno by default (unless you explicitly select
>   C89 mode).
>
> * Make -fno-trapping-math the default - another obvious one. From the
>   docs:
>   "Compile code assuming that floating-point operations cannot generate
>    user-visible traps."
>   There isn't a lot of code that actually uses user-visible traps (if
>   any - many CPUs don't even support user traps as it's an optional
>   IEEE feature). So assuming trapping math by default is way too
>   conservative since there is no obvious benefit to users.
>
> * Make -fno-common the default - this was originally needed for
>   pre-ANSI C, but is optional in C (not sure whether it is still in
>   C99/C11). This can significantly improve code generation on targets
>   that use anchors for globals (note the linker could report a more
>   helpful message when ancient code that requires -fcommon fails to
>   link).
>
> * Make -fomit-frame-pointer the default - various targets already do
>   this at higher optimization levels, but this could easily be done for
>   all targets. Frame pointers haven't been needed for debugging for
>   decades, however if there are still good reasons to keep it enabled
>   with -O0 or -O1 (I can't think of any unless it is for last-resort
>   backtrace when there is no unwind info at a crash), we could just
>   disable the frame pointer from -O2 onwards.
>
> These are just a few ideas to start. What do people think? I'd welcome
> discussion and other proposals for similar improvements.
>
> Wilco

Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Alexander Monakov wrote:
> > * Make -fno-trapping-math the default - another obvious one. From the
> >   docs:
> >   "Compile code assuming that floating-point operations cannot
> >    generate user-visible traps."
> >   There isn't a lot of code that actually uses user-visible traps (if
> >   any - many CPUs don't even support user traps as it's an optional
> >   IEEE feature). So assuming trapping math by default is way too
> >   conservative since there is no obvious benefit to users.
>
> OTOH -O options are understood to _never_ sacrifice standards
> compliance, with the exception of -Ofast. I believe that's an important
> property to keep.

ISO C allows the FENV_ACCESS pragma to be OFF by default (we don't
support the standard pragmas, but FENV_ACCESS ON is equivalent to a
stricter version of -frounding-math -ftrapping-math). Thus, this is not
a standards compliance issue (unlike various parts of -ffast-math that
break IEEE 754 semantics even with FENV_ACCESS OFF). And since
-fno-rounding-math is the default, the default is already a form of
FENV_ACCESS OFF.

I don't think any -O implication of -fno-trapping-math was proposed; the
proposal was about the default (independent of -O options).

--
Joseph S. Myers
jos...@codesourcery.com

Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Wilco Dijkstra wrote:
> * Make -fno-math-errno the default - this mostly affects the code
>   generated for sqrt, which should be treated just like floating point
>   division and not set errno by default (unless you explicitly select
>   C89 mode).

(note that this can be selectively enabled by targets where libm never
sets errno in the first place; the docs call out Darwin as one such
target, but musl-libc targets have this property too)

> * Make -fno-trapping-math the default - another obvious one. From the
>   docs:
>   "Compile code assuming that floating-point operations cannot generate
>    user-visible traps."
>   There isn't a lot of code that actually uses user-visible traps (if
>   any - many CPUs don't even support user traps as it's an optional
>   IEEE feature). So assuming trapping math by default is way too
>   conservative since there is no obvious benefit to users.

OTOH -O options are understood to _never_ sacrifice standards
compliance, with the exception of -Ofast. I believe that's an important
property to keep. Maybe it's possible to treat -fno-trapping-math
similarly to -ffp-contract=fast, i.e. implicitly enable it in the
default C-with-GNU-extensions mode, keeping strict-compliance mode
(-std=c11 as opposed to gnu11) untouched?

In any case it shouldn't be hard to issue a warning if fenv.h functions
are used when -fno-trapping-math/-fno-rounding-math is enabled.

If the above doesn't fly, I believe adopting and promoting a single
option for non-value-changing math optimizations (-fno-math-errno
-fno-trapping-math, plus -fno-rounding-math -fno-signaling-nans when
they're no longer default) would be nice.

> * Make -fno-common the default - this was originally needed for
>   pre-ANSI C, but is optional in C (not sure whether it is still in
>   C99/C11). This can significantly improve code generation on targets
>   that use anchors for globals (note the linker could report a more
>   helpful message when ancient code that requires -fcommon fails to
>   link).

I think the situations in ISO C where -fcommon allows the link to
succeed fall under undefined behavior, which in the GNU toolchain is
defined to match the historical behavior.

I assume the main issue with this is the amount of legacy code that
would cause a link failure if -fno-common is made the default - thus, is
there anybody in a position to trigger a full-distro rebuild with gcc
patched to enable -fno-common, and compare before/after build failure
stats?

Thanks.
Alexander

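For anyone unsure what kind of legacy code is at stake: the pattern that relies on -fcommon is an uninitialized global such as `int request_count;` defined in a header and therefore repeated in every file that includes it. A minimal sketch of the form that links under either setting (the names `request_count` and `note_request` are made up for illustration, and the header and source file are shown together in one listing):

```c
/* What the shared header should contain: a declaration only. */
extern int request_count;

/* Exactly one translation unit provides the definition. With -fcommon,
   an uninitialized "int request_count;" repeated in several files is
   merged by the linker; with -fno-common the same code fails to link
   with a multiple-definition error. */
int request_count = 0;

int note_request(void)
{
    return ++request_count;
}
```

A distro-wide rebuild with -fno-common would mostly surface headers that still use the first pattern.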
Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Wilco Dijkstra wrote:
> * Make -fno-math-errno the default - this mostly affects the code
>   generated for sqrt, which should be treated just like floating point
>   division and not set errno by default (unless you explicitly select
>   C89 mode).
>
> * Make -fno-trapping-math the default - another obvious one. From the
>   docs:

Note these would both have implications for library math_errhandling
settings (since the compiler options can affect built-in functions).

In the absence of -ffast-math glibc defines it to (MATH_ERRNO |
MATH_ERREXCEPT). __NO_MATH_ERRNO__ exists since GCC 5 to allow it to be
defined to just MATH_ERREXCEPT in the -fno-math-errno case, but the
header needs updating to respect that. And we don't have a macro for
-fno-trapping-math to say whether MATH_ERREXCEPT should be part of the
value.

My assumption is that with changed defaults glibc would need to compile
with -ftrapping-math just as it uses -frounding-math; code may expect
transformations that add exceptions not to occur. It probably does not
require -fmath-errno (glibc functions do not generally rely on other
functions setting errno).

--
Joseph S. Myers
jos...@codesourcery.com

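To make the interaction concrete, here is a minimal sketch of how portable code could query both sources of information. It assumes a C99 `<math.h>` that defines `math_errhandling` and `MATH_ERRNO`; the function name `math_errors_use_errno` is made up, and the `#ifdef` only works around a header that has not yet been taught about `__NO_MATH_ERRNO__`.

```c
#include <math.h>

/* Returns 1 if math functions are expected to report errors through
   errno in this translation unit, 0 otherwise. */
int math_errors_use_errno(void)
{
#ifdef __NO_MATH_ERRNO__
    /* Defined by GCC 5 and later when -fno-math-errno is in effect. */
    return 0;
#else
    return (math_errhandling & MATH_ERRNO) != 0;
#endif
}
```

Code that needs to detect domain errors regardless of these settings is better off checking arguments or results directly rather than relying on errno.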
Re: RFC: Improving GCC8 default option settings
On 09/12/2017 05:32 PM, Andrew Pinski wrote:
> On Tue, Sep 12, 2017 at 8:29 AM, Theodore Papadopoulo wrote:
>> Another one that might be interesting is -funsafe-loop-optimizations.
>> In most cases people write loops assuming simple finite loops (no
>> overflow). Crippling optimization for the small amount of people
>> (system programmers ?) that use such strange loops seems
>> counterproductive. It would be best if such loops can be marked with
>> an attribute in some way and that the general case just assumes that
>> all loops are finite...
>
> -funsafe-loop-optimizations is a nop in GCC 7 and above.
> Since https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00956.html .
>
> Thanks,
> Andrew
>
Thanks for the notice. For some reason, I missed that piece of
information... Too bad that making such an assumption generates bogus
code in some common cases.

Theo.

Re: RFC: Improving GCC8 default option settings
On Tue, Sep 12, 2017 at 8:29 AM, Theodore Papadopoulo wrote:
> Another one that might be interesting is -funsafe-loop-optimizations.
> In most cases people write loops assuming simple finite loops (no
> overflow). Crippling optimization for the small amount of people
> (system programmers ?) that use such strange loops seems
> counterproductive. It would be best if such loops can be marked with
> an attribute in some way and that the general case just assumes that
> all loops are finite...

-funsafe-loop-optimizations is a no-op in GCC 7 and above, since
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00956.html

Thanks,
Andrew

Re: RFC: Improving GCC8 default option settings
On Tue, 12 Sep 2017, Wilco Dijkstra wrote:
> * Make -fno-trapping-math the default - another obvious one. From the
>   docs:
>   "Compile code assuming that floating-point operations cannot generate
>    user-visible traps."
>   There isn't a lot of code that actually uses user-visible traps (if
>   any - many CPUs don't even support user traps as it's an optional
>   IEEE feature). So assuming trapping math by default is way too
>   conservative since there is no obvious benefit to users.

"traps" here means "raising IEEE exception flags" not just "invoking
trap handlers". That is, -ftrapping-math disables a range of local
transformations that would change the set of flags raised by an
operation. (Transformations that change the nonzero number of times a
flag is raised to a different nonzero number are always OK; that is, the
possibility of a trap handler counting how many times it is invoked is
never considered. Transformations that might move flag raising across
function calls or asms that might inspect or modify the flags should not
be OK, at least with a stricter version of -ftrapping-math that might be
another option, but we don't have that stricter version at present;
-ftrapping-math generally does not disable code movement, or removal of
code that is dead apart from its effect on exception flags.)

That is, lack of trap support on processors that only support exception
flags is not relevant to -ftrapping-math, beyond any question of whether
-ftrapping-math should disable transformations that only affect whether
an exact underflow exception occurs (the case where default exception
handling does not raise the flag), if we have any such transformations
(constant folding on exact underflow?).

It's true that a stricter version of -ftrapping-math that inhibits code
movement and removal would probably inhibit *more* optimizations than
-frounding-math (which is off by default), as -frounding-math only makes
floating-point operations read thread-local state but -ftrapping-math
makes them write it as well.

--
Joseph S. Myers
jos...@codesourcery.com

Re: RFC: Improving GCC8 default option settings
Another one that might be interesting is -funsafe-loop-optimizations.
In most cases people write loops assuming simple finite loops (no
overflow). Crippling optimization for the small number of people (system
programmers?) who use such strange loops seems counterproductive. It
would be best if such loops could be marked with an attribute in some
way, so that the general case just assumes that all loops are finite...

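A concrete instance of the "strange loop" in question, as a minimal sketch (the function name `count_inclusive` is made up): an inclusive bound on an unsigned counter is only finite if the bound is not the maximum value of the type, and the compiler cannot assume that without extra information.

```c
/* Counts the iterations of an inclusive loop over [0, n]. Written
   naively as
       for (unsigned i = 0; i <= n; i++)
   the loop never terminates when n == UINT_MAX, because i wraps around
   to 0 instead of ever exceeding n. Testing for equality before the
   increment keeps the loop finite for every n. */
unsigned long long count_inclusive(unsigned n)
{
    unsigned long long count = 0;
    unsigned i = 0;
    for (;;) {
        count++;
        if (i == n)
            break;
        i++;
    }
    return count;
}
```

The naive form is the one where an "assume loops are finite" option changes what the optimizer may do.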
RFC: Improving GCC8 default option settings
Hi all,

At the GNU Cauldron I was inspired by several interesting talks about improving GCC in various ways. While GCC has many great optimizations, a common theme is that its default settings are rather conservative. As a result users are required to enable several additional optimizations by hand to get good code. Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should do as well.

Here are a few concrete proposals to improve GCC's option settings which will enable better code generation for most targets:

* Make -fno-math-errno the default - this mostly affects the code generated for sqrt, which should be treated just like floating point division and not set errno by default (unless you explicitly select C89 mode).

* Make -fno-trapping-math the default - another obvious one. From the docs: "Compile code assuming that floating-point operations cannot generate user-visible traps." There isn't a lot of code that actually uses user-visible traps (if any - many CPUs don't even support user traps as it's an optional IEEE feature). So assuming trapping math by default is way too conservative since there is no obvious benefit to users.

* Make -fno-common the default - this was originally needed for pre-ANSI C, but is optional in C (not sure whether it is still in C99/C11). This can significantly improve code generation on targets that use anchors for globals (note the linker could report a more helpful message when ancient code that requires -fcommon fails to link).

* Make -fomit-frame-pointer the default - various targets already do this at higher optimization levels, but this could easily be done for all targets. Frame pointers haven't been needed for debugging for decades, however if there are still good reasons to keep it enabled with -O0 or -O1 (I can't think of any unless it is for last-resort backtrace when there is no unwind info at a crash), we could just disable the frame pointer from -O2 onwards.
These are just a few ideas to start. What do people think? I'd welcome discussion and other proposals for similar improvements.

Wilco