Re: RFC: Improving GCC8 default option settings

2017-09-16 Thread Martin Jambor
Hi,

On Thu, Sep 14, 2017 at 11:55:21AM +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen
>  wrote:
> > On Mittwoch, 13. September 2017 15:46:09 CEST Jakub Jelinek wrote:
> >> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> >> > On its own -O3 doesn't add much (some loop opts and slightly more
> >> > aggressive inlining/unrolling), so whatever it does we
> >> > should consider doing at -O2 eventually.
> >>
> >> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
> >>
> > Would it be possible to enable basic block vectorization on -O2? I assume 
> > that
> > doesn't increase binary size since it doesn't unroll loops.
> 
> Somebody needs to provide benchmarking looking at the compile-time cost
> vs. the runtime benefit and the code size effect.  There's also room to tune
> aggressiveness of BB vectorization as it currently allows for cases where
> the scalar computation is not fully replaced by vector code.
> 

A good candidate to look at might be 525.x264_r from the SPEC CPU2017
suite.  With just -O2, GCC is about 70% slower than LLVM (which I
think must be doing some vectorization at -O2).  When I give -O2
-ftree-vectorize to gcc, the difference drops to 20%, so vectorization
is not the whole story either.  There is no real difference in
run-time of executables generated with both compilers at -Ofast.

(But no, I'm not volunteering to analyze it further in the foreseeable
future.)

Martin


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 3:08 PM, Markus Trippelsdorf
 wrote:
> On 2017.09.14 at 14:48 +0200, Richard Biener wrote:
>> On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška  wrote:
>> > On 09/14/2017 12:37 PM, Bin.Cheng wrote:
>> >> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
>> >>  wrote:
>> >>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
>> >>>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>> >>>>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>> >>>>>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras
>> >>>>>> wrote:
>> >>>>>>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>> >>>>>>>>
>> >>>>>>>> [...] As a result users are
>> >>>>>>>> required to enable several additional optimizations by hand to get
>> >>>>>>>> good
>> >>>>>>>> code.
>> >>>>>>>> Other compilers enable more optimizations at -O2 (loop unrolling in
>> >>>>>>>> LLVM
>> >>>>>>>> was
>> >>>>>>>> mentioned repeatedly) which GCC could/should do as well.
>> >>>>>>>> [...]
>> >>>>>>>>
>> >>>>>>>> I'd welcome discussion and other proposals for similar improvements.
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> What's the status of graphite? It's been around for years. Isn't it
>> >>>>>>> mature
>> >>>>>>> enough to enable these:
>> >>>>>>>
>> >>>>>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine
>> >>>>>>> -floop-block
>> >>>>>>>
>> >>>>>>> by default for -O2? (And I'm not even sure those are the complete
>> >>>>>>> set of
>> >>>>>>> graphite optimization flags, or just the "useful" ones.)
>> >>>>>>
>> >>>>>> It's not on by default at any optimization level.  The main issue is
>> >>>>>> the
>> >>>>>> lack of maintainance and a set of known common internal compiler
>> >>>>>> errors
>> >>>>>> we hit.  The other issue is that there's no benefit of turning those
>> >>>>>> on for
>> >>>>>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>> >>>>>> compile-time cost.
>> >>>>>
>> >>>>> Not to mention the numerous wrong-code bugs. IMHO graphite should
>> >>>>> deprecated as soon as possible.
>> >>>>>
>> >>>>
>> >>>> For wrong-code bugs we've got and I recently went through, I fully
>> >>>> agree with this
>> >>>> approach and I would do it for GCC 8. There are PRs where order of
>> >>>> simple 2 loops
>> >>>> is changed, causing wrong-code as there's a data dependence.
>> >>>>
>> >>>> Moreover, I know that Bin was thinking about selection whether to use
>> >>>> classical loop
>> >>>> optimizations or Graphite (depending on options provided). This would
>> >>>> simplify it ;)
>> >>>
>> >>> I don't think removing graphite is warranted, I still think it is the
>> >>> approach to use when
>> >>> handling non-perfect nests.
>> >> Hi,
>> >> IMHO, we should not be in a hurry to remove graphite, though we are
>> >> introducing some traditional transformations.  It's a quite standalone
>> >> part in GCC and supports more transformations.  Also as it gets more
>> >> attention, never know if somebody will find time to work on it.
>> >
>> > Ok. I just wanted to express that from user's perspective I would not 
>> > recommend it to use.
>> > Even if it improves some interesting (and for classical loop optimization 
>> > hard) loop nests,
>> > it can still blow up on a quite simple data dependence in between loops. 
>> > That said, it's quite
>> > risky to use it.
>>
>> We only have a single wrong-code bug in bugzilla with a testcase and I
>> just fixed it (well,
>> patch in testing).  We do have plenty of ICEs, yes.
>
> Even tramp3d-v4, which is cited in several graphite papers, gets
> miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823.

But unfortunately there isn't a self-contained testcase for that.  The comments
hint that something like

int a[][];
p = &a[1][0];
for(;;)
  a[i][j] = ...
  p[i] = ...

would trigger it, that is, accessing the same memory through both a
two-dimensional array and a pointer.
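
A self-contained sketch of that access pattern could look like the
following.  To be clear, this is a hypothetical illustration and not the
testcase from PR 68823; it has not been shown to reproduce the miscompile.
The names N, a, p and alias_kernel are made up here.

```c
#include <stddef.h>

#define N 8

/* Hypothetical: a two-dimensional array that is written both directly
   and through a pointer into its second row.  */
static int a[N][N];

/* Writes a[i][j] through the array and a[1][i] through the pointer p
   inside the same loop nest, so both accesses touch overlapping memory.
   Returns a[1][N-1] so a caller can check the last store through p.  */
int alias_kernel(void)
{
    int *p = &a[1][0];

    for (size_t i = 0; i < N; i++) {
        for (size_t j = 0; j < N; j++)
            a[i][j] = (int)(i + j);
        p[i] = -1;              /* aliases a[1][i] */
    }
    return a[1][N - 1];
}
```

Any transformation of this nest has to keep the store through p ordered
after the stores to row 1 of a for the same elements, which is the kind
of dependence the bug comments point at.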

Richard.

> --
> Markus


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Markus Trippelsdorf
On 2017.09.14 at 14:48 +0200, Richard Biener wrote:
> On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška  wrote:
> > On 09/14/2017 12:37 PM, Bin.Cheng wrote:
> >> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
> >>  wrote:
> >>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
> >>>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
> >>>>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
> >>>>>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras
> >>>>>> wrote:
> >>>>>>> On 12/09/17 16:57, Wilco Dijkstra wrote:
> >>>>>>>>
> >>>>>>>> [...] As a result users are
> >>>>>>>> required to enable several additional optimizations by hand to get
> >>>>>>>> good
> >>>>>>>> code.
> >>>>>>>> Other compilers enable more optimizations at -O2 (loop unrolling in
> >>>>>>>> LLVM
> >>>>>>>> was
> >>>>>>>> mentioned repeatedly) which GCC could/should do as well.
> >>>>>>>> [...]
> >>>>>>>>
> >>>>>>>> I'd welcome discussion and other proposals for similar improvements.
> >>>>>>>
> >>>>>>>
> >>>>>>> What's the status of graphite? It's been around for years. Isn't it
> >>>>>>> mature
> >>>>>>> enough to enable these:
> >>>>>>>
> >>>>>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine
> >>>>>>> -floop-block
> >>>>>>>
> >>>>>>> by default for -O2? (And I'm not even sure those are the complete set
> >>>>>>> of
> >>>>>>> graphite optimization flags, or just the "useful" ones.)
> >>>>>>
> >>>>>> It's not on by default at any optimization level.  The main issue is
> >>>>>> the
> >>>>>> lack of maintainance and a set of known common internal compiler errors
> >>>>>> we hit.  The other issue is that there's no benefit of turning those
> >>>>>> on for
> >>>>>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
> >>>>>> compile-time cost.
> >>>>>
> >>>>> Not to mention the numerous wrong-code bugs. IMHO graphite should
> >>>>> deprecated as soon as possible.
> >>>>>
> >>>>
> >>>> For wrong-code bugs we've got and I recently went through, I fully agree
> >>>> with this
> >>>> approach and I would do it for GCC 8. There are PRs where order of
> >>>> simple 2 loops
> >>>> is changed, causing wrong-code as there's a data dependence.
> >>>>
> >>>> Moreover, I know that Bin was thinking about selection whether to use
> >>>> classical loop
> >>>> optimizations or Graphite (depending on options provided). This would
> >>>> simplify it ;)
> >>>
> >>> I don't think removing graphite is warranted, I still think it is the
> >>> approach to use when
> >>> handling non-perfect nests.
> >> Hi,
> >> IMHO, we should not be in a hurry to remove graphite, though we are
> >> introducing some traditional transformations.  It's a quite standalone
> >> part in GCC and supports more transformations.  Also as it gets more
> >> attention, never know if somebody will find time to work on it.
> >
> > Ok. I just wanted to express that from user's perspective I would not 
> > recommend it to use.
> > Even if it improves some interesting (and for classical loop optimization 
> > hard) loop nests,
> > it can still blow up on a quite simple data dependence in between loops. 
> > That said, it's quite
> > risky to use it.
> 
> We only have a single wrong-code bug in bugzilla with a testcase and I
> just fixed it (well,
> patch in testing).  We do have plenty of ICEs, yes.

Even tramp3d-v4, which is cited in several graphite papers, gets
miscompiled: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68823.

-- 
Markus


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 12:42 PM, Martin Liška  wrote:
> On 09/14/2017 12:37 PM, Bin.Cheng wrote:
>> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
>>  wrote:
>>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
>>>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>>>>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>>>>>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras
>>>>>> wrote:
>>>>>>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>>>>>>>
>>>>>>>> [...] As a result users are
>>>>>>>> required to enable several additional optimizations by hand to get good
>>>>>>>> code.
>>>>>>>> Other compilers enable more optimizations at -O2 (loop unrolling in
>>>>>>>> LLVM
>>>>>>>> was
>>>>>>>> mentioned repeatedly) which GCC could/should do as well.
>>>>>>>> [...]
>>>>>>>>
>>>>>>>> I'd welcome discussion and other proposals for similar improvements.
>>>>>>>
>>>>>>>
>>>>>>> What's the status of graphite? It's been around for years. Isn't it
>>>>>>> mature
>>>>>>> enough to enable these:
>>>>>>>
>>>>>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine
>>>>>>> -floop-block
>>>>>>>
>>>>>>> by default for -O2? (And I'm not even sure those are the complete set of
>>>>>>> graphite optimization flags, or just the "useful" ones.)
>>>>>>
>>>>>> It's not on by default at any optimization level.  The main issue is the
>>>>>> lack of maintainance and a set of known common internal compiler errors
>>>>>> we hit.  The other issue is that there's no benefit of turning those on
>>>>>> for
>>>>>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>>>>>> compile-time cost.
>>>>>
>>>>> Not to mention the numerous wrong-code bugs. IMHO graphite should
>>>>> deprecated as soon as possible.
>>>>>
>>>>
>>>> For wrong-code bugs we've got and I recently went through, I fully agree
>>>> with this
>>>> approach and I would do it for GCC 8. There are PRs where order of simple
>>>> 2 loops
>>>> is changed, causing wrong-code as there's a data dependence.
>>>>
>>>> Moreover, I know that Bin was thinking about selection whether to use
>>>> classical loop
>>>> optimizations or Graphite (depending on options provided). This would
>>>> simplify it ;)
>>>
>>> I don't think removing graphite is warranted, I still think it is the
>>> approach to use when
>>> handling non-perfect nests.
>> Hi,
>> IMHO, we should not be in a hurry to remove graphite, though we are
>> introducing some traditional transformations.  It's a quite standalone
>> part in GCC and supports more transformations.  Also as it gets more
>> attention, never know if somebody will find time to work on it.
>
> Ok. I just wanted to express that from user's perspective I would not 
> recommend it to use.
> Even if it improves some interesting (and for classical loop optimization 
> hard) loop nests,
> it can still blow up on a quite simple data dependence in between loops. That 
> said, it's quite
> risky to use it.

We only have a single wrong-code bug in bugzilla with a testcase and I
just fixed it (well, patch in testing).  We do have plenty of ICEs, yes.

Richard.

> Thanks,
> Martin
>
>>
>> Thanks,
>> bin
>>>
>>> Richard.
>>>
>>>> Martin
>


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Martin Liška
On 09/14/2017 12:37 PM, Bin.Cheng wrote:
> On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
>  wrote:
>> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
>>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>>>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>>>>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras
>>>>> wrote:
>>>>>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>>>>>>
>>>>>>> [...] As a result users are
>>>>>>> required to enable several additional optimizations by hand to get good
>>>>>>> code.
>>>>>>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>>>>>>> was
>>>>>>> mentioned repeatedly) which GCC could/should do as well.
>>>>>>> [...]
>>>>>>>
>>>>>>> I'd welcome discussion and other proposals for similar improvements.
>>>>>>
>>>>>>
>>>>>> What's the status of graphite? It's been around for years. Isn't it
>>>>>> mature
>>>>>> enough to enable these:
>>>>>>
>>>>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine
>>>>>> -floop-block
>>>>>>
>>>>>> by default for -O2? (And I'm not even sure those are the complete set of
>>>>>> graphite optimization flags, or just the "useful" ones.)
>>>>>
>>>>> It's not on by default at any optimization level.  The main issue is the
>>>>> lack of maintainance and a set of known common internal compiler errors
>>>>> we hit.  The other issue is that there's no benefit of turning those on
>>>>> for
>>>>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>>>>> compile-time cost.
>>>>
>>>> Not to mention the numerous wrong-code bugs. IMHO graphite should
>>>> deprecated as soon as possible.

>>>
>>> For wrong-code bugs we've got and I recently went through, I fully agree 
>>> with this
>>> approach and I would do it for GCC 8. There are PRs where order of simple 2 
>>> loops
>>> is changed, causing wrong-code as there's a data dependence.
>>>
>>> Moreover, I know that Bin was thinking about selection whether to use 
>>> classical loop
>>> optimizations or Graphite (depending on options provided). This would 
>>> simplify it ;)
>>
>> I don't think removing graphite is warranted, I still think it is the
>> approach to use when
>> handling non-perfect nests.
> Hi,
> IMHO, we should not be in a hurry to remove graphite, though we are
> introducing some traditional transformations.  It's a quite standalone
> part in GCC and supports more transformations.  Also as it gets more
> attention, never know if somebody will find time to work on it.

OK. I just wanted to express that, from a user's perspective, I would not
recommend using it.
Even if it improves some interesting loop nests (ones that are hard for
classical loop optimization), it can still blow up on a quite simple data
dependence between loops. So it's quite risky to use.

Thanks,
Martin

> 
> Thanks,
> bin
>>
>> Richard.
>>
>>> Martin



Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Bin.Cheng
On Thu, Sep 14, 2017 at 11:24 AM, Richard Biener
 wrote:
> On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
>> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>>>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras
>>>> wrote:
>>>>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>>>>>
>>>>>> [...] As a result users are
>>>>>> required to enable several additional optimizations by hand to get good
>>>>>> code.
>>>>>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>>>>>> was
>>>>>> mentioned repeatedly) which GCC could/should do as well.
>>>>>> [...]
>>>>>>
>>>>>> I'd welcome discussion and other proposals for similar improvements.
>>>>>
>>>>>
>>>>> What's the status of graphite? It's been around for years. Isn't it mature
>>>>> enough to enable these:
>>>>>
>>>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>>>>>
>>>>> by default for -O2? (And I'm not even sure those are the complete set of
>>>>> graphite optimization flags, or just the "useful" ones.)
>>>>
>>>> It's not on by default at any optimization level.  The main issue is the
>>>> lack of maintainance and a set of known common internal compiler errors
>>>> we hit.  The other issue is that there's no benefit of turning those on for
>>>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>>>> compile-time cost.
>>>
>>> Not to mention the numerous wrong-code bugs. IMHO graphite should
>>> deprecated as soon as possible.
>>>
>>
>> For wrong-code bugs we've got and I recently went through, I fully agree 
>> with this
>> approach and I would do it for GCC 8. There are PRs where order of simple 2 
>> loops
>> is changed, causing wrong-code as there's a data dependence.
>>
>> Moreover, I know that Bin was thinking about selection whether to use 
>> classical loop
>> optimizations or Graphite (depending on options provided). This would 
>> simplify it ;)
>
> I don't think removing graphite is warranted, I still think it is the
> approach to use when
> handling non-perfect nests.
Hi,
IMHO, we should not be in a hurry to remove graphite, even though we are
introducing some traditional transformations.  It's quite a standalone
part of GCC and supports more transformations.  Also, as it gets more
attention, you never know whether somebody will find time to work on it.

Thanks,
bin
>
> Richard.
>
>> Martin


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Thu, Sep 14, 2017 at 12:18 PM, Martin Liška  wrote:
> On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
>> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
>>>> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>>>>
>>>>> [...] As a result users are
>>>>> required to enable several additional optimizations by hand to get good
>>>>> code.
>>>>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>>>>> was
>>>>> mentioned repeatedly) which GCC could/should do as well.
>>>>> [...]
>>>>>
>>>>> I'd welcome discussion and other proposals for similar improvements.
>>>>
>>>>
>>>> What's the status of graphite? It's been around for years. Isn't it mature
>>>> enough to enable these:
>>>>
>>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>>>>
>>>> by default for -O2? (And I'm not even sure those are the complete set of
>>>> graphite optimization flags, or just the "useful" ones.)
>>>
>>> It's not on by default at any optimization level.  The main issue is the
>>> lack of maintainance and a set of known common internal compiler errors
>>> we hit.  The other issue is that there's no benefit of turning those on for
>>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>>> compile-time cost.
>>
>> Not to mention the numerous wrong-code bugs. IMHO graphite should
>> deprecated as soon as possible.
>>
>
> For wrong-code bugs we've got and I recently went through, I fully agree with 
> this
> approach and I would do it for GCC 8. There are PRs where order of simple 2 
> loops
> is changed, causing wrong-code as there's a data dependence.
>
> Moreover, I know that Bin was thinking about selection whether to use 
> classical loop
> optimizations or Graphite (depending on options provided). This would 
> simplify it ;)

I don't think removing graphite is warranted; I still think it is the
approach to use when handling non-perfect nests.

Richard.

> Martin


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Martin Liška
On 09/14/2017 12:07 PM, Markus Trippelsdorf wrote:
> On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
>> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
>>> On 12/09/17 16:57, Wilco Dijkstra wrote:

>>>> [...] As a result users are
>>>> required to enable several additional optimizations by hand to get good
>>>> code.
>>>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>>>> was
>>>> mentioned repeatedly) which GCC could/should do as well.
>>>> [...]
>>>>
>>>> I'd welcome discussion and other proposals for similar improvements.
>>>
>>>
>>> What's the status of graphite? It's been around for years. Isn't it mature
>>> enough to enable these:
>>>
>>> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>>>
>>> by default for -O2? (And I'm not even sure those are the complete set of
>>> graphite optimization flags, or just the "useful" ones.)
>>
>> It's not on by default at any optimization level.  The main issue is the
>> lack of maintainance and a set of known common internal compiler errors
>> we hit.  The other issue is that there's no benefit of turning those on for
>> SPEC CPU benchmarking as far as I remember but quite a bit of extra
>> compile-time cost.
> 
> Not to mention the numerous wrong-code bugs. IMHO graphite should
> deprecated as soon as possible.
> 

Given the wrong-code bugs we have, which I recently went through, I fully
agree with this approach and I would do it for GCC 8. There are PRs where the
order of two simple loops is changed, causing wrong code because there is a
data dependence between them.
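
To illustrate the kind of dependence meant here, a minimal sketch follows.
It is a made-up example, not taken from any of the PRs; the function names
and N are hypothetical.  The second loop reads what the first loop wrote,
so executing the loops in the opposite order changes the result.

```c
#define N 16

/* Hypothetical example: loop 2 reads what loop 1 wrote, so there is a
   flow (read-after-write) dependence between the two loops.  */
int two_loops_in_order(void)
{
    int a[N], b[N];

    for (int i = 0; i < N; i++)      /* loop 1: produce a[] */
        a[i] = i * 2;
    for (int i = 0; i < N; i++)      /* loop 2: consume a[] */
        b[i] = a[i] + 1;

    return b[N - 1];
}

/* The same two loops with their order swapped, as an incorrect
   transformation might do: loop 2 now reads a[] before loop 1 has
   written it.  a[] is zero-initialized here only so that the wrong
   result is deterministic.  */
int two_loops_swapped(void)
{
    int a[N] = {0}, b[N];

    for (int i = 0; i < N; i++)      /* former loop 2 */
        b[i] = a[i] + 1;
    for (int i = 0; i < N; i++)      /* former loop 1 */
        a[i] = i * 2;

    return b[N - 1];
}
```

With N = 16, two_loops_in_order() returns 31 while two_loops_swapped()
returns 1, so a pass that reorders the loops without honouring the
dependence silently produces wrong code.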

Moreover, I know that Bin was thinking about how to select between classical
loop optimizations and Graphite (depending on the options provided). This
would simplify it ;)

Martin


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Michael Clark

> On 14 Sep 2017, at 3:06 AM, Allan Sandfeld Jensen  wrote:
> 
> On Dienstag, 12. September 2017 23:27:22 CEST Michael Clark wrote:
>>> On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra  wrote:
>>> 
>>> Hi all,
>>> 
>>> At the GNU Cauldron I was inspired by several interesting talks about
>>> improving GCC in various ways. While GCC has many great optimizations, a
>>> common theme is that its default settings are rather conservative. As a
>>> result users are required to enable several additional optimizations by
>>> hand to get good code. Other compilers enable more optimizations at -O2
>>> (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should
>>> do as well.
>> 
>> There are some nuances to -O2. Please consider -O2 users who wish use it
>> like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>> 
>> Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
>> code size can be skipped from -Os without drastically effecting
>> performance.
>> 
>> This is not the case with GCC where -Os is a size at all costs optimisation
>> mode. GCC users option for size not at the expense of speed is to use -O2.
>> 
>> Clang    GCC
>> -Oz  ~=  -Os
>> -Os  ~=  -O2
>> 
> No. Clang's -Os is somewhat limited compared to gcc's, just like the clang 
> -Og 
> is just -O1. AFAIK -Oz is a proprietary Apple clang parameter, and not in 
> clang proper.

It appears to be in mainline clang.

mclark@anarch128:~$ clang -Oz -c a.c -o a.o
mclark@anarch128:~$ clang -Ox -c a.c -o a.o
error: invalid integral value 'x' in '-Ox'
error: invalid integral value 'x' in '-Ox'
mclark@anarch128:~$ uname -a
Linux anarch128.org 4.9.0-3-amd64 #1 SMP Debian 4.9.30-2+deb9u3 (2017-08-06) 
x86_64 GNU/Linux
mclark@anarch128:~$ clang --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

I still think it would be unfortunate to lose the size/speed sweet spot of -O2
by adding optimisations that increase code size, unless there was a size
optimisation option derived from -O2 at the point -O2 is souped up,
i.e. create an -O2s (or rename -Os to -Oz and derive the new -Os from the
current -O2).

I’m going to start looking at this point to see what's involved in making a
patch. Distros that want a balance of size and speed might even pick it up,
even if it is not accepted in mainline.

Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Markus Trippelsdorf
On 2017.09.14 at 11:57 +0200, Richard Biener wrote:
> On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
> > On 12/09/17 16:57, Wilco Dijkstra wrote:
> >>
> >> [...] As a result users are
> >> required to enable several additional optimizations by hand to get good
> >> code.
> >> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
> >> was
> >> mentioned repeatedly) which GCC could/should do as well.
> >> [...]
> >>
> >> I'd welcome discussion and other proposals for similar improvements.
> >
> >
> > What's the status of graphite? It's been around for years. Isn't it mature
> > enough to enable these:
> >
> > -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
> >
> > by default for -O2? (And I'm not even sure those are the complete set of
> > graphite optimization flags, or just the "useful" ones.)
> 
> It's not on by default at any optimization level.  The main issue is the
> lack of maintainance and a set of known common internal compiler errors
> we hit.  The other issue is that there's no benefit of turning those on for
> SPEC CPU benchmarking as far as I remember but quite a bit of extra
> compile-time cost.

Not to mention the numerous wrong-code bugs. IMHO graphite should
be deprecated as soon as possible.

-- 
Markus


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Wed, Sep 13, 2017 at 6:11 PM, Nikos Chantziaras  wrote:
> On 12/09/17 16:57, Wilco Dijkstra wrote:
>>
>> [...] As a result users are
>> required to enable several additional optimizations by hand to get good
>> code.
>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM
>> was
>> mentioned repeatedly) which GCC could/should do as well.
>> [...]
>>
>> I'd welcome discussion and other proposals for similar improvements.
>
>
> What's the status of graphite? It's been around for years. Isn't it mature
> enough to enable these:
>
> -floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block
>
> by default for -O2? (And I'm not even sure those are the complete set of
> graphite optimization flags, or just the "useful" ones.)

It's not on by default at any optimization level.  The main issue is the
lack of maintenance and a set of known common internal compiler errors
we hit.  The other issue is that there's no benefit of turning those on for
SPEC CPU benchmarking as far as I remember but quite a bit of extra
compile-time cost.

Richard.


Re: RFC: Improving GCC8 default option settings

2017-09-14 Thread Richard Biener
On Wed, Sep 13, 2017 at 5:08 PM, Allan Sandfeld Jensen
 wrote:
> On Mittwoch, 13. September 2017 15:46:09 CEST Jakub Jelinek wrote:
>> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
>> > On its own -O3 doesn't add much (some loop opts and slightly more
>> > aggressive inlining/unrolling), so whatever it does we
>> > should consider doing at -O2 eventually.
>>
>> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
>>
> Would it be possible to enable basic block vectorization on -O2? I assume that
> doesn't increase binary size since it doesn't unroll loops.

Somebody needs to provide benchmarking looking at the compile-time cost
vs. the runtime benefit and the code size effect.  There's also room to tune
aggressiveness of BB vectorization as it currently allows for cases where
the scalar computation is not fully replaced by vector code.
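
For readers unfamiliar with the term: basic-block (SLP) vectorization
targets straight-line code rather than loops.  A minimal sketch of a
candidate is below; the struct and function names are hypothetical, and
whether GCC actually vectorizes it depends on the target and the GCC
version.

```c
/* Hypothetical struct and function names, for illustration only.  */
struct vec4 { float x, y, z, w; };

/* Four independent, isomorphic scalar operations on adjacent memory in
   a single basic block.  No loop is involved, so nothing is unrolled.
   The restrict qualifiers promise that out does not overlap a or b.  */
void vec4_add(struct vec4 *restrict out,
              const struct vec4 *restrict a,
              const struct vec4 *restrict b)
{
    out->x = a->x + b->x;
    out->y = a->y + b->y;
    out->z = a->z + b->z;
    out->w = a->w + b->w;
}
```

Comparing the assembly from plain -O2 with that from -O2
-ftree-slp-vectorize shows whether the four scalar adds become a single
vector add on a given target, and gives a first idea of the code size
effect.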

Richard.

> 'Allan
>


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Richard Biener
On September 13, 2017 6:24:21 PM GMT+02:00, Jan Hubicka  wrote:
>> >I don't see static profile prediction to be very useful here to find
>> >"really
>> >hot code" - neither in current implementation or future. The problem
>of
>> >-O2 is that we kind of know that only 10% of code somewhere matters
>for
>> >performance but we have no way to reliably identify it.
>> 
>> It's hard to do better than statically look at (ipa) loop depth. But
>shouldn't that be good enough? 
>
>Only if you assume that you have whole program and understand indirect
>calls.
>There are some stats on this here
>http://ieeexplore.ieee.org/document/717399/
>
>It shows that propagating static profile across whole progrma (which is
>just
>tiny bit more fancy than counting loop depth) sort of work
>statistically.  I
>really do not have very high hopes of this reliably working in
>production
>compiler.  We already have PRs for single function benchmark where deep
>loop
>nest is used ininitialization or so and the actual hard working part
>has small
>loop nest & gets identified as cold.  
>
>As soon as you start propagating in whole program context, such local
>mistakes
>will become more comon.

Heh, I would just make loop nests hot without globally making anything cold
because of that. Basically something like optimistic IPA profile propagation.

Richard. 

>> 
>> >
>> >I would make sense to have less agressive vectoriazaoitn at -O2 and
>> >more at
>> >-Ofast/-O3.
>> 
>> We tried that but the runtime effects were not offsetting the compile
>time cost. 
>
>Yep, i remember that.
>
>Honza



Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Jan Hubicka
> >I don't see static profile prediction to be very useful here to find
> >"really
> >hot code" - neither in current implementation or future. The problem of
> >-O2 is that we kind of know that only 10% of code somewhere matters for
> >performance but we have no way to reliably identify it.
> 
> It's hard to do better than statically look at (ipa) loop depth. But 
> shouldn't that be good enough? 

Only if you assume that you have the whole program and understand indirect calls.
There are some stats on this here:
http://ieeexplore.ieee.org/document/717399/

It shows that propagating a static profile across the whole program (which is
just a tiny bit fancier than counting loop depth) sort of works statistically.
I really do not have very high hopes of this working reliably in a production
compiler.  We already have PRs for single-function benchmarks where a deep loop
nest is used in initialization or so, and the actual hard-working part has a
small loop nest and gets identified as cold.

As soon as you start propagating in a whole-program context, such local mistakes
will become more common.
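
A made-up sketch of the situation described above, assuming a heuristic
that ranks code purely by static loop depth (the names and N are
hypothetical and not taken from any PR):

```c
#define N 8

static double table[N][N][N];

/* Runs once at startup, but is a depth-3 loop nest, so a purely
   depth-based static heuristic would rank it as the hottest code.  */
void init_table(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                table[i][j][k] = i + j + k;
}

/* Only a depth-1 loop.  If this is called many times from elsewhere,
   for example through an indirect call the compiler cannot see
   through, this is where the time is really spent.  */
double sum_row(int i, int j)
{
    double s = 0.0;
    for (int k = 0; k < N; k++)
        s += table[i][j][k];
    return s;
}
```

Here the static ranking is exactly backwards: init_table() looks hot and
sum_row() looks comparatively cold, even though only the latter matters
at run time.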
> 
> >
> >I would make sense to have less agressive vectoriazaoitn at -O2 and
> >more at
> >-Ofast/-O3.
> 
> We tried that but the runtime effects were not offsetting the compile time 
> cost. 

Yep, I remember that.

Honza


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Richard Biener
On September 13, 2017 5:35:11 PM GMT+02:00, Jan Hubicka  wrote:
>> On Wed, Sep 13, 2017 at 3:46 PM, Jakub Jelinek 
>wrote:
>> > On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
>> >> On its own -O3 doesn't add much (some loop opts and slightly more
>> >> aggressive inlining/unrolling), so whatever it does we
>> >> should consider doing at -O2 eventually.
>> >
>> > Well, -O3 adds vectorization, which we don't enable at -O2 by
>default.
>> 
>> As said, -fprofile-use enables it so -O2 should eventually do the
>same
>> for "really hot code".
>
>I don't see static profile prediction to be very useful here to find
>"really
>hot code" - neither in current implementation or future. The problem of
>-O2 is that we kind of know that only 10% of code somewhere matters for
>performance but we have no way to reliably identify it.

It's hard to do better than statically look at (ipa) loop depth. But shouldn't 
that be good enough? 

>
>I would make sense to have less agressive vectoriazaoitn at -O2 and
>more at
>-Ofast/-O3.

We tried that but the runtime effects were not offsetting the compile time 
cost. 

>Adding -Os and -Oz would make sense to me - even with hot/cold info it
>is not
>desriable to optimize as agressively for size as we do becuase mistakes
>happen
>and one do not want to make code paths 1000 times slower to save one
>byte
>of binary.
>
>We could handle this gratefully internally by having logic for "known
>to be cold"
>and "guessed to be cold". New profile code can make difference in this.
>
>Honza
>> 
>> Richard.
>> 
>> > Jakub



Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Nikos Chantziaras

On 12/09/17 16:57, Wilco Dijkstra wrote:

> [...] As a result users are
> required to enable several additional optimizations by hand to get good code.
> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
> mentioned repeatedly) which GCC could/should do as well.
> [...]
>
> I'd welcome discussion and other proposals for similar improvements.


What's the status of graphite? It's been around for years. Isn't it 
mature enough to enable these:


-floop-interchange -ftree-loop-distribution -floop-strip-mine -floop-block

by default for -O2? (And I'm not even sure those are the complete set of 
graphite optimization flags, or just the "useful" ones.)




Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Jan Hubicka
> On Wed, Sep 13, 2017 at 3:21 AM, Michael Clark  wrote:
> >
> >> On 13 Sep 2017, at 1:15 PM, Michael Clark  wrote:
> >>
> >> - https://rv8.io/bench#optimisation
> >> - https://rv8.io/bench#executable-file-sizes
> >>
> >> -O2 is 98% perf of -O3 on x86-64
> >> -Os is 81% perf of -O3 on x86-64
> >>
> >> -O2 saves 5% space on -O3 on x86-64
> >> -Os saves 8% space on -Os on x86-64
> >>
> >> 17% drop in performance for 3% saving in space is not a good trade for a 
> >> “general” size optimisation. It’s more like executable compression.
> >
> > Sorry fixed typo:
> >
> > -O2 is 98% perf of -O3 on x86-64
> > -Os is 81% perf of -O3 on x86-64
> >
> > -O2 saves 5% space on -O3 on x86-64
> > -Os saves 8% space on -O3 on x86-64

I am a bit surprised you see only 8% of code size difference
for -Os and -O3.  I look into these numbers occasionally and
it is usually well into two digits.
http://hubicka.blogspot.cz/2014/04/linktime-optimization-in-gcc-2-firefox.html
http://hubicka.blogspot.cz/2014/09/linktime-optimization-in-gcc-part-3.html
has 42% code segment size reduction for Firefox and 19% for libreoffice

Honza


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Jan Hubicka
> On Wed, Sep 13, 2017 at 3:46 PM, Jakub Jelinek  wrote:
> > On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> >> On its own -O3 doesn't add much (some loop opts and slightly more
> >> aggressive inlining/unrolling), so whatever it does we
> >> should consider doing at -O2 eventually.
> >
> > Well, -O3 adds vectorization, which we don't enable at -O2 by default.
> 
> As said, -fprofile-use enables it so -O2 should eventually do the same
> for "really hot code".

I don't see static profile prediction to be very useful here to find "really
hot code" - neither in the current implementation nor in a future one. The
problem of -O2 is that we kind of know that only 10% of code somewhere matters
for performance but we have no way to reliably identify it.

It would make sense to have less aggressive vectorization at -O2 and more at
-Ofast/-O3.

Adding -Os and -Oz would make sense to me - even with hot/cold info it is not
desirable to optimize as aggressively for size as we do because mistakes happen
and one does not want to make code paths 1000 times slower to save one byte
of binary.

We could handle this gracefully internally by having logic for "known to be
cold" and "guessed to be cold". New profile code can make a difference in this.
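
The distinction between "known" and "guessed" can be pictured at source level.
This is only a sketch with invented function names: it shows one way a
programmer can state that code is cold (the `cold` function attribute, a GCC
extension that Clang also accepts) next to code where the compiler has nothing
but static heuristics. It does not claim to show how GCC tracks profile quality
internally.

```c
#include <assert.h>

/* "Known to be cold": the programmer says so explicitly.  */
__attribute__((cold, noinline))
int report_bad_input(int value)
{
    return -value;  /* stand-in for slow error handling */
}

/* "Guessed to be cold": no annotation and no profile data, so the
   compiler can only apply static heuristics to this branch.  */
int clamp_guess(int value, int limit)
{
    if (value > limit)
        return limit;
    return value;
}

/* The call below sits on a path that is known to be unlikely because
   the callee is declared cold.  */
int checked_double(int value)
{
    assert(value < 1000000);  /* keep the doubling free of overflow */
    if (value < 0)
        return report_bad_input(value);
    return 2 * value;
}
```

A wrong guess in `clamp_guess` costs far more if the compiler then optimizes
that path for size at any price, which is the risk described above.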

Honza
> 
> Richard.
> 
> > Jakub


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Allan Sandfeld Jensen
On Mittwoch, 13. September 2017 15:46:09 CEST Jakub Jelinek wrote:
> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> > On its own -O3 doesn't add much (some loop opts and slightly more
> > aggressive inlining/unrolling), so whatever it does we
> > should consider doing at -O2 eventually.
> 
> Well, -O3 adds vectorization, which we don't enable at -O2 by default.
> 
Would it be possible to enable basic block vectorization on -O2? I assume that 
doesn't increase binary size since it doesn't unroll loops.
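
For concreteness, basic block vectorization targets straight-line code with no
loop at all, such as the four independent additions in this sketch. The
function names are invented, and whether GCC turns the body into a single
vector add depends on the target, the flags in effect (for example
-ftree-slp-vectorize) and the cost model.

```c
#include <assert.h>

/* Four independent scalar additions in one basic block.  There is no
   loop to unroll, so replacing them with one 4-wide vector addition
   does not require extra copies of the code.  The restrict qualifiers
   promise that out does not overlap the inputs.  */
void add4(const float *restrict a, const float *restrict b,
          float *restrict out)
{
    assert(a && b && out);
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
    out[2] = a[2] + b[2];
    out[3] = a[3] + b[3];
}

/* Convenience helper: sum of the four components of a + b.  */
float add4_sum(const float a[4], const float b[4])
{
    float out[4];
    add4(a, b, out);
    return out[0] + out[1] + out[2] + out[3];
}
```

Comparing the assembly produced with and without -ftree-slp-vectorize for
`add4` is one way to measure the code size effect being asked about.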

'Allan



Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Allan Sandfeld Jensen
On Dienstag, 12. September 2017 23:27:22 CEST Michael Clark wrote:
> > On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra  wrote:
> > 
> > Hi all,
> > 
> > At the GNU Cauldron I was inspired by several interesting talks about
> > improving GCC in various ways. While GCC has many great optimizations, a
> > common theme is that its default settings are rather conservative. As a
> > result users are required to enable several additional optimizations by
> > hand to get good code. Other compilers enable more optimizations at -O2
> > (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should
> > do as well.
> 
> There are some nuances to -O2. Please consider -O2 users who wish to use it
> like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
> 
> Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
> code size can be skipped from -Os without drastically affecting
> performance.
> 
> This is not the case with GCC where -Os is a size at all costs optimisation
> mode. GCC users' option for size not at the expense of speed is to use -O2.
> 
> Clang   GCC
> -Oz ~=  -Os
> -Os ~=  -O2
> 
No. Clang's -Os is somewhat limited compared to gcc's, just like the clang -Og 
is just -O1. AFAIK -Oz is a proprietary Apple clang parameter, and not in 
clang proper.

'Allan


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Richard Biener
On Wed, Sep 13, 2017 at 3:46 PM, Jakub Jelinek  wrote:
> On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
>> On its own -O3 doesn't add much (some loop opts and slightly more
>> aggressive inlining/unrolling), so whatever it does we
>> should consider doing at -O2 eventually.
>
> Well, -O3 adds vectorization, which we don't enable at -O2 by default.

As said, -fprofile-use enables it so -O2 should eventually do the same
for "really hot code".

Richard.

> Jakub


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Jakub Jelinek
On Wed, Sep 13, 2017 at 03:41:19PM +0200, Richard Biener wrote:
> On its own -O3 doesn't add much (some loop opts and slightly more
> aggressive inlining/unrolling), so whatever it does we
> should consider doing at -O2 eventually.

Well, -O3 adds vectorization, which we don't enable at -O2 by default.

Jakub


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Kevin André
On Wed, Sep 13, 2017 at 9:43 AM, Janne Blomqvist
 wrote:
> On Tue, Sep 12, 2017 at 4:57 PM, Wilco Dijkstra wrote:
>> These are just a few ideas to start. What do people think? I'd welcome
>> discussion and other proposals for similar improvements.
>
> What about the default behavior if no options are given? I think a
> more reasonable default would be something roughly like
>
> -O2 -Wall
>
> or if debuggability is considered more important than speed & size, maybe
>
> -Og -g -Wall

Enabling (some) warnings by default seems reasonable to me. Not sure
about the rest though.

This is something people can't seem to agree on. Some like warnings by
default, some like optimizations by default. Some are against warnings
by default, arguing that people like distro-builders have no need for
warnings (for example). Some are against optimizations by default,
because it would make compilation slower when they just want to check
if some piece of code compiles successfully or not (for example).

I think the only way to decide what options to enable by default is to
first decide who your target audience is going to be. Who do you
expect to run gcc with no options specified? If you ask me, that will
mostly be students or beginners. Those who are experienced are more
likely to use an automated build system where they specify build
options only once and then forget about it. An inexperienced person
would need warnings and debug info by default, and maybe a few simple
optimizations that do not interfere with debugging. There can only be
one set of default options, and which one you pick will depend on who
your target audience is. You cannot please everyone.

-- 
Kevin


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Richard Biener
On Wed, Sep 13, 2017 at 3:21 AM, Michael Clark  wrote:
>
>> On 13 Sep 2017, at 1:15 PM, Michael Clark  wrote:
>>
>> - https://rv8.io/bench#optimisation
>> - https://rv8.io/bench#executable-file-sizes
>>
>> -O2 is 98% perf of -O3 on x86-64
>> -Os is 81% perf of -O3 on x86-64
>>
>> -O2 saves 5% space on -O3 on x86-64
>> -Os saves 8% space on -Os on x86-64
>>
>> 17% drop in performance for 3% saving in space is not a good trade for a 
>> “general” size optimisation. It’s more like executable compression.
>
> Sorry fixed typo:
>
> -O2 is 98% perf of -O3 on x86-64
> -Os is 81% perf of -O3 on x86-64
>
> -O2 saves 5% space on -O3 on x86-64
> -Os saves 8% space on -O3 on x86-64
>
> The extra ~3% space saving for ~17% drop in performance doesn’t seem like a 
> good general option for size based on the cost in performance.
>
> Again. I really like GCC’s -O2 and hope that its binaries don’t grow in size 
> nor slow down.

I think with GCC -Os and -O2 are essentially the same with the
difference that -Os assumes regions are cold and thus to be
optimized for size and -O2 assumes they are hot and thus to be
optimized for speed in cases where there is no heuristic proving
otherwise.  I know this doesn't 100% reflect implementation reality
but it should be close.

IMHO we should turn on flags we turn on with -fprofile-use and have
some more nuances in optimize_*_for_{speed,size} as we
now track profile quality more closely.

I see -O1 as mostly worthless unless you are compiling
machine-generated code that makes -O2+ go OOM/time.  Apart
from avoiding quadratic or worse algorithms -O1 sees no love.

On its own -O3 doesn't add much (some loop opts and slightly more
aggressive inlining/unrolling), so whatever it does we
should consider doing at -O2 eventually.

Richard.


Re: RFC: Improving GCC8 default option settings

2017-09-13 Thread Janne Blomqvist
On Tue, Sep 12, 2017 at 4:57 PM, Wilco Dijkstra  wrote:
> Hi all,
>
> At the GNU Cauldron I was inspired by several interesting talks about
> improving GCC in various ways. While GCC has many great optimizations, a
> common theme is that its default settings are rather conservative. As a
> result users are required to enable several additional optimizations by
> hand to get good code. Other compilers enable more optimizations at -O2
> (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should
> do as well.
>
> Here are a few concrete proposals to improve GCC's option settings which will
> enable better code generation for most targets:
>
> * Make -fno-math-errno the default - this mostly affects the code generated
>   for sqrt, which should be treated just like floating point division and
>   not set errno by default (unless you explicitly select C89 mode).

+1. Math functions setting errno is a blast from the past that needs
to die. That being said, this does to some extent depend on libm so
perhaps the default needs to be target-dependent.

> * Make -fno-trapping-math the default - another obvious one. From the docs:
>   "Compile code assuming that floating-point operations cannot generate
>    user-visible traps."
>   There isn't a lot of code that actually uses user-visible traps (if any -
>   many CPUs don't even support user traps as it's an optional IEEE feature).
>   So assuming trapping math by default is way too conservative since there is
>   no obvious benefit to users.

As Mr. Myers explains, this is probably going a bit too far. I think
by default whatever fp optimizations are allowed with FENV_ACCESS off
is reasonable.

> * Make -fomit-frame-pointer the default - various targets already do this at
>   higher optimization levels, but this could easily be done for all targets.
>   Frame pointers haven't been needed for debugging for decades, however if
>   there are still good reasons to keep it enabled with -O0 or -O1 (I can't
>   think of any unless it is for last-resort backtrace when there is no
>   unwind info at a crash), we could just disable the frame pointer from -O2
>   onwards.

Sounds reasonable.

> These are just a few ideas to start. What do people think? I'd welcome
> discussion and other proposals for similar improvements.

What about the default behavior if no options are given? I think a
more reasonable default would be something roughly like

-O2 -Wall

or if debuggability is considered more important than speed & size, maybe

-Og -g -Wall

-- 
Janne Blomqvist


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Jeffrey Walton
> * Make -fomit-frame-pointer the default - various targets already do this at
>   higher optimization levels, but this could easily be done for all targets.
>   Frame pointers haven't been needed for debugging for decades, however if
>   there are still good reasons to keep it enabled with -O0 or -O1 (I can't
>   think of any unless it is for last-resort backtrace when there is no
>   unwind info at a crash), we could just disable the frame pointer from -O2
>   onwards.

Given there's an -Og now, maybe frame pointers could be enabled for -O0
and -Og, off by default otherwise.

I like to use -O1 to kick in the analysis engine and start catching
warnings. It seems like -O1 should be closer to -O2/-O3 with respect to
frame pointers, since it could help find issues and tickle problems
with hand-crafted ASM.

Jeff


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Michael Clark

> On 13 Sep 2017, at 1:15 PM, Michael Clark  wrote:
> 
> - https://rv8.io/bench#optimisation
> - https://rv8.io/bench#executable-file-sizes
> 
> -O2 is 98% perf of -O3 on x86-64
> -Os is 81% perf of -O3 on x86-64
> 
> -O2 saves 5% space on -O3 on x86-64
> -Os saves 8% space on -Os on x86-64
> 
> 17% drop in performance for 3% saving in space is not a good trade for a 
> “general” size optimisation. It’s more like executable compression.

Sorry fixed typo:

-O2 is 98% perf of -O3 on x86-64
-Os is 81% perf of -O3 on x86-64

-O2 saves 5% space on -O3 on x86-64
-Os saves 8% space on -O3 on x86-64

The extra ~3% space saving for ~17% drop in performance doesn’t seem like a 
good general option for size based on the cost in performance.
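
The ~3% and ~17% figures follow directly from the corrected numbers above. A
small sketch of the arithmetic in C, using only the percentages quoted in this
thread (the identifier names are invented for illustration):

```c
#include <assert.h>

/* Figures quoted in this thread for x86-64, all relative to -O3.  */
enum {
    PERF_O2 = 98,   /* -O2 performance, percent of -O3 */
    PERF_OS = 81,   /* -Os performance, percent of -O3 */
    SAVE_O2 = 5,    /* -O2 size saving versus -O3, percent */
    SAVE_OS = 8     /* -Os size saving versus -O3, percent */
};

/* Percentage points of performance lost going from -O2 to -Os.  */
int perf_drop_points(void)
{
    return PERF_O2 - PERF_OS;
}

/* Percentage points of extra size saved going from -O2 to -Os.  */
int extra_saving_points(void)
{
    return SAVE_OS - SAVE_O2;
}

/* Performance points given up per point of size saved, rounded down.  */
int points_lost_per_point_saved(void)
{
    int saved = extra_saving_points();
    assert(saved > 0);  /* avoid dividing by zero */
    return perf_drop_points() / saved;
}
```

That is 17 points of performance for 3 points of size, or roughly five points
of speed given up for every point of space saved.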

Again. I really like GCC’s -O2 and hope that its binaries don’t grow in size 
nor slow down.

Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Michael Clark

> On 13 Sep 2017, at 12:47 PM, Segher Boessenkool wrote:
> 
> On Wed, Sep 13, 2017 at 09:27:22AM +1200, Michael Clark wrote:
>>> Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
>>> mentioned repeatedly) which GCC could/should do as well.
>> 
>> There are some nuances to -O2. Please consider -O2 users who wish to use it
>> like Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
>> 
>> Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
>> code size can be skipped from -Os without drastically affecting performance.
>> 
>> This is not the case with GCC where -Os is a size at all costs optimisation
>> mode. GCC users' option for size not at the expense of speed is to use -O2.
> 
> "Size not at the expense of speed" exists in neither compiler.  Just the
> tradeoffs are different between GCC and LLVM.  It would be a silly
> optimisation target -- it's exactly the same as just "speed"!  Unless
> "speed" means "let's make it faster, and bigger just because" ;-)

I would like to be able to quantify stats on a well known benchmark suite, say
SPECint 2006 or SPECint 2017, but in my own small benchmark suite I saw only a
small difference in size between -O2 and -Os, and a significant drop in
performance with -Os compared to -O2.

- https://rv8.io/bench#optimisation
- https://rv8.io/bench#executable-file-sizes

-O2 is 98% perf of -O3 on x86-64
-Os is 81% perf of -O3 on x86-64

-O2 saves 5% space on -O3 on x86-64
-Os saves 8% space on -Os on x86-64

17% drop in performance for 3% saving in space is not a good trade for a 
“general” size optimisation. It’s more like executable compression.

-O2 seems to be a sweet spot for size versus speed.

I could only recommend GCC’s -Os if the user is trying to squeeze something
down to fit the last few bytes of a ROM and -Oz seems like a more appropriate
name.

-O2 is the current sweet spot in GCC and is likely closest in semantics to
LLVM/Clang -Os and I’d like -O2 binaries to stay lean.

I don’t think -O2 should slow down nor should the binaries get larger. Turning up
knobs that affect code size should be reserved for -O3 until GCC makes a
distinction between -O2/-O2s and -Os/-Oz.

On RISC-V I believe we could shrink binaries at -O2 further with no sacrifice 
in performance, perhaps with a performance improvement by reducing icache 
bandwidth…

BTW -O2 gets great compression and performance improvements compared to -O0 ;-D 
it’s the points after -O2 where the trade offs don’t correlate.

I like -O2

My 2c.

> GCC's -Os is not "size at all costs" either; there are many options (mostly
> --params) that can decrease code size significantly.  To tune code size
> down for your particular program you have to play with options a bit.  This
> shouldn't be news to anyone.
> 
> '-Os'
> Optimize for size.  '-Os' enables all '-O2' optimizations that do
> not typically increase code size.  It also performs further
> optimizations designed to reduce code size.
> 
>> So if adding optimisations to -O2 that increase code size, please
>> consider adding an -O2s that maintains the compact code size of -O2. -O2
>> generates pretty compact code as many performance optimisations tend to 
>> reduce code size, or otherwise add optimisations that increase code size to 
>> -O3. Adding loop unrolling on makes sense in the Clang/LLVM context where 
>> they have a compact code model with good performance i.e. -Os. In GCC this 
>> is -O2.
>> 
>> So if you want to enable more optimisations at -O2, please copy -O2 
>> optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation 
>> defaults to a new -Os.
> 
> '-O2'
> Optimize even more.  GCC performs nearly all supported
> optimizations that do not involve a space-speed tradeoff.  As
> compared to '-O', this option increases both compilation time and
> the performance of the generated code.
> 
>> The present reality is that any project that wishes to optimize for size at 
>> all costs will need to run a configure test for -Oz, and then fall back to 
>> -Os, given the current disparity between Clang/LLVM and GCC flags here.
> 
> The present reality is that any project that wishes to support both GCC and
> LLVM needs to do configure tests, because LLVM chose to do many things
> differently (sometimes unavoidably).  If GCC would change some options
> to be more like LLVM, all users only ever using GCC would be affected,
> while all other incompatibilities would remain.  Not a good tradeoff at
> all.
> 
> 
> Segher



Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Segher Boessenkool
On Wed, Sep 13, 2017 at 09:27:22AM +1200, Michael Clark wrote:
> > Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
> > mentioned repeatedly) which GCC could/should do as well.
> 
> There are some nuances to -O2. Please consider -O2 users who wish to use it like
> Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).
> 
> Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
> code size can be skipped from -Os without drastically affecting performance.
> 
> This is not the case with GCC where -Os is a size at all costs optimisation
> mode. GCC users' option for size not at the expense of speed is to use -O2.

"Size not at the expense of speed" exists in neither compiler.  Just the
tradeoffs are different between GCC and LLVM.  It would be a silly
optimisation target -- it's exactly the same as just "speed"!  Unless
"speed" means "let's make it faster, and bigger just because" ;-)

GCC's -Os is not "size at all costs" either; there are many options (mostly
--params) that can decrease code size significantly.  To tune code size
down for your particular program you have to play with options a bit.  This
shouldn't be news to anyone.

'-Os'
 Optimize for size.  '-Os' enables all '-O2' optimizations that do
 not typically increase code size.  It also performs further
 optimizations designed to reduce code size.

> So if adding optimisations to -O2 that increase code size, please consider
> adding an -O2s that maintains the compact code size of -O2. -O2 generates
> pretty compact code as many performance optimisations tend to reduce code 
> size, or otherwise add optimisations that increase code size to -O3. Adding 
> loop unrolling on makes sense in the Clang/LLVM context where they have a 
> compact code model with good performance i.e. -Os. In GCC this is -O2.
> 
> So if you want to enable more optimisations at -O2, please copy -O2 
> optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation defaults 
> to a new -Os.

'-O2'
 Optimize even more.  GCC performs nearly all supported
 optimizations that do not involve a space-speed tradeoff.  As
 compared to '-O', this option increases both compilation time and
 the performance of the generated code.

> The present reality is that any project that wishes to optimize for size at 
> all costs will need to run a configure test for -Oz, and then fall back to 
> -Os, given the current disparity between Clang/LLVM and GCC flags here.

The present reality is that any project that wishes to support both GCC and
LLVM needs to do configure tests, because LLVM chose to do many things
differently (sometimes unavoidably).  If GCC would change some options
to be more like LLVM, all users only ever using GCC would be affected,
while all other incompatibilities would remain.  Not a good tradeoff at
all.


Segher


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Michael Clark

> On 13 Sep 2017, at 1:57 AM, Wilco Dijkstra  wrote:
> 
> Hi all,
> 
> At the GNU Cauldron I was inspired by several interesting talks about
> improving GCC in various ways. While GCC has many great optimizations, a
> common theme is that its default settings are rather conservative. As a
> result users are required to enable several additional optimizations by
> hand to get good code. Other compilers enable more optimizations at -O2
> (loop unrolling in LLVM was mentioned repeatedly) which GCC could/should
> do as well.

There are some nuances to -O2. Please consider -O2 users who wish to use it like
Clang/LLVM’s -Os (-O2 without loop vectorisation IIRC).

Clang/LLVM has an -Os that is like -O2 so adding optimisations that increase
code size can be skipped from -Os without drastically affecting performance.

This is not the case with GCC where -Os is a size at all costs optimisation
mode. GCC users' option for size not at the expense of speed is to use -O2.

Clang   GCC
-Oz ~=  -Os
-Os ~=  -O2

So if adding optimisations to -O2 that increase code size, please consider
adding an -O2s that maintains the compact code size of -O2. -O2 generates 
pretty compact code as many performance optimisations tend to reduce code size, 
or otherwise add optimisations that increase code size to -O3. Adding loop 
unrolling on makes sense in the Clang/LLVM context where they have a compact 
code model with good performance i.e. -Os. In GCC this is -O2.

So if you want to enable more optimisations at -O2, please copy -O2 
optimisations to -O2s or rename -Os to -Oz and copy -O2 optimisation defaults 
to a new -Os.

The present reality is that any project that wishes to optimize for size at all 
costs will need to run a configure test for -Oz, and then fall back to -Os, 
given the current disparity between Clang/LLVM and GCC flags here.

> Here are a few concrete proposals to improve GCC's option settings which will
> enable better code generation for most targets:
> 
> * Make -fno-math-errno the default - this mostly affects the code generated
>  for sqrt, which should be treated just like floating point division and
>  not set errno by default (unless you explicitly select C89 mode).
> 
> * Make -fno-trapping-math the default - another obvious one. From the docs:
>  "Compile code assuming that floating-point operations cannot generate 
>   user-visible traps."
>  There isn't a lot of code that actually uses user-visible traps (if any -
>  many CPUs don't even support user traps as it's an optional IEEE feature). 
>  So assuming trapping math by default is way too conservative since there is
>  no obvious benefit to users. 
> 
> * Make -fno-common the default - this was originally needed for pre-ANSI C,
>  but is optional in C (not sure whether it is still in C99/C11). This can
>  significantly improve code generation on targets that use anchors for globals
>  (note the linker could report a more helpful message when ancient code that
>  requires -fcommon fails to link).
> 
> * Make -fomit-frame-pointer the default - various targets already do this at
>  higher optimization levels, but this could easily be done for all targets.
>  Frame pointers haven't been needed for debugging for decades, however if
>  there are still good reasons to keep it enabled with -O0 or -O1 (I can't
>  think of any unless it is for last-resort backtrace when there is no
>  unwind info at a crash), we could just disable the frame pointer from -O2
>  onwards.
> 
> These are just a few ideas to start. What do people think? I'd welcome
> discussion and other proposals for similar improvements.
> 
> Wilco



Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Joseph Myers
On Tue, 12 Sep 2017, Alexander Monakov wrote:

> > * Make -fno-trapping-math the default - another obvious one. From the docs:
> >   "Compile code assuming that floating-point operations cannot generate
> >    user-visible traps."
> >   There isn't a lot of code that actually uses user-visible traps (if any -
> >   many CPUs don't even support user traps as it's an optional IEEE
> >   feature). So assuming trapping math by default is way too conservative
> >   since there is no obvious benefit to users.
> 
> OTOH -O options are understood to _never_ sacrifice standards compliance, with
> the exception of -Ofast.  I believe that's an important property to keep.

ISO C allows the FENV_ACCESS pragma to be OFF by default (we don't support 
the standard pragmas, but FENV_ACCESS ON is equivalent to a stricter 
version of -frounding-math -ftrapping-math).  Thus, this is not a 
standards compliance issue (unlike various parts of -ffast-math that break 
IEEE 754 semantics even with FENV_ACCESS OFF).  And since 
-fno-rounding-math is the default, the default is already a form of 
FENV_ACCESS OFF.

I don't think any -O implication of -fno-trapping-math was proposed; the 
proposal was about the default (independent of -O options).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Alexander Monakov
On Tue, 12 Sep 2017, Wilco Dijkstra wrote:
> * Make -fno-math-errno the default - this mostly affects the code generated
>   for sqrt, which should be treated just like floating point division and
>   not set errno by default (unless you explicitly select C89 mode).

(note that this can be selectively enabled by targets where libm never sets
errno in the first place, docs call out Darwin as one such target, but musl-libc
targets have this property too)

> * Make -fno-trapping-math the default - another obvious one. From the docs:
>   "Compile code assuming that floating-point operations cannot generate 
>    user-visible traps."
>   There isn't a lot of code that actually uses user-visible traps (if any -
>   many CPUs don't even support user traps as it's an optional IEEE feature). 
>   So assuming trapping math by default is way too conservative since there is
>   no obvious benefit to users. 

OTOH -O options are understood to _never_ sacrifice standards compliance, with
the exception of -Ofast.  I believe that's an important property to keep.

Maybe it's possible to treat -fno-trapping-math similar to -ffp-contract=fast,
i.e. implicitly enable it in the default C-with-GNU-extensions mode, keeping
strict-compliance mode (-std=c11 as opposed to gnu11) untouched?

In any case it shouldn't be hard to issue a warning if fenv.h functions are
used when -fno-trapping-math/-fno-rounding-math is enabled.

If the above doesn't fly, I believe adopting and promoting a single option for
non-value-changing math optimizations (-fno-math-errno -fno-trapping-math, plus
-fno-rounding-math -fno-signaling-nans when they're no longer default) would
be nice.

> * Make -fno-common the default - this was originally needed for pre-ANSI C,
>   but is optional in C (not sure whether it is still in C99/C11). This can
>   significantly improve code generation on targets that use anchors for
>   globals (note the linker could report a more helpful message when ancient
>   code that requires -fcommon fails to link).

I think that in ISO C, situations where -fcommon allows the link to succeed
fall under undefined behavior, which the GNU toolchain defines to match the
historical behavior.

I assume the main issue with this is the amount of legacy code that would cause
a link failure if -fno-common is made default - thus, is there anybody in
a position to trigger a full-distro rebuild with gcc patched to enable
-fno-common, and compare before/after build failure stats?
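
For readers who have not met the issue, the legacy pattern at stake is a header
that defines (rather than declares) a global, so that every translation unit
including it gets its own tentative definition. The single-file sketch below
shows the portable form next to a description of the legacy one; the names are
invented for illustration.

```c
#include <assert.h>

/* What a shared header should contain: a declaration only.
   Legacy code often has plain "int request_count;" in the header
   instead, which gives every .c file that includes it a tentative
   definition.  With -fcommon the linker merges them; with
   -fno-common the link fails with a multiple-definition error.  */
extern int request_count;

/* Exactly one translation unit provides the definition.  */
int request_count = 0;

/* Increments the shared counter and returns the new value.  */
int count_request(void)
{
    assert(request_count >= 0);
    return ++request_count;
}
```

Code written this way links the same with either setting, so a distro rebuild
would only flag packages that still rely on the merged tentative definitions.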

Thanks.
Alexander


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Joseph Myers
On Tue, 12 Sep 2017, Wilco Dijkstra wrote:

> * Make -fno-math-errno the default - this mostly affects the code generated
>   for sqrt, which should be treated just like floating point division and
>   not set errno by default (unless you explicitly select C89 mode).
> 
> * Make -fno-trapping-math the default - another obvious one. From the docs:

Note these would both have implications for library math_errhandling 
settings (since the compiler options can affect built-in functions).  In 
the absence of -ffast-math glibc defines it to (MATH_ERRNO | 
MATH_ERREXCEPT).  __NO_MATH_ERRNO__ exists since GCC 5 to allow it to be 
defined to just MATH_ERREXCEPT in the -fno-math-errno case, but the header 
needs updating to respect that.  And we don't have a macro for 
-fno-trapping-math to say whether MATH_ERREXCEPT should be part of the 
value.
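
As a rough sketch of how user code could inspect this, the helpers below decode
a math_errhandling-style value and compare the header's promise with the
__NO_MATH_ERRNO__ macro mentioned above. The function names are invented, and
the consistency check is an assumption about what a useful test might look
like, not something glibc or GCC provides.

```c
#include <assert.h>
#include <math.h>

/* Returns 1 if a math_errhandling-style value says errors are reported
   through errno, 0 otherwise.  */
int reports_via_errno(int errhandling)
{
    /* Only the two standard bits are meaningful here.  */
    assert((errhandling & ~(MATH_ERRNO | MATH_ERREXCEPT)) == 0);
    return (errhandling & MATH_ERRNO) != 0;
}

/* Returns 1 if the value says errors are reported by raising
   floating-point exception flags, 0 otherwise.  */
int reports_via_exceptions(int errhandling)
{
    assert((errhandling & ~(MATH_ERRNO | MATH_ERREXCEPT)) == 0);
    return (errhandling & MATH_ERREXCEPT) != 0;
}

/* Returns 0 when the compiler was told not to set errno
   (__NO_MATH_ERRNO__ is predefined) but <math.h> still advertises
   MATH_ERRNO, and 1 otherwise.  */
int header_matches_compiler(void)
{
#if defined(__NO_MATH_ERRNO__) && defined(math_errhandling)
    return !reports_via_errno(math_errhandling);
#else
    return 1;
#endif
}
```

Compiling a caller of `header_matches_compiler` with -fno-math-errno against a
header that has not been updated would return 0, which is the mismatch
described above.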

My assumption is that with changed defaults glibc would need to compile 
with -ftrapping-math just as it uses -frounding-math; code may expect 
transformations that add exceptions not to occur.  It probably does not 
require -fmath-errno (glibc functions do not generally rely on other 
functions setting errno).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Theodore Papadopoulo
On 09/12/2017 05:32 PM, Andrew Pinski wrote:
> On Tue, Sep 12, 2017 at 8:29 AM, Theodore Papadopoulo wrote:
>> Another one that might be interesting is -funsafe-loop-optimizations.
>> In most cases people write loops assuming simple finite loops (no
>> overflow). Crippling optimization for the small amount of people (system
>> programmers ?) that use such strange loops seems counterproductive. It
>> would be best if such loops can be marked with an attribute in some way
>> and that the general case just assumes that all loops are finite...
> 
> -funsafe-loop-optimizations is a nop in GCC 7 and above.
> Since https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00956.html .
> 
> Thanks,
> Andrew
> 

Thanks for the notice. For some reason, I missed that piece of
information... Too bad that making such an assumption generates bogus
code in some common cases.

Theo.




Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Andrew Pinski
On Tue, Sep 12, 2017 at 8:29 AM, Theodore Papadopoulo wrote:
> Another one that might be interesting is -funsafe-loop-optimizations.
> In most cases people write loops assuming simple finite loops (no
> overflow). Crippling optimization for the small amount of people (system
> programmers ?) that use such strange loops seems counterproductive. It
> would be best if such loops can be marked with an attribute in some way
> and that the general case just assumes that all loops are finite...

-funsafe-loop-optimizations is a nop in GCC 7 and above.
Since https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00956.html .
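
As an example of the kind of "strange loop" in question, an inclusive unsigned
bound can make a loop infinite through wraparound. The sketch below uses
invented function names and only illustrates why a compiler cannot assume every
counted loop terminates; it says nothing about what GCC 7 does internally.

```c
#include <assert.h>
#include <limits.h>

/* Number of iterations of
       for (unsigned i = 0; i <= n; i++) ...
   The loop visits 0..n, i.e. n + 1 values, but when n == UINT_MAX the
   condition i <= n is always true (i wraps around to 0), so the loop
   never ends.  Returns 0 to signal that case.  */
unsigned long long trip_count_inclusive(unsigned n)
{
    if (n == UINT_MAX)
        return 0;  /* infinite loop: no finite trip count */
    return (unsigned long long)n + 1;
}

/* The simple finite loop most code has in mind: with an exclusive
   bound the counter cannot wrap, so the trip count is always n.  */
unsigned long long sum_below(unsigned n)
{
    unsigned long long sum = 0;
    for (unsigned i = 0; i < n; i++)
        sum += i;
    /* Closed form n * (n - 1) / 2, guarded for n == 0.  */
    assert(sum == (unsigned long long)n * (n ? n - 1 : 0) / 2);
    return sum;
}
```

Writing loops with an exclusive bound, as in `sum_below`, lets the compiler
prove termination without any special flag.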

Thanks,
Andrew


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Joseph Myers
On Tue, 12 Sep 2017, Wilco Dijkstra wrote:

> * Make -fno-trapping-math the default - another obvious one. From the docs:
>   "Compile code assuming that floating-point operations cannot generate 
>user-visible traps."
>   There isn't a lot of code that actually uses user-visible traps (if any -
>   many CPUs don't even support user traps as it's an optional IEEE feature). 
>   So assuming trapping math by default is way too conservative since there is
>   no obvious benefit to users. 

"traps" here means "raising IEEE exception flags" not just "invoking trap 
handlers".  That is, -ftrapping-math disables a range of local 
transformations that would change the set of flags raised by an operation.  
(Transformations that change the nonzero number of times a flag is raised 
to a different nonzero number are always OK; that is, the possibility of a 
trap handler counting how many times it is invoked is never considered.  
Transformations that might move flag raising across function calls or asms 
that might inspect or modify the flags should not be OK, at least with a 
stricter version of -ftrapping-math that might be another option, but we 
don't have that stricter version at present; -ftrapping-math generally 
does not disable code movement, or removal of code that is dead apart from 
its effect on exception flags.)

That is, lack of trap support on processors that only support exception 
flags is not relevant to -ftrapping-math, beyond any question of whether 
-ftrapping-math should disable transformations that only affect whether an 
exact underflow exception occurs (the case where default exception 
handling does not raise the flag), if we have any such transformations 
(constant folding on exact underflow?).

It's true that a stricter version of -ftrapping-math that inhibits code 
movement and removal would probably inhibit *more* optimizations than 
-frounding-math (which is off by default), as -frounding-math only makes 
floating-point operations read thread-local state but -ftrapping-math 
makes them write it as well.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: RFC: Improving GCC8 default option settings

2017-09-12 Thread Theodore Papadopoulo
Another one that might be interesting is -funsafe-loop-optimizations.
In most cases people write loops assuming simple finite loops (no
overflow). Crippling optimization for the small number of people (system
programmers?) who use such strange loops seems counterproductive. It
would be best if such loops could be marked with an attribute in some
way, so that the general case can just assume that all loops are finite...
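
To make "strange loops" concrete, here is a small C sketch of a loop whose
exit condition can never become false for one particular bound. It
deliberately uses an unsigned char counter so that the wrap-around is cheap to
demonstrate; the function count_iterations and its cap parameter are invented
for this illustration and are not taken from GCC.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: runs `for (i = 0; i <= n; i++)` with an unsigned char
   counter and returns the number of iterations, or -1 if more than `cap`
   iterations were needed (the loop did not terminate within the cap). */
static long count_iterations(unsigned char n, long cap)
{
    long steps = 0;

    assert(cap > 0);
    for (unsigned char i = 0; i <= n; i++) {
        /* When n == UCHAR_MAX, i wraps from UCHAR_MAX back to 0 and the
           condition i <= n never becomes false. */
        if (++steps > cap)
            return -1;
    }
    return steps;
}
```

Here count_iterations(10, 1000) returns 11, while
count_iterations(UCHAR_MAX, 1000) returns -1 because only the cap ends the
loop. An optimizer that assumed every such loop is finite could treat the trip
count as n + 1, which does not describe what the loop does for that one input.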


RFC: Improving GCC8 default option settings

2017-09-12 Thread Wilco Dijkstra
Hi all,

At the GNU Cauldron I was inspired by several interesting talks about improving
GCC in various ways. While GCC has many great optimizations, a common theme is
that its default settings are rather conservative. As a result users are 
required to enable several additional optimizations by hand to get good code.
Other compilers enable more optimizations at -O2 (loop unrolling in LLVM was
mentioned repeatedly) which GCC could/should do as well.

Here are a few concrete proposals to improve GCC's option settings which will
enable better code generation for most targets:

* Make -fno-math-errno the default - this mostly affects the code generated for
  sqrt, which should be treated just like floating point division and not set
  errno by default (unless you explicitly select C89 mode).

* Make -fno-trapping-math the default - another obvious one. From the docs:
  "Compile code assuming that floating-point operations cannot generate 
   user-visible traps."
  There isn't a lot of code that actually uses user-visible traps (if any -
  many CPUs don't even support user traps as it's an optional IEEE feature). 
  So assuming trapping math by default is way too conservative since there is
  no obvious benefit to users. 

* Make -fno-common the default - the -fcommon behavior was originally needed
  for pre-ANSI C, but is optional in C (not sure whether it is still in
  C99/C11). Switching to -fno-common can significantly improve code generation
  on targets that use anchors for globals (note the linker could report a more
  helpful message when ancient code that requires -fcommon fails to link).

* Make -fomit-frame-pointer the default - various targets already do this at
  higher optimization levels, but this could easily be done for all targets.
  Frame pointers haven't been needed for debugging for decades; however, if
  there are still good reasons to keep them enabled with -O0 or -O1 (I can't
  think of any unless it is for last-resort backtrace when there is no unwind
  info at a crash), we could just disable the frame pointer from -O2 onwards.

These are just a few ideas to start. What do people think? I'd welcome
discussion and other proposals for similar improvements.

Wilco