Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2020-03-13 Thread David Malcolm via Gcc-patches
On Thu, 2020-03-12 at 14:21 -0600, Sandra Loosemore wrote:
> On 1/27/20 2:02 PM, Jeff Law wrote:
> > [snip]
> > 
> > While I think we've missed the boat for gcc-10, I think these
> > patches 
> > should go forward in gcc-11. I'll own getting the paths sorted so
> > that 
> > this problem is avoided.

> I recently retested these patches on trunk,

For reference, the patches are here:
 cover letter: 
https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534142.html
 [1/2] https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534145.html
 [2/2] https://gcc.gnu.org/pipermail/gcc-patches/2019-November/534144.html

> and I found some new 
> regressions in the analyzer tests.
> 
> FAIL: default/gcc.sum:gcc.dg/analyzer/edges-1.c  (test for bogus 
> messages, line 16)
> 
> This one may be a genuine analyzer bug?  It is apparently failing to 
> prune the control flow paths per commit 
> 004f2c07b6d3fa543f0fe86c55a7b3c227de2bb6.

I think that what's happened is that your patch improves the location
information for the gimple stmts, and this uncovers a bug in how I wrote
the test.

Without your patch 2, edges-1.c's test_1 has this output:

  edges-1.c:19:6: warning: leak of FILE 'fp' [CWE-775] [-Wanalyzer-file-leak]
  edges-1.c:10:14: note: (1) opened here
  edges-1.c:12:6: note: (2) assuming 'fp' is non-NULL
  edges-1.c:12:6: note: (3) following 'false' branch (when 'fp' is non-NULL)...
  cc1: note: (4) ...to here
  edges-1.c:19:6: note: (5) following 'false' branch (when 'flag == 0')...
  cc1: note: (6) ...to here
  edges-1.c:19:6: note: (7) 'fp' leaks here; was opened at (1)

With both of your patches I get:

  edges-1.c:19:6: warning: leak of FILE 'fp' [CWE-775] [-Wanalyzer-file-leak]
  edges-1.c:10:14: note: (1) opened here
  edges-1.c:12:6: note: (2) assuming 'fp' is non-NULL
  edges-1.c:12:6: note: (3) following 'false' branch (when 'fp' is non-NULL)...
  edges-1.c:16:10: note: (4) ...to here
  edges-1.c:19:6: note: (5) following 'false' branch (when 'flag == 0')...
  cc1: note: (6) ...to here
  edges-1.c:19:6: note: (7) 'fp' leaks here; was opened at (1)

Note how the patch improves event (4) in the path from being at an
unknown location to being at line 16 (the "while").

I think I underspecified the dg-bogus at line 16: my intent is for it to
verify that we don't emit any events relating to the "while" test, but
the above now has the "...to here".

The following patch fixes that dg-bogus at line 16; the analyzer is
still correctly not reporting the control flow for the while-loop
itself, as that's not needed to trigger the leak (I verified that the
updated dg-bogus is still triggered if I supply -fanalyzer-verbosity=3
to disable the purging of redundant control-flow).

Note how event (6) still doesn't have a location; that's the destination
of the final "if" that's at the end of the function (falling off the
end).  That may well be an issue with the analyzer, though, not your
code.

> PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c  (test for 
> warnings, line 39)
> PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c  (test for 
> warnings, line 46)
> PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c (test for 
> excess errors)
> 
> This one is hitting the "--Wanalyzer-too-complex" diagnostic and
> could 
> probably be fixed by adjusting the parameters at the top of the
> testcase.

I tried doubling the limits but it didn't help; something unexpected
seems to be going on - but I'm in the middle of a rewrite of this code,
so I'd prefer to postpone looking into this for now.

> PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c  (test for 
> warnings, line 28)
> PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c  (test for 
> warnings, line 37)
> PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c (test for
> excess 
> errors)
> 
> It's now printing "2 processed enodes" in both warnings instead of 3
> or 
> 4 (respectively).  At first glance, it's not obvious what the 
> significance of the the things being tested here is, so I'm not sure
> if 
> this is an analyzer bug or just patterns that are too fragile.

The test is trying to measure how many nodes we create at each measured
program point in the exploded_graph - ideally the fewer the better; it
seems that the patch improves the analyzer here, and the test should be
updated to reflect that.

Here's a lightly-tested patch which fixes the edges-1.c and loop-4.c
failures for me.

David

gcc/testsuite/ChangeLog:
* gcc.dg/analyzer/edges-1.c: Strengthen the dg-bogus directives to
check for the word "branch" rather than any message, since
test_1's if's "...to here" message now has the location of the
"while" stmt.
* gcc.dg/analyzer/loop-4.c: Reduce number of expected enodes in
two places.
---
 gcc/testsuite/gcc.dg/analyzer/edges-1.c | 4 ++--
 gcc/testsuite/gcc.dg/analyzer/loop-4.c  | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git 

Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2020-03-12 Thread Sandra Loosemore

On 1/27/20 2:02 PM, Jeff Law wrote:


[snip]

While I think we've missed the boat for gcc-10, I think these patches 
should go forward in gcc-11. I'll own getting the paths sorted so that 
this problem is avoided.


I recently retested these patches on trunk, and I found some new 
regressions in the analyzer tests.


FAIL: default/gcc.sum:gcc.dg/analyzer/edges-1.c  (test for bogus 
messages, line 16)


This one may be a genuine analyzer bug?  It is apparently failing to 
prune the control flow paths per commit 
004f2c07b6d3fa543f0fe86c55a7b3c227de2bb6.


PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c  (test for 
warnings, line 39)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c  (test for 
warnings, line 46)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/explode-2.c (test for 
excess errors)


This one is hitting the "--Wanalyzer-too-complex" diagnostic and could 
probably be fixed by adjusting the parameters at the top of the testcase.


PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c  (test for 
warnings, line 28)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c  (test for 
warnings, line 37)
PASS -> FAIL: default/gcc.sum:gcc.dg/analyzer/loop-4.c (test for excess 
errors)


It's now printing "2 processed enodes" in both warnings instead of 3 or 
4 (respectively).  At first glance, it's not obvious what the 
significance of the the things being tested here is, so I'm not sure if 
this is an analyzer bug or just patterns that are too fragile.


David, WDYT?

-Sandra


Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2020-01-27 Thread Jeff Law
On Thu, 2019-12-12 at 15:44 -0500, Jason Merrill wrote:
> Here are the dumps from ssa-dom-thread-7.c made to compile as C++; cx-current 
> is the dumps with current trunk; cx-old is changed to use the old goto-based 
> lowering like C.
Sorry this has taken so long to get back to.

For ssa-dom-thread-7.c it looks like the differences we're encountering start 
at the thread1 pass.

While both cc1 and cc1plus optimized 16 jump threading paths, the final targets 
differ in some cases.  I guess somewhat ironically cc1plus actually does a 
better job threading deeper through the CFG.   I suspect, but have not actually 
confirmed that by threading deeper through the CFG, there's just fewer things 
for subsequent passes to detect and optimize.

I'm pretty sure cc1 doesn't thread as deeply simply due to the ordering of the 
jump thread paths that have been recorded.  Essentially we only optimize *one* 
path starting at any given edge even though we may have multiple potential jump 
threading paths that start at that edge.   This clearly argues that we should 
sort the vector of jump threading paths so that we find the longest paths first.

While I think we've missed the boat for gcc-10, I think these patches should go 
forward in gcc-11.  I'll own getting the paths sorted so that this problem is 
avoided.

Jeff





Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2020-01-13 Thread Jeff Law
On Mon, 2020-01-13 at 13:32 -0500, Jason Merrill wrote:
> On 12/12/19 3:44 PM, Jason Merrill wrote:
> > On Wed, Dec 11, 2019 at 1:37 PM Jeff Law  > > wrote:
> > On Wed, 2019-12-11 at 00:03 -0700, Sandra Loosemore wrote:
> >  > On 12/6/19 3:41 PM, Jeff Law wrote:
> >  > > On Wed, 2019-11-13 at 09:27 -0700, Sandra Loosemore wrote:
> >  > > > I bootstrapped and regression-tested this on x86_64-linux-
> >  > > > gnu.  There
> >  > > > are a few regressions involving these tests:
> >  > > >
> >  > > > gcc.dg/tree-ssa/pr77445-2.c
> >  > > I believe tihs is another instance of the FSA optimization.  I'd
> >  > > need
> >  > > to see the before/after dumps to know if it's regressed.  The main
> >  > > purpose of the test was to verify that we didn't muck up the
> >  > > profile
> >  > > estimation after the FSA optimization.
> >  > >
> >  > >
> >  > > > gcc.dg/tree-ssa/ssa-dce-3.c
> >  > > So I think this one is supposed to collapse into a trivial infinite
> >  > > loop.  Anything else would be a regression.
> >  > >
> >  > >
> >  > > > gcc.dg/tree-ssa/ssa-dom-thread-7.c
> >  > > This is the FSA optimization.  Unfortunately checking for it being
> >  > > done
> >  > > right is exceedingly painful.  If you pass along the before/after
> >  > > dumps
> >  > > I can probably help determine whether or not we just need an update
> >  > > to
> >  > > the scanned bits.
> >  > >
> >  > > Given how much pressure there was to handle the FSA optimization,
> >  > > I'd
> >  > > prefer to make sure we're still doing it before moving forward.
> >  >
> >  > I poked at these 3 test cases some more.  FWIW, it appears that if
> >  > you
> >  > use an unmodified build to compile them as C++ instead of C, the
> >  > same
> >  > problems appear.  So I guess it is an existing bug in
> >  > something-or-another that we are not presently optimizing code output
> >  > by
> >  > the C++ front end as well as that from the C front end.  :-S  (At
> >  > least,
> >  > the ssa-dce-3.c optimization failure is a bug; the other two might
> >  > be
> >  > fragile test cases.)
> >  >
> >  > The C++ front end used to lower loops in exactly the same way as the
> >  > C
> >  > front end.  This is the patch that changed it:
> >  >
> >  > https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01994.html
> >  >
> >  > There wasn't much discussion about how this affected optimization
> >  > beyond
> >  > noting a slight decrease in code size for a single benchmark, and no
> >  > consideration at all of whether it wouldn't be better to have the C
> >  > and
> >  > C++ front ends use the same lowering strategy for loops, whichever
> >  > way
> >  > produced better code, so that the optimizers can better recognize
> >  > the
> >  > common patterns from both languages.
> >  >
> >  > Anyway, I'm no longer expecting to get this front end patch into GCC
> >  > 10,
> >  > but it would be helpful to get some guidance on how to proceed for
> >  > resubmission when stage 1 re-opens.  E.g. from where I'm sitting
> >  > now,
> >  > fixing the C++ constexpr evaluator to handle gotos (if it doesn't
> >  > already?) and reverting to the C way of lowering loops seems like a
> >  > much
> >  > more bounded problem than fixing optimizers and trying to benchmark
> >  > their effectiveness.  :-S  OTOH, somebody more familiar with these
> >  > optimizations might see easy fixes.  Or I could re-jigger my patches
> >  > to
> >  > continue to use different loop lowering strategies for C and C++ so
> >  > it
> >  > at least wouldn't make things any worse for either language than it
> >  > already is.
> > Well, I'm happy to look at the before/after dumps if you pass them
> > along.   I'd certainly like to see the two front-ends sharing these
> > bits.
> > 
> > Here are the dumps from ssa-dom-thread-7.c made to compile as C++; 
> > cx-current is the dumps with current trunk; cx-old is changed to use the 
> > old goto-based lowering like C.
> 
> Have you had a chance to look at these at all?  Does it make sense to 
> check in my patch to revert C++ loop lowering to be like C again?
Sorry, I haven't.  Too much time prepping Fedora for LTO...

jeff



Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2020-01-13 Thread Jason Merrill

On 12/12/19 3:44 PM, Jason Merrill wrote:
On Wed, Dec 11, 2019 at 1:37 PM Jeff Law > wrote:

On Wed, 2019-12-11 at 00:03 -0700, Sandra Loosemore wrote:
 > On 12/6/19 3:41 PM, Jeff Law wrote:
 > > On Wed, 2019-11-13 at 09:27 -0700, Sandra Loosemore wrote:
 > > > I bootstrapped and regression-tested this on x86_64-linux-
 > > > gnu.  There
 > > > are a few regressions involving these tests:
 > > >
 > > > gcc.dg/tree-ssa/pr77445-2.c
 > > I believe tihs is another instance of the FSA optimization.  I'd
 > > need
 > > to see the before/after dumps to know if it's regressed.  The main
 > > purpose of the test was to verify that we didn't muck up the
 > > profile
 > > estimation after the FSA optimization.
 > >
 > >
 > > > gcc.dg/tree-ssa/ssa-dce-3.c
 > > So I think this one is supposed to collapse into a trivial infinite
 > > loop.  Anything else would be a regression.
 > >
 > >
 > > > gcc.dg/tree-ssa/ssa-dom-thread-7.c
 > > This is the FSA optimization.  Unfortunately checking for it being
 > > done
 > > right is exceedingly painful.  If you pass along the before/after
 > > dumps
 > > I can probably help determine whether or not we just need an update
 > > to
 > > the scanned bits.
 > >
 > > Given how much pressure there was to handle the FSA optimization,
 > > I'd
 > > prefer to make sure we're still doing it before moving forward.
 >
 > I poked at these 3 test cases some more.  FWIW, it appears that if
 > you
 > use an unmodified build to compile them as C++ instead of C, the
 > same
 > problems appear.  So I guess it is an existing bug in
 > something-or-another that we are not presently optimizing code output
 > by
 > the C++ front end as well as that from the C front end.  :-S  (At
 > least,
 > the ssa-dce-3.c optimization failure is a bug; the other two might
 > be
 > fragile test cases.)
 >
 > The C++ front end used to lower loops in exactly the same way as the
 > C
 > front end.  This is the patch that changed it:
 >
 > https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01994.html
 >
 > There wasn't much discussion about how this affected optimization
 > beyond
 > noting a slight decrease in code size for a single benchmark, and no
 > consideration at all of whether it wouldn't be better to have the C
 > and
 > C++ front ends use the same lowering strategy for loops, whichever
 > way
 > produced better code, so that the optimizers can better recognize
 > the
 > common patterns from both languages.
 >
 > Anyway, I'm no longer expecting to get this front end patch into GCC
 > 10,
 > but it would be helpful to get some guidance on how to proceed for
 > resubmission when stage 1 re-opens.  E.g. from where I'm sitting
 > now,
 > fixing the C++ constexpr evaluator to handle gotos (if it doesn't
 > already?) and reverting to the C way of lowering loops seems like a
 > much
 > more bounded problem than fixing optimizers and trying to benchmark
 > their effectiveness.  :-S  OTOH, somebody more familiar with these
 > optimizations might see easy fixes.  Or I could re-jigger my patches
 > to
 > continue to use different loop lowering strategies for C and C++ so
 > it
 > at least wouldn't make things any worse for either language than it
 > already is.
Well, I'm happy to look at the before/after dumps if you pass them
along.   I'd certainly like to see the two front-ends sharing these
bits.

Here are the dumps from ssa-dom-thread-7.c made to compile as C++; 
cx-current is the dumps with current trunk; cx-old is changed to use the 
old goto-based lowering like C.


Have you had a chance to look at these at all?  Does it make sense to 
check in my patch to revert C++ loop lowering to be like C again?


Jason



Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-11 Thread Sandra Loosemore

On 12/11/19 11:27 AM, Jason Merrill wrote:

On 12/11/19 2:03 AM, Sandra Loosemore wrote:


[snip]

Anyway, I'm no longer expecting to get this front end patch into GCC 
10, but it would be helpful to get some guidance on how to proceed for 
resubmission when stage 1 re-opens.  E.g. from where I'm sitting now, 
fixing the C++ constexpr evaluator to handle gotos (if it doesn't 
already?) and reverting to the C way of lowering loops seems like a 
much more bounded problem than fixing optimizers and trying to 
benchmark their effectiveness.  :-S  OTOH, somebody more familiar with 
these optimizations might see easy fixes.  Or I could re-jigger my 
patches to continue to use different loop lowering strategies for C 
and C++ so it at least wouldn't make things any worse for either 
language than it already is.


If this is an important optimization, it would certainly be good to make 
it work for C++.  And now that constexpr works on the pre-generic form 
of the function, it doesn't care how we lower loops.  So C++ could 
revert to the C way without much trouble.  


Thanks for confirming my suspicion that it ought to "just work" to do 
that.  Unless we're concerned about papering over optimizer bugs, this 
would be an easy fix.


I just find it weird that 
apparently the middle end still can't optimize LOOP_EXPR properly.


Yeah, this seems weird to me too; you'd think it would all be the same 
once it gets converted to a control flow graph, and it would not matter 
whether the end test is located at the top or bottom of the loop, etc.


-Sandra


Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-11 Thread Jeff Law
On Wed, 2019-12-11 at 00:03 -0700, Sandra Loosemore wrote:
> On 12/6/19 3:41 PM, Jeff Law wrote:
> > On Wed, 2019-11-13 at 09:27 -0700, Sandra Loosemore wrote:
> > > I bootstrapped and regression-tested this on x86_64-linux-
> > > gnu.  There
> > > are a few regressions involving these tests:
> > > 
> > > gcc.dg/tree-ssa/pr77445-2.c
> > I believe tihs is another instance of the FSA optimization.  I'd
> > need
> > to see the before/after dumps to know if it's regressed.  The main
> > purpose of the test was to verify that we didn't muck up the
> > profile
> > estimation after the FSA optimization.
> > 
> > 
> > > gcc.dg/tree-ssa/ssa-dce-3.c
> > So I think this one is supposed to collapse into a trivial infinite
> > loop.  Anything else would be a regression.
> > 
> > 
> > > gcc.dg/tree-ssa/ssa-dom-thread-7.c
> > This is the FSA optimization.  Unfortunately checking for it being
> > done
> > right is exceedingly painful.  If you pass along the before/after
> > dumps
> > I can probably help determine whether or not we just need an update
> > to
> > the scanned bits.
> > 
> > Given how much pressure there was to handle the FSA optimization,
> > I'd
> > prefer to make sure we're still doing it before moving forward.
> 
> I poked at these 3 test cases some more.  FWIW, it appears that if
> you 
> use an unmodified build to compile them as C++ instead of C, the
> same 
> problems appear.  So I guess it is an existing bug in 
> something-or-another that we are not presently optimizing code output
> by 
> the C++ front end as well as that from the C front end.  :-S  (At
> least, 
> the ssa-dce-3.c optimization failure is a bug; the other two might
> be 
> fragile test cases.)
> 
> The C++ front end used to lower loops in exactly the same way as the
> C 
> front end.  This is the patch that changed it:
> 
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01994.html
> 
> There wasn't much discussion about how this affected optimization
> beyond 
> noting a slight decrease in code size for a single benchmark, and no 
> consideration at all of whether it wouldn't be better to have the C
> and 
> C++ front ends use the same lowering strategy for loops, whichever
> way 
> produced better code, so that the optimizers can better recognize
> the 
> common patterns from both languages.
> 
> Anyway, I'm no longer expecting to get this front end patch into GCC
> 10, 
> but it would be helpful to get some guidance on how to proceed for 
> resubmission when stage 1 re-opens.  E.g. from where I'm sitting
> now, 
> fixing the C++ constexpr evaluator to handle gotos (if it doesn't 
> already?) and reverting to the C way of lowering loops seems like a
> much 
> more bounded problem than fixing optimizers and trying to benchmark 
> their effectiveness.  :-S  OTOH, somebody more familiar with these 
> optimizations might see easy fixes.  Or I could re-jigger my patches
> to 
> continue to use different loop lowering strategies for C and C++ so
> it 
> at least wouldn't make things any worse for either language than it 
> already is.
Well, I'm happy to look at the before/after dumps if you pass them
along.   I'd certainly like to see the two front-ends sharing these
bits.



Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-11 Thread Jason Merrill

On 12/11/19 2:03 AM, Sandra Loosemore wrote:

On 12/6/19 3:41 PM, Jeff Law wrote:

On Wed, 2019-11-13 at 09:27 -0700, Sandra Loosemore wrote:


I bootstrapped and regression-tested this on x86_64-linux-gnu.  There
are a few regressions involving these tests:

gcc.dg/tree-ssa/pr77445-2.c

I believe tihs is another instance of the FSA optimization.  I'd need
to see the before/after dumps to know if it's regressed.  The main
purpose of the test was to verify that we didn't muck up the profile
estimation after the FSA optimization.



gcc.dg/tree-ssa/ssa-dce-3.c

So I think this one is supposed to collapse into a trivial infinite
loop.  Anything else would be a regression.



gcc.dg/tree-ssa/ssa-dom-thread-7.c

This is the FSA optimization.  Unfortunately checking for it being done
right is exceedingly painful.  If you pass along the before/after dumps
I can probably help determine whether or not we just need an update to
the scanned bits.

Given how much pressure there was to handle the FSA optimization, I'd
prefer to make sure we're still doing it before moving forward.


I poked at these 3 test cases some more.  FWIW, it appears that if you 
use an unmodified build to compile them as C++ instead of C, the same 
problems appear.  So I guess it is an existing bug in 
something-or-another that we are not presently optimizing code output by 
the C++ front end as well as that from the C front end.  :-S  (At least, 
the ssa-dce-3.c optimization failure is a bug; the other two might be 
fragile test cases.)


The C++ front end used to lower loops in exactly the same way as the C 
front end.  This is the patch that changed it:


https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01994.html

There wasn't much discussion about how this affected optimization beyond 
noting a slight decrease in code size for a single benchmark, and no 
consideration at all of whether it wouldn't be better to have the C and 
C++ front ends use the same lowering strategy for loops, whichever way 
produced better code, so that the optimizers can better recognize the 
common patterns from both languages.


Anyway, I'm no longer expecting to get this front end patch into GCC 10, 
but it would be helpful to get some guidance on how to proceed for 
resubmission when stage 1 re-opens.  E.g. from where I'm sitting now, 
fixing the C++ constexpr evaluator to handle gotos (if it doesn't 
already?) and reverting to the C way of lowering loops seems like a much 
more bounded problem than fixing optimizers and trying to benchmark 
their effectiveness.  :-S  OTOH, somebody more familiar with these 
optimizations might see easy fixes.  Or I could re-jigger my patches to 
continue to use different loop lowering strategies for C and C++ so it 
at least wouldn't make things any worse for either language than it 
already is.


If this is an important optimization, it would certainly be good to make 
it work for C++.  And now that constexpr works on the pre-generic form 
of the function, it doesn't care how we lower loops.  So C++ could 
revert to the C way without much trouble.  I just find it weird that 
apparently the middle end still can't optimize LOOP_EXPR properly.


Jason



Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-10 Thread Sandra Loosemore

On 12/6/19 3:41 PM, Jeff Law wrote:

On Wed, 2019-11-13 at 09:27 -0700, Sandra Loosemore wrote:


I bootstrapped and regression-tested this on x86_64-linux-gnu.  There
are a few regressions involving these tests:

gcc.dg/tree-ssa/pr77445-2.c

I believe tihs is another instance of the FSA optimization.  I'd need
to see the before/after dumps to know if it's regressed.  The main
purpose of the test was to verify that we didn't muck up the profile
estimation after the FSA optimization.



gcc.dg/tree-ssa/ssa-dce-3.c

So I think this one is supposed to collapse into a trivial infinite
loop.  Anything else would be a regression.



gcc.dg/tree-ssa/ssa-dom-thread-7.c

This is the FSA optimization.  Unfortunately checking for it being done
right is exceedingly painful.  If you pass along the before/after dumps
I can probably help determine whether or not we just need an update to
the scanned bits.

Given how much pressure there was to handle the FSA optimization, I'd
prefer to make sure we're still doing it before moving forward.


I poked at these 3 test cases some more.  FWIW, it appears that if you 
use an unmodified build to compile them as C++ instead of C, the same 
problems appear.  So I guess it is an existing bug in 
something-or-another that we are not presently optimizing code output by 
the C++ front end as well as that from the C front end.  :-S  (At least, 
the ssa-dce-3.c optimization failure is a bug; the other two might be 
fragile test cases.)


The C++ front end used to lower loops in exactly the same way as the C 
front end.  This is the patch that changed it:


https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01994.html

There wasn't much discussion about how this affected optimization beyond 
noting a slight decrease in code size for a single benchmark, and no 
consideration at all of whether it wouldn't be better to have the C and 
C++ front ends use the same lowering strategy for loops, whichever way 
produced better code, so that the optimizers can better recognize the 
common patterns from both languages.


Anyway, I'm no longer expecting to get this front end patch into GCC 10, 
but it would be helpful to get some guidance on how to proceed for 
resubmission when stage 1 re-opens.  E.g. from where I'm sitting now, 
fixing the C++ constexpr evaluator to handle gotos (if it doesn't 
already?) and reverting to the C way of lowering loops seems like a much 
more bounded problem than fixing optimizers and trying to benchmark 
their effectiveness.  :-S  OTOH, somebody more familiar with these 
optimizations might see easy fixes.  Or I could re-jigger my patches to 
continue to use different loop lowering strategies for C and C++ so it 
at least wouldn't make things any worse for either language than it 
already is.


-Sandra


Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-06 Thread Jeff Law
On Wed, 2019-11-13 at 09:27 -0700, Sandra Loosemore wrote:
> 
> I bootstrapped and regression-tested this on x86_64-linux-gnu.  There
> are a few regressions involving these tests:
> 
> gcc.dg/tree-ssa/pr77445-2.c
I believe tihs is another instance of the FSA optimization.  I'd need
to see the before/after dumps to know if it's regressed.  The main
purpose of the test was to verify that we didn't muck up the profile
estimation after the FSA optimization.


> gcc.dg/tree-ssa/ssa-dce-3.c
So I think this one is supposed to collapse into a trivial infinite
loop.  Anything else would be a regression.


> gcc.dg/tree-ssa/ssa-dom-thread-7.c
This is the FSA optimization.  Unfortunately checking for it being done
right is exceedingly painful.  If you pass along the before/after dumps
I can probably help determine whether or not we just need an update to
the scanned bits.

Given how much pressure there was to handle the FSA optimization, I'd
prefer to make sure we're still doing it before moving forward.

jeff



Re: [PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-12-01 Thread Sandra Loosemore

Ping!

On 11/13/19 9:27 AM, Sandra Loosemore wrote:

This patch series lays some groundwork for the project to redo the
OpenACC "kernels" region support in GCC, described in Thomas
Schwinge's recent talk at the GNU Cauldron:

https://gcc.gnu.org/wiki/cauldron2019talks?action=AttachFile=view=OpenACC+kernels-cauldron2019.pdf

Briefly, the larger goal is to make the compiler recognize "for" loops
that are candidates for parallelization, and treat them as if they
were annotated with "#pragma acc loop auto".  This is fairly
straightforward to do on the output of the Fortran and C++ front ends,
which both produce a high-level parse tree representation of the loop.
OTOH, the C front end currently lowers loops into a goto form very
early, making it hard both to recognize "for" loops and to
differentiate between (valid) structured and (invalid) unstructured
control flow in the body of the loop.

So, the immediate goal of this patch sequence is to make the C front
end produce output using the same high-level tree representation as
the C++ front end already emits: specifically, to produce FOR_STMT,
WHILE_STMT, and DO_STMT for loops, BREAK_STMT and CONTINUE_STMT
instead of gotos, and also SWITCH_STMT instead of SWITCH_EXPR since
that's mixed up in the handling for "break".  Besides this being
helpful to OpenACC implementation, there's also some argument to be
made that sharing more code between the respective C family front ends
is good from an engineering perspective, etc.

Part 1 of the patch set moves the definitions and support
(genericization, pretty-printers, dumpers, etc) for those tree nodes
out of the cp/ front end and into the common c-family/ code.  Part 2
hacks up the C front end to emit the now-shared data structures.

I bootstrapped and regression-tested this on x86_64-linux-gnu.  There
are a few regressions involving these tests:

gcc.dg/tree-ssa/pr77445-2.c
gcc.dg/tree-ssa/ssa-dce-3.c
gcc.dg/tree-ssa/ssa-dom-thread-7.c

I looked at these briefly and it seems like the problem here is that
the output of the C front end after gimplification is different than
it used to be, since I swapped in the C++ genericization code (it
lowers to a LOOP_EXPR instead of directly to goto form, then the
LOOP_EXPR gets lowered during gimplification).  TBH, I don't really
understand what optimizations these test cases are trying to test, and
whether the pattern-matching is too fragile, whether the form of the code
isn't something that actually tests the optimization, or whether the
optimization is just not working on this input.  So I'm not sure what
to do about those failures!  :-(  I could change the genericizer to
emit the same goto form that the C front end formerly emitted instead
of the C++-style output, but that might just be papering over bugs
elsewhere.  Any advice on how to proceed here is welcome.  If the
patches are OK otherwise, maybe just file bugzilla issues for these
regressions?

-Sandra


Sandra Loosemore (2):
   Move loop and switch tree data structures from cp/ to c-family/.
   Change C front end to emit structured loop and switch tree nodes.

  gcc/c-family/c-common.c |  24 +++
  gcc/c-family/c-common.def   |  24 +++
  gcc/c-family/c-common.h |  53 -
  gcc/c-family/c-dump.c   |  38 
  gcc/c-family/c-gimplify.c   | 414 
  gcc/c-family/c-pretty-print.c   |  92 +++-
  gcc/c/c-decl.c  |  18 +-
  gcc/c/c-lang.h  |   3 +-
  gcc/c/c-objc-common.h   |   2 +
  gcc/c/c-parser.c| 125 ++-
  gcc/c/c-tree.h  |  21 +-
  gcc/c/c-typeck.c| 227 ++--
  gcc/cp/cp-gimplify.c| 347 ++
  gcc/cp/cp-objcp-common.c|  13 +-
  gcc/cp/cp-tree.def  |  23 --
  gcc/cp/cp-tree.h|  40 
  gcc/cp/cxx-pretty-print.c   |  78 ---
  gcc/cp/dump.c   |  31 ---
  gcc/doc/generic.texi|  56 +++--
  gcc/objc/objc-act.c |   6 +-
  gcc/testsuite/gcc.dg/gomp/block-7.c |  12 +-
  21 files changed, 859 insertions(+), 788 deletions(-)





[PATCH 0/2] Make C front end share the C++ tree representation of loops and switches

2019-11-13 Thread Sandra Loosemore
This patch series lays some groundwork for the project to redo the
OpenACC "kernels" region support in GCC, described in Thomas
Schwinge's recent talk at the GNU Cauldron:

https://gcc.gnu.org/wiki/cauldron2019talks?action=AttachFile=view=OpenACC+kernels-cauldron2019.pdf

Briefly, the larger goal is to make the compiler recognize "for" loops
that are candidates for parallelization, and treat them as if they
were annotated with "#pragma acc loop auto".  This is fairly
straightforward to do on the output of the Fortran and C++ front ends,
which both produce a high-level parse tree representation of the loop.
OTOH, the C front end currently lowers loops into a goto form very
early, making it hard both to recognize "for" loops and to
differentiate between (valid) structured and (invalid) unstructured
control flow in the body of the loop.

So, the immediate goal of this patch sequence is to make the C front
end produce output using the same high-level tree representation as
the C++ front end already emits: specifically, to produce FOR_STMT,
WHILE_STMT, and DO_STMT for loops, BREAK_STMT and CONTINUE_STMT
instead of gotos, and also SWITCH_STMT instead of SWITCH_EXPR since
that's mixed up in the handling for "break".  Besides this being
helpful to OpenACC implementation, there's also some argument to be
made that sharing more code between the respective C family front ends
is good from an engineering perspective, etc.

Part 1 of the patch set moves the definitions and support
(genericization, pretty-printers, dumpers, etc) for those tree nodes
out of the cp/ front end and into the common c-family/ code.  Part 2
hacks up the C front end to emit the now-shared data structures.

I bootstrapped and regression-tested this on x86_64-linux-gnu.  There
are a few regressions involving these tests:

gcc.dg/tree-ssa/pr77445-2.c
gcc.dg/tree-ssa/ssa-dce-3.c
gcc.dg/tree-ssa/ssa-dom-thread-7.c

I looked at these briefly and it seems like the problem here is that
the output of the C front end after gimplification is different than
it used to be, since I swapped in the C++ genericization code (it
lowers to a LOOP_EXPR instead of directly to goto form, then the
LOOP_EXPR gets lowered during gimplification).  TBH, I don't really
understand what optimizations these test cases are trying to test, and
whether the pattern-matching is too fragile, whether the form of the code
isn't something that actually tests the optimization, or whether the
optimization is just not working on this input.  So I'm not sure what
to do about those failures!  :-(  I could change the genericizer to
emit the same goto form that the C front end formerly emitted instead
of the C++-style output, but that might just be papering over bugs
elsewhere.  Any advice on how to proceed here is welcome.  If the
patches are OK otherwise, maybe just file bugzilla issues for these
regressions?

-Sandra


Sandra Loosemore (2):
  Move loop and switch tree data structures from cp/ to c-family/.
  Change C front end to emit structured loop and switch tree nodes.

 gcc/c-family/c-common.c |  24 +++
 gcc/c-family/c-common.def   |  24 +++
 gcc/c-family/c-common.h |  53 -
 gcc/c-family/c-dump.c   |  38 
 gcc/c-family/c-gimplify.c   | 414 
 gcc/c-family/c-pretty-print.c   |  92 +++-
 gcc/c/c-decl.c  |  18 +-
 gcc/c/c-lang.h  |   3 +-
 gcc/c/c-objc-common.h   |   2 +
 gcc/c/c-parser.c| 125 ++-
 gcc/c/c-tree.h  |  21 +-
 gcc/c/c-typeck.c| 227 ++--
 gcc/cp/cp-gimplify.c| 347 ++
 gcc/cp/cp-objcp-common.c|  13 +-
 gcc/cp/cp-tree.def  |  23 --
 gcc/cp/cp-tree.h|  40 
 gcc/cp/cxx-pretty-print.c   |  78 ---
 gcc/cp/dump.c   |  31 ---
 gcc/doc/generic.texi|  56 +++--
 gcc/objc/objc-act.c |   6 +-
 gcc/testsuite/gcc.dg/gomp/block-7.c |  12 +-
 21 files changed, 859 insertions(+), 788 deletions(-)

-- 
2.8.1