[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

Richard Biener  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Richard Biener  ---
Fixed.

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-11 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #12 from Richard Biener  ---
Author: rguenth
Date: Thu Aug 11 09:02:04 2016
New Revision: 239357

URL: https://gcc.gnu.org/viewcvs?rev=239357=gcc=rev
Log:
2016-08-11  Richard Biener  

PR tree-optimization/72772
* cfgloopmanip.c (create_preheader): Use split_edge if there
is a single loop entry, avoiding degenerate PHIs.

* gcc.dg/graphite/pr35356-1.c: Adjust.
* gcc.dg/tree-ssa/pr59597.c: Likewise.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/cfgloopmanip.c
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/graphite/pr35356-1.c
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-09 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #11 from amker at gcc dot gnu.org ---
Author: amker
Date: Tue Aug  9 15:08:02 2016
New Revision: 239291

URL: https://gcc.gnu.org/viewcvs?rev=239291=gcc=rev
Log:
PR tree-optimization/72772
* tree-ssa-loop-niter.c (loop_exits_before_overflow): Check equality
for expanded base.

gcc/testsuite
PR tree-optimization/pr72772
* gcc.dg/tree-ssa/pr72772.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/tree-ssa/pr72772.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-ssa-loop-niter.c

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-09 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #10 from amker at gcc dot gnu.org ---
Author: amker
Date: Tue Aug  9 15:01:49 2016
New Revision: 239290

URL: https://gcc.gnu.org/viewcvs?rev=239290=gcc=rev
Log:
PR tree-optimization/72772
* tree-ssa-loop-niter.h (simplify_using_initial_conditions): Delete
parameter STOP.
* tree-ssa-loop-niter.c (tree_simplify_using_condition_1): Delete
parameter STOP and update calls.  Move expand_simple_operations
function call from here...
(simplify_using_initial_conditions): ...to here.  Delete parameter
STOP.
(tree_simplify_using_condition): Delete parameter STOP.
* tree-scalar-evolution.c (simple_iv_with_niters): Update call to
simplify_using_initial_conditions.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/tree-scalar-evolution.c
trunk/gcc/tree-ssa-loop-niter.c
trunk/gcc/tree-ssa-loop-niter.h

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #9 from Richard Biener  ---
FAIL: gcc.dg/tree-ssa/20030714-2.c scan-tree-dump-times dom2 "if " 3

is because jump-threading doesn't want to thread to a loop header and w/o the
patch we have a pre-header with the degenerate PHI which causes it to stop
there.  With the patch we have an empty pre-header which is threaded as well
and thus threading fails.  We hit

  /* One case occurs when there was loop header buried in a jump
 threading path that crosses loop boundaries.  We do not try
 and thread this elsewhere, so just cancel the jump threading
 request by clearing the AUX field now.  */
  if ((bb->loop_father != e2->src->loop_father
   && !loop_exit_edge_p (e2->src->loop_father, e2))
  || (e2->src->loop_father != e2->dest->loop_father
  && !loop_exit_edge_p (e2->src->loop_father, e2)))
{
  /* Since this case is not handled by our special code
 to thread through a loop header, we must explicitly
 cancel the threading request here.  */
  delete_jump_thread_path (path);
  e->aux = NULL;
  continue;
}

as e2 is the loop entry edge.

  Registering jump thread: (3, 5) incoming edge;  (5, 8) normal; (8, 6) nocopy;

and 8->6 is this edge while 8 is the loop preheader.  Not sure why we
have the above 2nd condition.  Not sure why, if we have this condition,
we do not simply drop the tail of this path.  It looks like
mark_threaded_blocks
would fix this up if the crossed_header check would be >= 1 and not just > 1.

Index: gcc/tree-ssa-threadupdate.c
===
--- gcc/tree-ssa-threadupdate.c (revision 239117)
+++ gcc/tree-ssa-threadupdate.c (working copy)
@@ -1531,10 +1531,8 @@ thread_block_1 (basic_block bb, bool nol
 threading path that crosses loop boundaries.  We do not try
 and thread this elsewhere, so just cancel the jump threading
 request by clearing the AUX field now.  */
- if ((bb->loop_father != e2->src->loop_father
-  && !loop_exit_edge_p (e2->src->loop_father, e2))
- || (e2->src->loop_father != e2->dest->loop_father
- && !loop_exit_edge_p (e2->src->loop_father, e2)))
+ if (bb->loop_father != e2->src->loop_father
+ && !loop_exit_edge_p (e2->src->loop_father, e2))
{
  /* Since this case is not handled by our special code
 to thread through a loop header, we must explicitly

fixes this testcase.

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #8 from Richard Biener  ---
A more light-weight fix for the specific issue would be

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 239117)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -344,7 +344,7 @@ tree_forwarder_block_p (basic_block bb,
 {
   basic_block dest;
   /* Protect loop headers.  */
-  if (bb->loop_father->header == bb)
+  if (bb_loop_header_p (bb))
return false;

   dest = EDGE_SUCC (bb, 0)->dest;

where bb_loop_header_p only relies on DOM info.  Of course other checks
are still bogus in the presence of new, undiscovered loops (and overly
cautionous in the case of removed loops not marked for removal).  This
applies to all loop related stuff done in CFG hooks as well (though if
LOOPS_NEED_FIXUP is set all the updates they do bogously or fail to do
are fixed by the required loop-fixup).  So in the end only CFG cleanup
routines guarding transforms require up-to-date loop info.  The above
is nearly the only one (with the following preheader/latch checks
falling into the same category as the CFG hook cases).  So the above
might be a complete fix at this point.  Testing that now.

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

Richard Biener  changed:

   What|Removed |Added

 CC||hubicka at gcc dot gnu.org

--- Comment #7 from Richard Biener  ---
Thus the following "fixes" it.  I'm really not sure at which level we should
try to fix this - ideally passes doing this kind of transform would reset
the upper bound (and niter info).  Unfortunately in this case it is really
CFG cleanup ... but the thread pass threading destination is a new header
and not yet identified as such (the thread pass fails to create the new loop
itself, leaving it to loop fixup - in this case CFG cleanup avoids this new
loop by mashing it with the existing one).  In fact tree_forwarder_block_p
guards against the transform (but it doesnt' work because BB is not yet
marked as loop header):

  if (current_loops)
{
  basic_block dest;
  /* Protect loop headers.  */
  if (bb->loop_father->header == bb)
return false;

CFG cleanup runs loop fixup after its job and I'm (not) sure we want to do it
at its start as well to catch this case.

Index: gcc/loop-init.c
===
--- gcc/loop-init.c (revision 239117)
+++ gcc/loop-init.c (working copy)
@@ -306,6 +311,8 @@ fix_loop_structure (bitmap changed_bbs)
flow_loop_free (loop);
any_deleted = true;
   }
+else if (loop)
+  loop->any_upper_bound = false;

   /* If we deleted loops then the cached scalar evolutions refering to
  those loops become invalid.  */

The alternative is to make sure loops are fixed up before CFG cleanup as
it guards transforms using loop info:

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c   (revision 239117)
+++ gcc/tree-cfgcleanup.c   (working copy)
@@ -731,59 +731,19 @@ cleanup_tree_cfg_1 (void)
 }


-/* Remove unreachable blocks and other miscellaneous clean up work.
-   Return true if the flowgraph was modified, false otherwise.  */
-
-static bool
-cleanup_tree_cfg_noloop (void)
-{
-  bool changed;
-
-  timevar_push (TV_TREE_CLEANUP_CFG);
-
-  /* Iterate until there are no more cleanups left to do.  If any
- iteration changed the flowgraph, set CHANGED to true.
-
- If dominance information is available, there cannot be any unreachable
- blocks.  */
-  if (!dom_info_available_p (CDI_DOMINATORS))
-{
-  changed = delete_unreachable_blocks ();
-  calculate_dominance_info (CDI_DOMINATORS);
-}
-  else
-{
-  checking_verify_dominators (CDI_DOMINATORS);
-  changed = false;
-}
-
-  changed |= cleanup_tree_cfg_1 ();
-
-  gcc_assert (dom_info_available_p (CDI_DOMINATORS));
-  compact_blocks ();
-
-  checking_verify_flow_info ();
-
-  timevar_pop (TV_TREE_CLEANUP_CFG);
-
-  if (changed && current_loops)
-loops_state_set (LOOPS_NEED_FIXUP);
-
-  return changed;
-}
-
 /* Repairs loop structures.  */

@@ -803,17 +764,57 @@ repair_loop_structures (void)
   timevar_pop (TV_REPAIR_LOOPS);
 }

-/* Cleanup cfg and repair loop structures.  */
+/* Cleanup the CFG by removing unreachable blocks and doing other
miscellaneous
+   clean up work.  Also repair loop structures if necessary.
+
+   Return true if the flowgraph was modified, false otherwise.  */

 bool
 cleanup_tree_cfg (void)
 {
-  bool changed = cleanup_tree_cfg_noloop ();
+  bool changed;
+
+  timevar_push (TV_TREE_CLEANUP_CFG);
+
+  /* Iterate until there are no more cleanups left to do.  If any
+ iteration changed the flowgraph, set CHANGED to true.
+
+ If dominance information is available, there cannot be any unreachable
+ blocks.  */
+  if (!dom_info_available_p (CDI_DOMINATORS))
+{
+  changed = delete_unreachable_blocks ();
+  calculate_dominance_info (CDI_DOMINATORS);
+}
+  else
+{
+  checking_verify_dominators (CDI_DOMINATORS);
+  changed = false;
+}

+  /* Repair loop structures as we rely on loops being correct with
+ guarding several of the transforms.  */
   if (current_loops != NULL
   && loops_state_satisfies_p (LOOPS_NEED_FIXUP))
 repair_loop_structures ();

+  changed |= cleanup_tree_cfg_1 ();
+
+  gcc_assert (dom_info_available_p (CDI_DOMINATORS));
+  compact_blocks ();
+
+  checking_verify_flow_info ();
+
+  timevar_pop (TV_TREE_CLEANUP_CFG);
+
+  if (changed && current_loops != NULL)
+{
+  /* We are not marking loops for fixup in all cases, thus force it
+ here.  */
+  loops_state_set (LOOPS_NEED_FIXUP);
+  repair_loop_structures ();
+}
+
   return changed;
 }


I'm leaning towards this one even though it is more expensive :/  (and we
should fix that last comment issue and re-visit the loop-closed SSA update
stuff we do in repair_loop_structures).

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-04 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #6 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> Ok, so we record a correct upper bound for a loop but then later thread1
> makes
> a mess of the loop structures, introducing multiple back-edges by mashing
> an outer loop of said loop into it, making the number of iteration upper
> bound obviously incorrect.
> 
> And it is CFG cleanup which introduces those multiple back-edges, mashing
> the two loops!  It does this by treating the outer loop header as
> "forwarder".
> 
> Now we could detect this situation in redirect_edge_and_branch or avoid this
> loop smashing (in the end loop init will disambiguate the latches again if
> requested, re-creating the outer loop).  I think we at least want such
> smashing late (more aggressive cleanup was introduced because otherwise
> we create worse initial RTL and assembler).
> 
> The issue is how to detect this reliably in redirect_edge_and_branch given
> it has to work when LOOPS_NEED_FIXUP is set ... we have to reset niter
> info and estimates / upper bound whenever we redirect a latch - of both
> the source and destination loop.  With LOOPS_NEED_FIXUP we only can rely
> on headers (well, not even that, and all other cfghook loop fixup stuff
> has a similar issue).  The other option is to simply wipe niter info whenever
> we fixup loops (loop fixup is unfortunately a global thing).
> 
> I believe I saw this issue on another PR as well.

OTOH, I noticed that (at least) several optimizers rely on bound information
passed along in compilation flow.  And it's hard to get accurate bound
information by niter analyzer itself, for example, vectorized loops generally
have parameterized start/end condition values.

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-04 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #5 from Richard Biener  ---
Ok, so we record a correct upper bound for a loop but then later thread1 makes
a mess of the loop structures, introducing multiple back-edges by mashing
an outer loop of said loop into it, making the number of iteration upper
bound obviously incorrect.

And it is CFG cleanup which introduces those multiple back-edges, mashing
the two loops!  It does this by treating the outer loop header as "forwarder".

Now we could detect this situation in redirect_edge_and_branch or avoid this
loop smashing (in the end loop init will disambiguate the latches again if
requested, re-creating the outer loop).  I think we at least want such
smashing late (more aggressive cleanup was introduced because otherwise
we create worse initial RTL and assembler).

The issue is how to detect this reliably in redirect_edge_and_branch given
it has to work when LOOPS_NEED_FIXUP is set ... we have to reset niter
info and estimates / upper bound whenever we redirect a latch - of both
the source and destination loop.  With LOOPS_NEED_FIXUP we only can rely
on headers (well, not even that, and all other cfghook loop fixup stuff
has a similar issue).  The other option is to simply wipe niter info whenever
we fixup loops (loop fixup is unfortunately a global thing).

I believe I saw this issue on another PR as well.

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

Richard Biener  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Most of the fallout is doing more jump-threading.  It looks like threading was
also confused by the extra degenerate PHIs.

The two ICEs are selective scheduling messing up the loop structure somehow.

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-03 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #3 from Richard Biener  ---
The patch has quite some fallout, including ICEs and wrong-code :/

FAIL: c-c++-common/ubsan/pr71403-1.c   -O3 -g  execution test
FAIL: gcc.dg/graphite/pr35356-1.c scan-tree-dump graphite "if (P_9 >= P_10
\
\\\+ 1 && P_10 >= 0) {"
FAIL: gcc.dg/tree-ssa/20030714-2.c scan-tree-dump-times dom2 "if " 3
FAIL: gcc.dg/tree-ssa/ivopt_5.c scan-tree-dump-times ivopts "ivtmp.[0-9_]* =
PHI <[^0]" 0
FAIL: gcc.dg/tree-ssa/loadpre2.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre21.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/loadpre22.c scan-tree-dump-times pre "Eliminated: 1" 1
FAIL: gcc.dg/tree-ssa/pr21417.c scan-tree-dump-times thread4 "FSM jump thread"
1
FAIL: gcc.dg/tree-ssa/pr59597.c scan-tree-dump vrp1 "Cancelling"
FAIL: gcc.dg/tree-ssa/ssa-pre-23.c scan-tree-dump pre "Eliminated: 3"
FAIL: gcc.target/i386/pr60901.c (internal compiler error)
FAIL: gcc.target/i386/pr60901.c (test for excess errors)

and with -m32 additionally

FAIL: gcc.dg/pr45570.c (internal compiler error)
FAIL: gcc.dg/pr45570.c (test for excess errors)
FAIL: gcc.dg/pr70920-4.c scan-tree-dump forwprop1 "if (_[0-9]* == 0)"
FAIL: gcc.dg/graphite/scop-dsyr2k.c scan-tree-dump-times graphite "number of
SCoPs: 1" 1
FAIL: gcc.dg/graphite/scop-dsyrk.c scan-tree-dump-times graphite "number of
SCoPs: 1" 1
FAIL: gcc.dg/tree-ssa/pr22051-2.c scan-tree-dump-times optimized "r_. =
(int) q" 1
FAIL: gcc.dg/tree-ssa/pr71078-3.c scan-tree-dump forwprop1 "__builtin_copysign"


I fixed the PRE testcases sofar, need to investigate the above some more.

The pasted patch contained an error, the if (single_edge) condition has to be
if (nentry == 1).

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-02 Thread amker at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

--- Comment #2 from amker at gcc dot gnu.org ---
(In reply to Richard Biener from comment #1)
> Note that for some odd reason creating the preheader causes the PHI to appar.
> Fixing that would be appreciated - it's the make_forwarder_block CFG hook
> that causes this.  Not sure why create_preheader doesn't special-case the
> single_entry != NULL case by simply splitting the edge.
> 
Thanks.  I am also testing patch to simplify_using_initial_conditions and
loop_exits_before_oveflow.  I think they worth fixing too because patches are
kind like a simplification.  Anyway I will send for comment once passes
bootstrap

[Bug tree-optimization/72772] Missed SCEV after pass reordering@236440

2016-08-02 Thread rguenth at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72772

Richard Biener  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2016-08-02
   Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Richard Biener  ---
Note that for some odd reason creating the preheader causes the PHI to appar.
Fixing that would be appreciated - it's the make_forwarder_block CFG hook
that causes this.  Not sure why create_preheader doesn't special-case the
single_entry != NULL case by simply splitting the edge.

The following fixes the testcase:

Index: cfgloopmanip.c
===
--- cfgloopmanip.c  (revision 238938)
+++ cfgloopmanip.c  (working copy)
@@ -1497,7 +1497,7 @@ has_preds_from_loop (basic_block block,
 basic_block
 create_preheader (struct loop *loop, int flags)
 {
-  edge e, fallthru;
+  edge e;
   basic_block dummy;
   int nentry = 0;
   bool irred = false;
@@ -1544,9 +1544,14 @@ create_preheader (struct loop *loop, int

   mfb_kj_edge = loop_latch_edge (loop);
   latch_edge_was_fallthru = (mfb_kj_edge->flags & EDGE_FALLTHRU) != 0;
-  fallthru = make_forwarder_block (loop->header, mfb_keep_just, NULL);
-  dummy = fallthru->src;
-  loop->header = fallthru->dest;
+  if (single_entry)
+dummy = split_edge (single_entry);
+  else
+{
+  edge fallthru = make_forwarder_block (loop->header, mfb_keep_just,
NULL);
+  dummy = fallthru->src;
+  loop->header = fallthru->dest;
+}

   /* Try to be clever in placing the newly created preheader.  The idea is to
  avoid breaking any "fallthruness" relationship between blocks.