https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100934

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot 
gnu.org
             Status|NEW                         |ASSIGNED

--- Comment #8 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Jeffrey A. Law from comment #7)
> So when we're finding jump threads we know if we thread through the loop
> latch and we note when that's going to create an irreducible region.   We
> generally suppress threading through the latch before the loop optimizers
> have run, but allow it afterwards.
> 
> But I'm not aware of a really good place to adjust the loop bound estimates,
> particularly for the backwards threader.  THe backwards threader uses
> copy_bbs API, so much of the guts of what's happening is opaque.
> 
> Peek at jump_thread_path_registry:::duplicate_thread_path.  All the
> backwards threader bits go through there at some point.

OK, so it looks like mark_threaded_blocks already has code to deal with this:

                  if (crossed_headers > 1)
                    {
                      vect_free_loop_info_assumptions
                        ((*path)[path->length () - 1]->e->dest->loop_father);
                      break;

but mark_threaded_blocks is only called after we process all FSM paths though
it's handling of the case wouldn't necessarily fix this instance where we
just duplicate the loop header incoming its backedge and to one of its
destination still in the same loop.

It looks like applying the FSM threads goes through different code paths
than the regular threading...

But the issue at hand must be more subtle since invalidating the number of
iterations on the loop that is then removed of course doesn't change
anything.  Still it is likely going to be an issue.  Possible fix for that
but not this testcase:

diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
index a86302be18e..4e31fa8f5c2 100644
--- a/gcc/tree-ssa-threadupdate.c
+++ b/gcc/tree-ssa-threadupdate.c
@@ -2379,12 +2379,15 @@ jump_thread_path_registry::duplicate_thread_path (edge
entry,
      missuses of the functions.  I.e. if you ask to copy something weird,
      it will work, but the state of structures probably will not be
      correct.  */
+  bool loop_header_duplicated = false;
   for (i = 0; i < n_region; i++)
     {
       /* We do not handle subloops, i.e. all the blocks must belong to the
         same loop.  */
       if (region[i]->loop_father != loop)
        return false;
+      if (region[i]->loop_father == loop)
+       loop_header_duplicated = true;
     }

   initialize_original_copy_tables ();
@@ -2501,6 +2504,11 @@ jump_thread_path_registry::duplicate_thread_path (edge
entry,
   /* Add the other PHI node arguments.  */
   add_phi_args_after_copy (region_copy, n_region, NULL);

+  /* Invalidate loop niter information if the loop header was part of the
+     thread path.  */
+  if (loop_header_duplicated)
+    vect_free_loop_info_assumptions (loop);
+
   free (region_copy);

   adjust_paths_after_duplication (current_path_no);


The actual issue seems to be that the DOM pass following the problematic
threading threads additional paths through loop headers which ends up
rotating the loop and also affecting niter estimates since it crosses
a irreducible region boundary.  Ah - but we have _not_ marked irreducible
regions but we are querying them from DOM threading!

diff --git a/gcc/tree-ssa-dom.c b/gcc/tree-ssa-dom.c
index 075b1ccb9de..c231e6c8467 100644
--- a/gcc/tree-ssa-dom.c
+++ b/gcc/tree-ssa-dom.c
@@ -727,7 +727,8 @@ pass_dominator::execute (function *fun)
      gcc.dg/tree-ssa/pr21417.c can't be threaded if loop preheader is
      missing.  We should improve jump threading in future then
      LOOPS_HAVE_PREHEADERS won't be needed here.  */
-  loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES);
+  loop_optimizer_init (LOOPS_HAVE_PREHEADERS | LOOPS_HAVE_SIMPLE_LATCHES
+                      | LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS);

   /* We need accurate information regarding back edges in the CFG
      for jump threading; this may include back edges that are not part of

fixes the testcase.

Reply via email to