On Tue, Jan 10, 2017 at 8:16 PM, Jeff Law <l...@redhat.com> wrote:
> On 01/04/2017 05:25 AM, Richard Biener wrote:
>>
>> On Wed, Jan 4, 2017 at 6:31 AM, Jeff Law <l...@redhat.com> wrote:
>>>
>>>
>>> So as noted in the BZ comments the jump threading code has code that
>>> detects
>>> when a jump threading path wants to cross multiple loop headers and
>>> truncates the jump threading path in that case.
>>>
>>> What we should have done instead is invalidate the cached loop
>>> information.
>>>
>>> Additionally, this BZ shows that just looking at loop headers is not
>>> sufficient -- we might cross from a reducible to an irreducible region
>>> which
>>> is equivalent to crossing into another loop in that we need to invalidate
>>> the cached loop iteration information.
>>>
>>> What's so damn funny here is that eventually we take nested loops and
>>> irreducible regions, thread various edges and end up with a nice natural
>>> loop and no irreducible regions in the end :-)  But the cached iteration
>>> information is still bogus.
>>>
>>> Anyway, this patch corrects both issues.  It treats moving between an
>>> reducible and irreducible region as crossing a loop header and it
>>> invalidates the cached iteration information rather than truncating the
>>> jump
>>> thread path.
>>>
>>> Bootstrapped and regression tested on x86_64-linux-gnu.  That compiler
>>> was
>>> also used to build all the configurations in config-list.mk.
>>>
>>> Installing on the trunk.  I could be convinced to install on the gcc-6
>>> branch as well since it's affected by the same problem.
>>>
>>> Jeff
>>>
>>>
>>> commit 93e3964a4664350446eefe786e3b73eb41d99036
>>> Author: law <law@138bc75d-0d04-0410-961f-82ee72b054a4>
>>> Date:   Wed Jan 4 05:31:23 2017 +0000
>>>
>>>         PR tree-optimizatin/78856
>>>         * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
>>>         (mark_threaded_blocks): Remove code to truncate thread paths that
>>>         cross multiple loop headers.  Instead invalidate the cached loop
>>>         iteration information and handle case of a thread path walking
>>>         into an irreducible region.
>>>
>>>         PR tree-optimization/78856
>>>         * gcc.c-torture/execute/pr78856.c: New test.
>>>
>>>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@244045
>>> 138bc75d-0d04-0410-961f-82ee72b054a4
>>>
>>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>>> index 3114e02..6b2888f 100644
>>> --- a/gcc/ChangeLog
>>> +++ b/gcc/ChangeLog
>>> @@ -1,3 +1,12 @@
>>> +2017-01-03  Jeff Law  <l...@redhat.com>
>>> +
>>> +       PR tree-optimizatin/78856
>>> +       * tree-ssa-threadupdate.c: Include tree-vectorizer.h.
>>> +       (mark_threaded_blocks): Remove code to truncate thread paths that
>>> +       cross multiple loop headers.  Instead invalidate the cached loop
>>> +       iteration information and handle case of a thread path walking
>>> +       into an irreducible region.
>>> +
>>>  2016-12-30  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>>>
>>>         PR target/78900
>>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>>> index cd2a065..cadfbc9 100644
>>> --- a/gcc/testsuite/ChangeLog
>>> +++ b/gcc/testsuite/ChangeLog
>>> @@ -1,3 +1,8 @@
>>> +2017-01-03  Jeff Law  <l...@redhat.com>
>>> +
>>> +       PR tree-optimization/78856
>>> +       * gcc.c-torture/execute/pr78856.c: New test.
>>> +
>>>  2017-01-03  Michael Meissner  <meiss...@linux.vnet.ibm.com>
>>>
>>>         PR target/78953
>>> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>>> b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>>> new file mode 100644
>>> index 0000000..80f2317
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.c-torture/execute/pr78856.c
>>> @@ -0,0 +1,25 @@
>>> +extern void exit (int);
>>> +
>>> +int a, b, c, d, e, f[3];
>>> +
>>> +int main()
>>> +{
>>> +  while (d)
>>> +    while (1)
>>> +      ;
>>> +  int g = 0, h, i = 0;
>>> +  for (; g < 21; g += 9)
>>> +    {
>>> +      int j = 1;
>>> +      for (h = 0; h < 3; h++)
>>> +       f[h] = 1;
>>> +      for (; j < 10; j++) {
>>> +       d = i && (b ? 0 : c);
>>> +       i = 1;
>>> +       if (g)
>>> +         a = e;
>>> +      }
>>> +  }
>>> +  exit (0);
>>> +}
>>> +
>>> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
>>> index adbb6e0..2da93a8 100644
>>> --- a/gcc/tree-ssa-threadupdate.c
>>> +++ b/gcc/tree-ssa-threadupdate.c
>>> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "cfgloop.h"
>>>  #include "dbgcnt.h"
>>>  #include "tree-cfg.h"
>>> +#include "tree-vectorizer.h"
>>>
>>>  /* Given a block B, update the CFG and SSA graph to reflect redirecting
>>>     one or more in-edges to B to instead reach the destination of an
>>> @@ -2084,10 +2085,8 @@ mark_threaded_blocks (bitmap threaded_blocks)
>>>    /* Look for jump threading paths which cross multiple loop headers.
>>>
>>>       The code to thread through loop headers will change the CFG in ways
>>> -     that break assumptions made by the loop optimization code.
>>> -
>>> -     We don't want to blindly cancel the requests.  We can instead do
>>> better
>>> -     by trimming off the end of the jump thread path.  */
>>> +     that invalidate the cached loop iteration information.  So we must
>>> +     detect that case and wipe the cached information.  */
>>>    EXECUTE_IF_SET_IN_BITMAP (tmp, 0, i, bi)
>>>      {
>>>        basic_block bb = BASIC_BLOCK_FOR_FN (cfun, i);
>>> @@ -2102,26 +2101,16 @@ mark_threaded_blocks (bitmap threaded_blocks)
>>>                    i++)
>>>                 {
>>>                   basic_block dest = (*path)[i]->e->dest;
>>> +                 basic_block src = (*path)[i]->e->src;
>>>                   crossed_headers += (dest == dest->loop_father->header);
>>> +                 /* If we step from a block outside an irreducible
>>> region
>>> +                    to a block inside an irreducible region, then we
>>> have
>>> +                    crossed into a loop.  */
>>> +                 crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
>>> +                                     != (dest->flags &
>>> BB_IRREDUCIBLE_LOOP));
>>
>>
>> I'm not sure about this one -- if dest is inside an irreducible region
>> then
>> dest->loop_father will be the containing loop which may be unrelated
>> (just crossed).  Either we miss handling of crossed loop headers or
>> whether dest is inside an irreducible region doesn't matter.
>
> So imagine where we have an irreducible region that lives within a natural
> loop.
>
> We have a jump thread from outside the natural loop (and thus crosses the
> loop header) which targets a block inside the irreducible region.
>
> As we walk the jump threading path, we first set crossed_headers to one as
> we walk from outside the loop to the loop header.  We then increment it
> again as we walk into the irreducible region.
>
> It may be the case that this can be relaxed, but I'd rather be safe than
> sorry at this stage in the game.

Sure, I was pointing out inconsistencies.  Quoting the updated code:

              vec<jump_thread_edge *> *path = THREAD_PATH (e);

              for (unsigned int i = 0, crossed_headers = 0;
                   i < path->length ();
                   i++)
                {
                  basic_block dest = (*path)[i]->e->dest;
                  basic_block src = (*path)[i]->e->src;
                  crossed_headers += (dest == dest->loop_father->header);
                  /* If we step from a block outside an irreducible region
                     to a block inside an irreducible region, then we have
                     crossed into a loop.  */
                  crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
                                      != (dest->flags & BB_IRREDUCIBLE_LOOP));
                  if (crossed_headers > 1)
                    {
                      vect_free_loop_info_assumptions (dest->loop_father);
                      break;
                    }
                }

assuming THREAD_PATH covers all edges in a path then assume a thread path
threads from outside a two-level nested loop into the innermost loop then the
above code will correctly detect we cross the outer loop header and reset its
loop assumptions.  But it will fail to reset assumptions for the 2nd
header crossing
into the inner loop.  And as we can just reset assumptions for loops
(not irreducible
regions) I would have expected sth like

              for (unsigned int i = 0, crossed_headers = 0;
                   i < path->length ();
                   i++)
                {
                  basic_block dest = (*path)[i]->e->dest;
                  if (dest == dest->loop_father->header)
                    vect_free_loop_info_assumptions (dest->loop_father);
                }

or even walking the path backwards as I think only the path ultimate destination
loop father info needs to be scrapped(?).  So

              for (unsigned int i = 0, crossed_headers = 0;
                   i < path->length ();
                   i++)
                {
                  basic_block dest = (*path)[i]->e->dest;
                  basic_block src = (*path)[i]->e->src;
                  crossed_headers += (dest == dest->loop_father->header);
                  /* If we step from a block outside an irreducible region
                     to a block inside an irreducible region, then we have
                     crossed into a loop.  */
                  crossed_headers += ((src->flags & BB_IRREDUCIBLE_LOOP)
                                      != (dest->flags & BB_IRREDUCIBLE_LOOP));
                  if (crossed_headers > 1)
                    {
                      vect_free_loop_info_assumptions
                        ((*path)[path->length () - 1]->e->dest->loop_father);
                      break;
                    }
                }

or walking the path backwards?

Thanks,
Richard.

> Jeff

Reply via email to