On Tue, Nov 05, 2013 at 02:00:16PM -0800, Cong Hou wrote:
> > I'm also curious -- did this code show up in a particular benchmark, if so,
> > which one?
> 
> I didn't find this problem from any benchmark, but from another
> concern about loop upper bound estimation. Look at the following code:
> 
> int foo(unsigned int n, int r)
> {
>   int i;
>   if (n > 0)
>     if (n < 4)
>     {
>       do {
>          --n;
>          r *= 2;
>       } while (n > 0);
>     }
>   return r+n;
> }
> 
> 
> In order to get the upper bound of the loop in this function, GCC
> traverses conditions n<4 and n>0 separately and tries to get any
> useful information. But as those two conditions cannot be combined
> into one due to this issue (note that n>0 will be transformed into
> n!=0), when GCC sees n<4, it will consider the possibility that n may
> be equal to 0, in which case the upper bound is UINT_MAX. If those two
> conditions can be combined into one, which is n-1<=2, then we can get
> the correct upper bound of the loop.

I've looked at the above testcase to see why we aren't able to determine
the number of iterations upper bound properly here.

That doesn't mean your patch is useless, though I must say I'm far from
being convinced it is safe ATM and also it looks like quite ugly special
case (trying to undo a VRP optimization but only one single specific case).

The first problem is VRP, we end up with:
  <bb 5>:
  # n_1 = PHI <n_5(D)(4), n_7(6)>
  # r_3 = PHI <r_6(D)(4), r_8(6)>
  # RANGE [0, 4294967295]
  n_7 = n_1 + 4294967295;
  # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
  r_8 = r_3 * 2;
  if (n_7 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;
  
  <bb 6>:
  goto <bb 5>;
- missing range on n_1, extremely conservative range on n_7.  With the
attached patch we get instead:
  <bb 5>:
  # RANGE [1, 3] NONZERO 0x00000000000000003
  # n_1 = PHI <n_5(D)(4), n_7(6)>
  # r_3 = PHI <r_6(D)(4), r_8(6)>
  # RANGE [0, 2] NONZERO 0x00000000000000003
  n_7 = n_1 + 4294967295;
  # RANGE [-2147483648, 2147483647] NONZERO 0x000000000fffffffe
  r_8 = r_3 * 2;
  if (n_7 != 0)
    goto <bb 6>;
  else
    goto <bb 7>;

  <bb 6>:
  goto <bb 5>;

The issue is that we use live_on_edge to determine if it is desirable
to added ASSERT_EXPRs, but as we walk bbs in RPO order, we first enter
the loop through the bb with exit edge and thus live of the latch isn't
computed (and, generally the propagation for live ignores dfs back edges
anyway), and because in the above live_on_edge ((5)->(6), n_7) is false,
we don't add ASSERT_EXPR that n_7 is != 0 in the latch bb, so during
iteration we start with n_1 being assumed [1, 3] (that is range of the
assertion from the preceeding conditions on n_5(D)), but in the next
iteration widen it to [0, 3] because we think n_7 can be [0, 2] in the
PHI and thus end up with uselessly wide range because we think the
subtraction can wrap around.  This patch improves live_on_edge for
empty latches, by just looking at the PHIs on loop->header from the
latch -> header edge and noting which SSA_NAMEs are used there.

I had to add -fno-tree-vrp to 4 unroll_*.c tests, because they disable
various tree optimizations already and want to test unrolling of loops
iterating exactly twice, but with this VRP change VRP is smart enough
to replace the PHI argument on the i IV from
-  # i_13 = PHI <i_8(3), 0(2)>
+  # i_13 = PHI <1(3), 0(2)>
(the loop iterates exactly twice) and RTL unrolling doesn't do the
tested thing in that case.  But, this should affect only loops that roll
exactly twice and those realy should be unrolled already far before, so IMHO
it isn't something to try to optimize better at the RTL level.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2013-11-06  Jakub Jelinek  <ja...@redhat.com>

        * tree-vrp.c (live_on_edge): If e->dest is an empty latch
        of some loop, but live[e->dest->index] is not computed, look
        for SSA_NAMEs used in loop header PHIs from the latch edge.

        * gcc.dg/unroll_1.c: Add -fno-tree-vrp to dg-options.
        * gcc.dg/unroll_2.c: Likewise.
        * gcc.dg/unroll_3.c: Likewise.
        * gcc.dg/unroll_4.c: Likewise.

--- gcc/tree-vrp.c.jj   2013-11-06 08:48:01.000000000 +0100
+++ gcc/tree-vrp.c      2013-11-06 09:32:19.205104029 +0100
@@ -92,6 +92,42 @@ static sbitmap *live;
 static bool
 live_on_edge (edge e, tree name)
 {
+  if (live[e->dest->index] == NULL)
+    {
+      /* Handle edges to empty latch blocks.  If NAME appears
+        in loop header phis on edges from latch, return true
+        and remember those SSA_NAMEs.  */
+      basic_block bb = e->dest;
+      if (bb->loop_father
+         && bb->loop_father->latch == bb
+         && single_succ_p (bb)
+         && empty_block_p (bb))
+       {
+         gimple_stmt_iterator gsi;
+         edge e2 = single_succ_edge (bb);
+         for (gsi = gsi_start_phis (e2->dest);
+              !gsi_end_p (gsi);
+              gsi_next (&gsi))
+           {
+             gimple phi = gsi_stmt (gsi);
+             tree res = gimple_phi_result (phi), arg;
+
+             if (virtual_operand_p (res))
+               continue;
+             arg = PHI_ARG_DEF_FROM_EDGE (phi, e2);
+             if (TREE_CODE (arg) == SSA_NAME)
+               {
+                 if (live[e->dest->index] == NULL)
+                   {
+                     live[e->dest->index] = sbitmap_alloc (num_ssa_names);
+                     bitmap_clear (live[e->dest->index]);
+                   }
+                 bitmap_set_bit (live[e->dest->index],
+                                 SSA_NAME_VERSION (arg));
+               }
+           }
+       }
+    }
   return (live[e->dest->index]
          && bitmap_bit_p (live[e->dest->index], SSA_NAME_VERSION (name)));
 }
--- gcc/testsuite/gcc.dg/unroll_1.c.jj  2013-09-09 11:32:36.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_1.c     2013-11-06 17:10:52.900722932 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops 
-fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll=stderr -fno-peel-loops 
-fno-tree-vrp -fdisable-tree-cunroll -fdisable-tree-cunrolli 
-fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_2.c.jj  2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_2.c     2013-11-06 17:10:30.751845504 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
-fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
-fenable-rtl-loop2_unroll" } */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp 
-fdisable-tree-cunroll=foo -fdisable-tree-cunrolli=foo 
-fenable-rtl-loop2_unroll" } */
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_3.c.jj  2013-08-30 14:38:39.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_3.c     2013-11-06 17:10:38.864800338 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
-fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } 
*/
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp 
-fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo" } 
*/
 
 unsigned a[100], b[100];
 inline void bar()
--- gcc/testsuite/gcc.dg/unroll_4.c.jj  2013-08-30 14:38:40.000000000 +0200
+++ gcc/testsuite/gcc.dg/unroll_4.c     2013-11-06 17:11:03.816665603 +0100
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops 
-fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" 
} */
+/* { dg-options "-O2 -fdump-rtl-loop2_unroll -fno-peel-loops -fno-tree-vrp 
-fdisable-tree-cunroll -fdisable-tree-cunrolli -fenable-rtl-loop2_unroll=foo2" 
} */
 
 unsigned a[100], b[100];
 inline void bar()


        Jakub

Reply via email to