On Mon, 2013-09-02 at 11:15 +0200, Richard Biener wrote:
> On Mon, Sep 2, 2013 at 8:56 AM, bin.cheng <bin.ch...@arm.com> wrote:
> > Hi,
> >
> > The gimple-ssa-strength-reduction pass handles CAND_REFs in order to find
> > different MEM_REFs sharing a common part in their addressing expressions.
> > If such MEM_REFs are found, the pass rewrites them, enabling more
> > efficient addressing expressions during the RTL passes.
> > The pass analyzes the addressing expression of each MEM_REF to see if it
> > can be formalized as follows:
> >      base:    MEM_REF (T1, C1)
> >      offset:  MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> >      bitpos:  C4 * BITS_PER_UNIT
> > It then restructures it into the following form:
> >      MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> >               C1 + (C2 * C3) + C4)
> > Finally, it rewrites the MEM_REFs if two or more of them share a common
> > (non-constant) part.
> > The problem is that the pass doesn't back-trace T2.  If T2 is recorded as
> > a CAND_ADD in the form "T2' + C5", the MEM_REF should be restructured
> > into:
> >      MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2', C3)),
> >               C1 + (C2 * C3) + C4 + (C5 * C3))
> >
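
To make the arithmetic concrete, here is a small self-contained sketch
(my illustration, not part of the patch; the constants are arbitrary)
of why substituting T2 = T2' + C5 moves an extra C5 * C3 term into the
constant index:

    /* (T2 + C2) * C3 = (T2' + C5 + C2) * C3
                      = T2' * C3 + (C2 * C3) + (C5 * C3),
       so the variable part of the offset becomes T2' * C3 and the
       constant index absorbs C5 * C3.  */
    #include <assert.h>

    int main (void)
    {
      long t2p = 7, c1 = 4, c2 = 2, c3 = 8, c4 = 1, c5 = 3;
      long t2 = t2p + c5;               /* T2 is a CAND_ADD: T2' + C5 */

      /* Index before back-tracing: the offset still refers to T2.  */
      long before = c1 + (t2 + c2) * c3 + c4;

      /* Index after back-tracing: the offset refers to T2' and the
         constant part picks up C5 * C3.  */
      long after = c1 + (c2 * c3) + c4 + (c5 * c3) + t2p * c3;

      assert (before == after);         /* same address either way */
      return 0;
    }
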
> > The patch also includes a test case to illustrate the problem.
> >
> > Bootstrapped and tested on x86/x86_64/arm-a15.  Is it ok?
> 
> This looks ok to me if Bill is ok with it.

This is a good generalization and I'm fine with it.  There are a few
minor nits that should be corrected, outlined below.

> 
> Thanks,
> Richard.
> 
> > Thanks.
> > bin
> >
> > 2013-09-02  Bin Cheng  <bin.ch...@arm.com>
> >
> >         * gimple-ssa-strength-reduction.c (backtrace_base_for_ref): New.
> >         (restructure_reference): Call backtrace_base_for_ref.
> >
> > gcc/testsuite/ChangeLog
> > 2013-09-02  Bin Cheng  <bin.ch...@arm.com>
> >
> >         * gcc.dg/tree-ssa/slsr-39.c: New test.
> 

>>Index: gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c
>>===================================================================
>>--- gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c     (revision 0)
>>+++ gcc/testsuite/gcc.dg/tree-ssa/slsr-39.c     (revision 0)
>>@@ -0,0 +1,26 @@
>>+/* Verify straight-line strength reduction for back-tracing
>>+   CADN_ADD for T2 in:

CAND_ADD

>>+
>>+    *PBASE:    T1
>>+    *POFFSET:  MULT_EXPR (T2, C3)
>>+    *PINDEX:   C1 + (C2 * C3) + C4  */
>>+
>>+/* { dg-do compile } */
>>+/* { dg-options "-O2 -fdump-tree-slsr" } */
>>+
>>+typedef int arr_2[50][50];
>>+
>>+void foo (arr_2 a2, int v1)
>>+{
>>+  int i, j;
>>+
>>+  i = v1 + 5;
>>+  j = i;
>>+  a2 [i] [j++] = i;
>>+  a2 [i] [j++] = i;
>>+  a2 [i] [i-1] += 1;
>>+  return;
>>+}
>>+
>>+/* { dg-final { scan-tree-dump-times "MEM" 4 "slsr" } } */
>>+/* { dg-final { cleanup-tree-dump "slsr" } } */
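
As a back-of-the-envelope check (my own illustration, not from the
patch; it assumes a 4-byte int), the four MEM_REFs in foo (the two
stores through j++, plus the load and the store for the +=) really do
share a single non-constant address part, landing on three distinct
constant offsets from it:

    #include <assert.h>

    int main (void)
    {
      long i = 12;                      /* any value of i works */
      long common = (i * 50 + i) * 4;   /* shared part, in bytes */

      assert ((i * 50 + (i    )) * 4 == common);      /* a2[i][j]   */
      assert ((i * 50 + (i + 1)) * 4 == common + 4);  /* a2[i][j+1] */
      assert ((i * 50 + (i - 1)) * 4 == common - 4);  /* a2[i][i-1] */
      return 0;
    }
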
>>Index: gcc/gimple-ssa-strength-reduction.c
>>===================================================================
>>--- gcc/gimple-ssa-strength-reduction.c     (revision 202067)
>>+++ gcc/gimple-ssa-strength-reduction.c     (working copy)
>>@@ -750,6 +750,57 @@ slsr_process_phi (gimple phi, bool speed)
>>   add_cand_for_stmt (phi, c);
>> }
>> 
>>+/* Given PBASE which is a pointer to tree, loop up the defining

look up

>>+   statement for it and check whether the candidate is in the
>>+   form of:
>>+
>>+     X = B + (1 * S), S is integer constant
>>+     X = B + (i * S), S is integer one
>>+
>>+   If so, set PBASE to the candiate's base_expr and return double

candidate's

>>+   int (i * S).
>>+   Otherwise, just return double int zero.  */

This is sufficient, since you are properly checking the next_interp
chain.  Another possible form would be

    X = (B + i) * 1,

but if this is present, then one of the forms you're checking for should
also be present, so there's no need to check the MULT_CANDs.

>>+
>>+static double_int
>>+backtrace_base_for_ref (tree *pbase)
>>+{
>>+  tree base_in = *pbase;
>>+  slsr_cand_t base_cand;
>>+
>>+  STRIP_NOPS (base_in);
>>+  if (TREE_CODE (base_in) != SSA_NAME)
>>+    return tree_to_double_int (integer_zero_node);
>>+
>>+  base_cand = base_cand_from_table (base_in);
>>+
>>+  while (base_cand && base_cand->kind != CAND_PHI)
>>+    {
>>+      if (base_cand->kind == CAND_ADD
>>+       && base_cand->index.is_one ()
>>+       && TREE_CODE (base_cand->stride) == INTEGER_CST)
>>+     {
>>+       /* X = B + (1 * S), S is integer constant.  */
>>+       *pbase = base_cand->base_expr;
>>+       return tree_to_double_int (base_cand->stride);
>>+     }
>>+      else if (base_cand->kind == CAND_ADD
>>+            && TREE_CODE (base_cand->stride) == INTEGER_CST
>>+            && integer_onep (base_cand->stride))
>>+        {
>>+       /* X = B + (i * S), S is integer one.  */
>>+       *pbase = base_cand->base_expr;
>>+       return base_cand->index;
>>+     }
>>+
>>+      if (base_cand->next_interp)
>>+     base_cand = lookup_cand (base_cand->next_interp);
>>+      else
>>+     base_cand = NULL;
>>+    }
>>+
>>+  return tree_to_double_int (integer_zero_node);
>>+}
>>+
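
For readers less familiar with the candidate machinery, here is a toy,
self-contained model (hypothetical; the struct is heavily simplified
from the real slsr_cand_d) of walking the next_interp chain looking for
the two CAND_ADD shapes accepted above:

    #include <stddef.h>
    #include <stdio.h>

    struct cand {
      const char *base_expr;     /* B */
      long index;                /* i */
      long stride;               /* S */
      struct cand *next_interp;  /* alternate interpretation, if any */
    };

    /* Return i * S and rewrite *pbase when some interpretation is
       B + (1 * S) or B + (i * 1); return 0 otherwise.  */
    static long
    toy_backtrace (struct cand *c, const char **pbase)
    {
      for (; c; c = c->next_interp)
        {
          if (c->index == 1)            /* X = B + (1 * S) */
            {
              *pbase = c->base_expr;
              return c->stride;
            }
          if (c->stride == 1)           /* X = B + (i * 1) */
            {
              *pbase = c->base_expr;
              return c->index;
            }
        }
      return 0;
    }

    int main (void)
    {
      /* T2 = T2' + 5, recorded as a CAND_ADD with index 5, stride 1.  */
      struct cand add = { "T2'", 5, 1, NULL };
      const char *base = "T2";
      long c5 = toy_backtrace (&add, &base);
      printf ("C5 = %ld, new base = %s\n", c5, base);   /* 5, T2' */
      return 0;
    }
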
>> /* Look for the following pattern:
>> 
>>     *PBASE:    MEM_REF (T1, C1)
>>@@ -767,8 +818,15 @@ slsr_process_phi (gimple phi, bool speed)
>> 
>>     *PBASE:    T1
>>     *POFFSET:  MULT_EXPR (T2, C3)
>>-    *PINDEX:   C1 + (C2 * C3) + C4  */
>>+    *PINDEX:   C1 + (C2 * C3) + C4
>> 
>>+   When T2 is recorded by an CAND_ADD in the form of (T2' + C5), It
                             ^                                      ^
                             a                                      it

>>+   will be further restructured to:
>>+
>>+    *PBASE:    T1
>>+    *POFFSET:  MULT_EXPR (T2', C3)
>>+    *PINDEX:   C1 + (C2 * C3) + C4 + (C5 * C3)  */
>>+
>> static bool
>> restructure_reference (tree *pbase, tree *poffset, double_int *pindex,
>>                     tree *ptype)
>>@@ -777,7 +835,7 @@ restructure_reference (tree *pbase, tree *poffset,
>>   double_int index = *pindex;
>>   double_int bpu = double_int::from_uhwi (BITS_PER_UNIT);
>>   tree mult_op0, mult_op1, t1, t2, type;
>>-  double_int c1, c2, c3, c4;
>>+  double_int c1, c2, c3, c4, c5;
>> 
>>   if (!base
>>       || !offset
>>@@ -823,11 +881,12 @@ restructure_reference (tree *pbase, tree *poffset,
>>     }
>> 
>>   c4 = index.udiv (bpu, FLOOR_DIV_EXPR);
>>+  c5 = backtrace_base_for_ref (&t2);
>> 
>>   *pbase = t1;
>>-  *poffset = fold_build2 (MULT_EXPR, sizetype, t2,
>>-                       double_int_to_tree (sizetype, c3));
>>-  *pindex = c1 + c2 * c3 + c4;
>>+  *poffset = size_binop (MULT_EXPR, fold_convert (sizetype, t2),
>>+                      double_int_to_tree (sizetype, c3));

I am not sure why you changed this call.  fold_build2 is a more
efficient call than size_binop.  size_binop makes several checks that
will fail in this case, and then calls fold_build2_loc, right?  Not a
big deal but seems like changing it back would be better.  Perhaps I'm
missing something (as usual ;).

Thanks,
Bill

>>+  *pindex = c1 + c2 * c3 + c4 + c5 * c3;
>>   *ptype = type;
>> 
>>   return true;

