On Sat, 8 Oct 2011, William J. Schmidt wrote:

> Greetings,
> 
> Here are the revised changes for the tree portions of the patch.  I've
> attempted to resolve all comments to date on those portions.  Per
> Steven's comment, I moved copy_ref_info into tree-ssa-address.c; let me
> know if there's a better place, or whether you'd prefer to leave it
> where it was.
> 
> I looked into changing the second reassoc pass to use a different
> pass_late_reassoc entry, but this impacted the test suite.  There are
> about 20 tests that rely on -fdump-tree-reassoc being associated with
> two dump files named reassoc1 and reassoc2.  Rather than change all
> these test cases for a temporary solution, I chose to use the deprecated
> first_pass_instance boolean to distinguish between the two passes.  I
> marked this as a Bad Thing and it will be removed once I have time to
> work on the straight-line strength reducer.
> 
> I looked into adding a test case with a negative offset, but was unable
> to come up with a construct that would have a negative offset on the
> base MEM_REF and still be recognized by this particular pattern matcher.
> In any case, the use of double_ints throughout should remove that
> concern.

Comments below.

> Thanks,
> Bill
> 
> 
> 2011-10-08  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> 
> gcc:
> 
>       PR rtl-optimization/46556
>       * tree.h (copy_ref_info): Expose existing function.
>       * tree-ssa-loop-ivopts.c (copy_ref_info): Move this function to...
>       * tree-ssa-address.c (copy_ref_info): ...here, and remove static token.
>       * tree-ssa-reassoc.c (restructure_base_and_offset): New function.
>       (restructure_mem_ref): Likewise.
>       (reassociate_bb): Look for opportunities to call restructure_mem_ref.
> 
> gcc/testsuite:
> 
>       PR rtl-optimization/46556
>       * gcc.dg/tree-ssa/pr46556-1.c: New testcase.
>       * gcc.dg/tree-ssa/pr46556-2.c: Likewise.
>       * gcc.dg/tree-ssa/pr46556-3.c: Likewise.
> 
> 
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h        (revision 179708)
> +++ gcc/tree.h        (working copy)
> @@ -5777,6 +5777,7 @@ tree target_for_debug_bind (tree);
>  /* In tree-ssa-address.c.  */
>  extern tree tree_mem_ref_addr (tree, tree);
>  extern void copy_mem_ref_info (tree, tree);
> +extern void copy_ref_info (tree, tree);
>  
>  /* In tree-vrp.c */
>  extern bool ssa_name_nonnegative_p (const_tree);
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "dom2" } 
> } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c (revision 0)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 12)
> +    foo (p->a[n], p->c[n], p->b[n]);
> +  else if (n > 3)
> +    foo (p->b[n], p->a[n], p->c[n]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } 
> } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c (revision 0)
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 3)
> +    {
> +      foo (p->a[n], p->c[n], p->b[n]);
> +      if (n > 12)
> +     foo (p->b[n], p->a[n], p->c[n]);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } 
> } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c        (revision 179708)
> +++ gcc/tree-ssa-loop-ivopts.c        (working copy)
> @@ -6278,64 +6278,6 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
>      }
>  }
>  
> -/* Copies the reference information from OLD_REF to NEW_REF.  */
> -
> -static void
> -copy_ref_info (tree new_ref, tree old_ref)
> -{
> -  tree new_ptr_base = NULL_TREE;
> -
> -  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
> -  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);
> -
> -  new_ptr_base = TREE_OPERAND (new_ref, 0);
> -
> -  /* We can transfer points-to information from an old pointer
> -     or decl base to the new one.  */
> -  if (new_ptr_base
> -      && TREE_CODE (new_ptr_base) == SSA_NAME
> -      && !SSA_NAME_PTR_INFO (new_ptr_base))
> -    {
> -      tree base = get_base_address (old_ref);
> -      if (!base)
> -     ;
> -      else if ((TREE_CODE (base) == MEM_REF
> -             || TREE_CODE (base) == TARGET_MEM_REF)
> -            && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> -            && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
> -     {
> -       struct ptr_info_def *new_pi;
> -       duplicate_ssa_name_ptr_info
> -         (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
> -       new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
> -       /* We have to be careful about transfering alignment information.  */
> -       if (TREE_CODE (old_ref) == MEM_REF
> -           && !(TREE_CODE (new_ref) == TARGET_MEM_REF
> -                && (TMR_INDEX2 (new_ref)
> -                    || (TMR_STEP (new_ref)
> -                        && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
> -                            < new_pi->align)))))
> -         {
> -           new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
> -                                               mem_ref_offset (new_ref)).low;
> -           new_pi->misalign &= (new_pi->align - 1);
> -         }
> -       else
> -         {
> -           new_pi->align = 1;
> -           new_pi->misalign = 0;
> -         }
> -     }
> -      else if (TREE_CODE (base) == VAR_DECL
> -            || TREE_CODE (base) == PARM_DECL
> -            || TREE_CODE (base) == RESULT_DECL)
> -     {
> -       struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
> -       pt_solution_set_var (&pi->pt, base);
> -     }
> -    }
> -}
> -
>  /* Performs a peephole optimization to reorder the iv update statement with
>     a mem ref to enable instruction combining in later phases. The mem ref 
> uses
>     the iv value before the update, so the reordering transformation requires
> Index: gcc/tree-ssa-address.c
> ===================================================================
> --- gcc/tree-ssa-address.c    (revision 179708)
> +++ gcc/tree-ssa-address.c    (working copy)
> @@ -832,6 +832,64 @@ copy_mem_ref_info (tree to, tree from)
>    TREE_THIS_VOLATILE (to) = TREE_THIS_VOLATILE (from);
>  }
>  
> +/* Copies the reference information from OLD_REF to NEW_REF.  */

Please add something like "NEW_REF is supposed to be either a MEM_REF
or a TARGET_MEM_REF.".  You can checkin the patch moving this
function separately.

> +
> +void
> +copy_ref_info (tree new_ref, tree old_ref)
> +{
> +  tree new_ptr_base = NULL_TREE;
> +
> +  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
> +  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);

this function misses to transfer TREE_THIS_NOTRAP which is supposed
to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
If you make the function generic please adjust it to at least do ...

> +  new_ptr_base = TREE_OPERAND (new_ref, 0);
> +
> +  /* We can transfer points-to information from an old pointer
> +     or decl base to the new one.  */
> +  if (new_ptr_base
> +      && TREE_CODE (new_ptr_base) == SSA_NAME
> +      && !SSA_NAME_PTR_INFO (new_ptr_base))
> +    {
> +      tree base = get_base_address (old_ref);
> +      if (!base)
> +     ;
> +      else if ((TREE_CODE (base) == MEM_REF
> +             || TREE_CODE (base) == TARGET_MEM_REF)
> +            && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> +            && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
> +     {
> +       struct ptr_info_def *new_pi;
> +       duplicate_ssa_name_ptr_info
> +         (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
> +       new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
> +       /* We have to be careful about transfering alignment information.  */
> +       if (TREE_CODE (old_ref) == MEM_REF
> +           && !(TREE_CODE (new_ref) == TARGET_MEM_REF
> +                && (TMR_INDEX2 (new_ref)
> +                    || (TMR_STEP (new_ref)
> +                        && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
> +                            < new_pi->align)))))
> +         {
> +           new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
> +                                               mem_ref_offset (new_ref)).low;
> +           new_pi->misalign &= (new_pi->align - 1);
> +         }
> +       else
> +         {
> +           new_pi->align = 1;
> +           new_pi->misalign = 0;
> +         }

...

          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);

> +     }
> +      else if (TREE_CODE (base) == VAR_DECL
> +            || TREE_CODE (base) == PARM_DECL
> +            || TREE_CODE (base) == RESULT_DECL)
> +     {
> +       struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
> +       pt_solution_set_var (&pi->pt, base);
> +     }
> +    }
> +}
> +
>  /* Move constants in target_mem_ref REF to offset.  Returns the new target
>     mem ref if anything changes, NULL_TREE otherwise.  */
>  
> Index: gcc/tree-ssa-reassoc.c
> ===================================================================
> --- gcc/tree-ssa-reassoc.c    (revision 179708)
> +++ gcc/tree-ssa-reassoc.c    (working copy)
> @@ -2815,6 +2815,121 @@ break_up_subtract_bb (basic_block bb)
>      break_up_subtract_bb (son);
>  }
>  
> +/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
> +   determine whether there is a profitable opportunity to restructure
> +   address arithmetic within BASE and OFFSET.  If so, produce such
> +   a restructuring and return it.  */
> +
> +static tree
> +restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,

You are not changing *expr, so don't pass it as reference.

> +                          tree base, tree offset, HOST_WIDE_INT bitpos)
> +{
> +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> +  double_int c, c1, c2, c3, c4;
> +
> +  /* Look for the following pattern:
> +
> +       base   = MEM_REF (T1, C1);
> +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +       bitpos = C4 * BITS_PER_UNIT
> +
> +     If found, convert it to:
> +
> +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                C1 + (C2 * C3) + C4)
> +   */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST

operand 1 of a MEM_REF is always an INTEGER_CST.

> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = TREE_INT_CST (TREE_OPERAND (base, 1));

mem_ref_offset (base)

> +  c2 = TREE_INT_CST (TREE_OPERAND (mult_op0, 1));
> +  c3 = TREE_INT_CST (mult_op1);

tree_to_double_int ()

> +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);

You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

> +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> +
> +  if (!double_int_fits_in_shwi_p (c)
> +      || !double_int_fits_in_shwi_p (c3))
> +    return NULL_TREE;

I think those tests are not necessary if you use ...

> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
> +                        build_int_cst (sizetype, double_int_to_shwi (c3)));

double_int_to_tree (sizetype, c3)

> +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> +                                     true, GSI_SAME_STMT);
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> +                                    true, GSI_SAME_STMT);
> +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> +                      build_int_cst (offset_type, double_int_to_shwi (c)));

double_int_to_tree (offset_type, c)

Please delay gimplification to the caller, that way this function
solely operates on the trees returned from get_inner_reference.
Or are you concerned that fold might undo your association?

> +  return mem_ref;
> +}
> +
> +/* If *EXPR contains a memory reference, determine whether there is an
> +   opportunity to restructure the base and offset expressions for
> +   improved performance.  */
> +
> +static bool
> +restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
> +{
> +  enum tree_code code = TREE_CODE (*expr);
> +  tree base, offset, mem_ref;
> +  HOST_WIDE_INT bitsize, bitpos;
> +  enum machine_mode mode;
> +  int unsignedp, volatilep;
> +
> +  /* Only expressions that reference memory are of interest.  Bitfield
> +     references should eventually be lowered prior to this point and
> +     are not dealt with.  Skip BLKmode aggregates as well.  */
> +  if (! handled_component_p (*expr)
> +      || code == BIT_FIELD_REF
> +      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1)))
> +      || TYPE_MODE (TREE_TYPE (*expr)) == BLKmode)
> +    return false;
> +
> +  /* Determine whether the expression can be represented with base and
> +     offset components.  */
> +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> +                           &unsignedp, &volatilep, false);
> +  if (!base || !offset)
> +    return false;
> +
> +  /* Look for a restructuring opportunity.  */
> +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> +                                           offset, bitpos)) == NULL_TREE)
> +    return false;
> +
> +  /* Found one.  Record the replacement in the dump file and complete
> +     the replacement.  */
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "\nOriginal_expr = ");
> +      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\nmem_ref = ");
> +      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\n\n");
> +    }

Thus gimplify and add the statements here.

> +  copy_ref_info (mem_ref, *expr);
> +  *expr = mem_ref;
> +
> +  return true;
> +}
> +
>  /* Reassociate expressions in basic block BB and its post-dominator as
>     children.  */
>  
> @@ -2823,14 +2938,43 @@ reassociate_bb (basic_block bb)
>  {
>    gimple_stmt_iterator gsi;
>    basic_block son;
> +  bool in_loop = loop_depth (bb->loop_father) != 0;
>  
>    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
>      {
>        gimple stmt = gsi_stmt (gsi);
>  
> -      if (is_gimple_assign (stmt)
> -       && !stmt_could_throw_p (stmt))
> +      /* During late reassociation only, look for restructuring
> +      opportunities within an expression that references memory.
> +      We only do this for blocks not contained in loops, since the
> +      ivopts machinery does a good job on loop expressions, and we
> +      don't want to interfere with other loop optimizations.  */

I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
which you don't handle.  Did you do any measurements what happens if
you enable it generally?

> +      /* TODO: This belongs more properly in a separate pass that
> +      performs general strength reduction on straight-line code.
> +      Eventually move this there.  */
> +      if (!first_pass_instance /* TODO: deprecated  */
> +       && !in_loop
> +       && gimple_vuse (stmt)
> +       && gimple_assign_single_p (stmt))
>       {
> +       tree *lhs, *rhs;
> +       if (gimple_vdef (stmt))
> +         {
> +           lhs = gimple_assign_lhs_ptr (stmt);
> +           if (restructure_mem_ref (lhs, &gsi))
> +             update_stmt (stmt);
> +         }
> +       else if (gimple_vuse (stmt))

That will be always true.


> +         {
> +           rhs = gimple_assign_rhs1_ptr (stmt);
> +           if (restructure_mem_ref (rhs, &gsi))
> +             update_stmt (stmt);
> +         }
> +     }
> +
> +      else if (is_gimple_assign (stmt)
> +            && !stmt_could_throw_p (stmt))
> +     {
>         tree lhs, rhs1, rhs2;
>         enum tree_code rhs_code = gimple_assign_rhs_code (stmt);

You verified the patch has no performance degradations on ppc64
for SPEC CPU, did you see any improvements?

The pattern matching is still very ad-hoc and doesn't consider
statements that feed the base address.  There is conceptually
no difference between p->a[n] and *(p + n * 4).  You placed this
lowering in reassoc to catch CSE opportunities with DOM, right?
Does RTL CSE not do it's job or is the transform undone by
fwprop before it gets a chance to do it?

Thanks,
Richard.

Reply via email to