Re: [PATCH] Fix PR44214

William J. Schmidt Fri, 20 Apr 2012 03:11:50 -0700

On Fri, 2012-04-20 at 10:04 +0200, Richard Guenther wrote:
> On Thu, 19 Apr 2012, William J. Schmidt wrote:
> 
> > This enhances constant folding for division by complex and vector
> > constants.  When -freciprocal-math is present, such divisions are
> > converted into multiplies by the constant reciprocal.  When an exact
> > reciprocal is available, this is done for vector constants when
> > optimizing.  I did not implement logic for exact reciprocals of complex
> > constants because either (a) the complexity doesn't justify the
> > likelihood of occurrence, or (b) I'm lazy.  Your choice. ;)
> > 
> > Bootstrapped with no new regressions on powerpc64-unknown-linux-gnu.  Ok
> > for trunk?
> 
> See below ...
> 
> > Thanks,
> > Bill
> > 
> > 
> > gcc:
> > 
> > 2012-04-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> > 
> >     PR rtl-optimization/44214
> >     * fold-const.c (exact_inverse): New function.
> >     (fold_binary_loc): Fold vector and complex division by constant into
> >     multiply by recripocal with flag_reciprocal_math; fold vector division
> >     by constant into multiply by reciprocal with exact inverse.
> > 
> > gcc/testsuite:
> > 
> > 2012-04-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>
> > 
> >     PR rtl-optimization/44214
> >     * gcc.target/powerpc/pr44214-1.c: New test.
> >     * gcc.dg/pr44214-2.c: Likewise.
> >     * gcc.target/powerpc/pr44214-3.c: Likewise.
> > 
> > 
> > Index: gcc/fold-const.c
> > ===================================================================
> > --- gcc/fold-const.c        (revision 186573)
> > +++ gcc/fold-const.c        (working copy)
> > @@ -9693,6 +9693,48 @@ fold_addr_of_array_ref_difference (location_t loc,
> >    return NULL_TREE;
> >  }
> >  
> > +/* If the real or vector real constant CST of type TYPE has an exact
> > +   inverse, return it, else return NULL.  */
> > +
> > +static tree
> > +exact_inverse (tree type, tree cst)
> > +{
> > +  REAL_VALUE_TYPE r;
> > +  tree unit_type, *elts;
> > +  enum machine_mode mode;
> > +  unsigned vec_nelts, i;
> > +
> > +  switch (TREE_CODE (cst))
> > +    {
> > +    case REAL_CST:
> > +      r = TREE_REAL_CST (cst);
> > +
> > +      if (exact_real_inverse (TYPE_MODE (type), &r))
> > +   return build_real (type, r);
> > +
> > +      return NULL_TREE;
> > +
> > +    case VECTOR_CST:
> > +      vec_nelts = VECTOR_CST_NELTS (cst);
> > +      elts = XALLOCAVEC (tree, vec_nelts);
> > +      unit_type = TREE_TYPE (type);
> > +      mode = TYPE_MODE (unit_type);
> > +
> > +      for (i = 0; i < vec_nelts; i++)
> > +   {
> > +     r = TREE_REAL_CST (VECTOR_CST_ELT (cst, i));
> > +     if (!exact_real_inverse (mode, &r))
> > +       return NULL_TREE;
> > +     elts[i] = build_real (unit_type, r);
> > +   }
> > +
> > +      return build_vector (type, elts);
> > +
> > +    default:
> > +      return NULL_TREE;
> > +    }
> > +}
> > +
> >  /* Fold a binary expression of code CODE and type TYPE with operands
> >     OP0 and OP1.  LOC is the location of the resulting expression.
> >     Return the folded expression if folding is successful.  Otherwise,
> > @@ -11734,23 +11776,25 @@ fold_binary_loc (location_t loc,
> >      so only do this if -freciprocal-math.  We can actually
> >      always safely do it if ARG1 is a power of two, but it's hard to
> >      tell if it is or not in a portable manner.  */
> > -      if (TREE_CODE (arg1) == REAL_CST)
> > +      if (TREE_CODE (arg1) == REAL_CST
> > +     || (TREE_CODE (arg1) == COMPLEX_CST
> > +         && COMPLEX_FLOAT_TYPE_P (TREE_TYPE (arg1)))
> > +     || (TREE_CODE (arg1) == VECTOR_CST
> > +         && VECTOR_FLOAT_TYPE_P (TREE_TYPE (arg1))))
> >     {
> >       if (flag_reciprocal_math
> > -         && 0 != (tem = const_binop (code, build_real (type, dconst1),
> > +         && 0 != (tem = fold_binary (code, type, build_one_cst (type),
> >                                       arg1)))
> 
> Any reason for not using const_binop?


As it turns out, no.  I (blindly) made this change based on your comment
in the PR...

"The fold code should probably simply use fold_binary to do the constant
folding (which already should handle 1/x for x vector and complex. There
is a build_one_cst to build the constant 1 for any type)."

...but now that I've looked at it that was unnecessary, so I must have
misinterpreted this.  I'll revert to using const_binop.

> 
> >         return fold_build2_loc (loc, MULT_EXPR, type, arg0, tem);
> > -     /* Find the reciprocal if optimizing and the result is exact.  */
> > -     if (optimize)
> > +     /* Find the reciprocal if optimizing and the result is exact.
> > +        TODO: Complex reciprocal not implemented.  */
> > +     if (optimize
> > +         && TREE_CODE (arg1) != COMPLEX_CST)
> 
> I know this is all pre-existing, but really the flag_reciprocal_math
> case should be under if (optimize), too.  So, can you move this check
> to the toplevel covering both cases?

Sure.

> 
> The testcases should apply to generic vectors, too, and should scan
> the .original dump (where folding first applied).  So they should
> not be target specific (and they should use -freciprocal-math).

OK.  I was ignorant of the generic vector syntax using __attribute__.
If I change pr44214-1.c to use:

--
typedef double v2sd __attribute__ ((vector_size (16)));

void do_div (v2sd *a, v2sd *b)
{
  v2sd c = { 2.0, 3.0 };
  *a = *b / c;
}
--

it compiles fine, but the .original dump doesn't show the fold:

--
;; Function do_div (null)
;; enabled by -tree-original


{
  v2sd c = { 2.0e+0, 3.0e+0 };

    v2sd c = { 2.0e+0, 3.0e+0 };
  *a = *b / c;
}
-- 

However, the .optimized dump does:

--
;; Function do_div (do_div, funcdef_no=0, decl_uid=1998, cgraph_uid=0)

do_div (v2sd * a, v2sd * b)
{
  double D.2019;
  double D.2018;
  double D.2017;
  double D.2016;
  vector(2) double D.2009;
  vector(2) double D.2008;

<bb 2>:
  D.2008_2 = *b_1(D);
  D.2016_7 = BIT_FIELD_REF <D.2008_2, 64, 0>;
  D.2017_8 = D.2016_7 * 5.0e-1;
  D.2018_9 = BIT_FIELD_REF <D.2008_2, 64, 64>;
  D.2019_10 = D.2018_9 *
3.33333333333333314829616256247390992939472198486328125e-1;
  D.2009_3 = {D.2017_8, D.2019_10};
  *a_4(D) = D.2009_3;
  return;

}
--

For the complex case, the .original dump works fine.  Is there something
different I should be doing for the vector case?

Thanks,
Bill
> 
> Ok with those changes.
> 
> Thanks,
> Richard.
> 
> >         {
> > -         REAL_VALUE_TYPE r;
> > -         r = TREE_REAL_CST (arg1);
> > -         if (exact_real_inverse (TYPE_MODE(TREE_TYPE(arg0)), &r))
> > -           {
> > -             tem = build_real (type, r);
> > -             return fold_build2_loc (loc, MULT_EXPR, type,
> > -                                 fold_convert_loc (loc, type, arg0), tem);
> > -           }
> > +         tree inverse = exact_inverse (TREE_TYPE (arg0), arg1);
> > +
> > +         if (inverse)
> > +           return fold_build2_loc (loc, MULT_EXPR, type, arg0, inverse);
> >         }
> >     }
> >        /* Convert A/B/C to A/(B*C).  */
> > Index: gcc/testsuite/gcc.target/powerpc/pr44214-3.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/powerpc/pr44214-3.c    (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr44214-3.c    (revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -mcpu=power7 -fdump-tree-optimized" } */
> > +
> > +void do_div (vector double *a, vector double *b)
> > +{
> > +  *a = *b / (vector double) { 2.0, 2.0 };
> > +}
> > +
> > +/* Since 2.0 has an exact reciprocal, constant folding should multiply *b
> > +   by the reciprocals of the vector elements.  As a result there should be
> > +   one vector multiply and zero divides in the optimized code.  The string
> > +   " * " occurs 3 times: one multiply and two indirect parameters.  */
> > +
> > +/* { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times " / " 0 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/testsuite/gcc.target/powerpc/pr44214-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.target/powerpc/pr44214-1.c    (revision 0)
> > +++ gcc/testsuite/gcc.target/powerpc/pr44214-1.c    (revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ffast-math -mcpu=power7 -fdump-tree-optimized" } */
> > +
> > +void do_div (vector double *a, vector double *b)
> > +{
> > +  *a = *b / (vector double) { 2.0, 3.0 };
> > +}
> > +
> > +/* Constant folding should multiply *b by the reciprocals of the
> > +   vector elements.  As a result there should be one vector multiply
> > +   and zero divides in the optimized code.  The string " * " occurs
> > +   3 times: one multiply and two indirect parameters.  */
> > +
> > +/* { dg-final { scan-tree-dump-times " \\\* " 3 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times " / " 0 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
> > Index: gcc/testsuite/gcc.dg/pr44214-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/pr44214-2.c        (revision 0)
> > +++ gcc/testsuite/gcc.dg/pr44214-2.c        (revision 0)
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -ffast-math -fdump-tree-optimized" } */
> > +
> > +void do_div (_Complex double *a, _Complex double *b)
> > +{
> > +  *a = *b / (4.0 - 5.0fi);
> > +}
> > +
> > +/* Constant folding should multiply *b by the reciprocal of 4-5i
> > +   = 4/41 - (5/41)i.  As a result there should be 4 multiplies and
> > +   zero divides in the optimized code.  The string " * " occurs 6
> > +   times: 4 multiplies and 2 indirect parameters.  */
> > +
> > +/* { dg-final { scan-tree-dump-times " \\\* " 6 "optimized" } } */
> > +/* { dg-final { scan-tree-dump-times " / " 0 "optimized" } } */
> > +/* { dg-final { cleanup-tree-dump "optimized" } } */
>

Re: [PATCH] Fix PR44214

Reply via email to