On Sun, Apr 13, 2014 at 10:58 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
> On Mon, 3 Mar 2014, Richard Biener wrote:
>
>> On Sat, Mar 1, 2014 at 3:33 PM, Marc Glisse <marc.gli...@inria.fr> wrote:
>>>
>>> Hello,
>>>
>>> again, a stage 1 patch that I will ping then, but early comments are
>>> welcome.
>>>
>>> PR 59100 was asking to transform n?rotate(x,n):x to rotate(x,n) (because
>>> it
>>> can be hard to write a strictly valid rotate in plain C). The operation
>>> is
>>> really:
>>> (x != neutral) ? x op y : y
>>> where neutral is such that (neutral op y) is always y, so that's what I
>>> implemented (and absorbing elements while I was at it).
>>>
>>> For some operations on some platforms, the transformation may not be such
>>> a
>>> good idea, in particular if division is very slow and b is 1 most of the
>>> time, then computing a/b may be slower than (b!=1)?a/b:a. The easiest
>>> might
>>> be to comment out those operations in the switch for now. I think
>>> divisions
>>> are the only ones slow enough to deserve this, though there certainly are
>>> CPUs where multiplication is not so fast, and even for rotate it may not
>>> always be a win if the processor doesn't have a rotate instruction and
>>> the
>>> shift amount is almost always 0.
>>
>>
>> You only handle integer operations, so checking for INTEGER_TYPE_P
>> early on would make sense.
>
>
> Ah, I wasn't planning on restricting this to integers but yes, indeed, since
> it ended up that way...
>
>
>> Note that some archs may dispatch to libgcc for integer operations
>> so it may make sense to check whether that is the case (you can
>> query optabs to check that) - if the comparison also dispatches to
>> libgcc then of course the transform would be a win again.
>
>
> Ok. I am not sure about using cbranch_optab, other ideas for the comparison?
>
>
>> Even on x86 TImode division uses libgcc.
>
>
> This reminds me of PR 58897 (unrelated).
>
>
>> Note also that value-profiling may have created this special case in the
>> first place! (gimple_divmod_fixed_value_transform)
>
>
> Test coverage must be limited if I didn't break the testsuite :-(
>
> value-profiling seems to set edge probabilities when it does the
> transformation, so I am checking only that.
>
>
>> Otherwise I think this is a good transform.  Did you check if it triggers
>> during GCC bootstrap?
>
>
> It does, a few times. Java has the obvious:
>
>     int padLength = blockSize;
>     if (length % blockSize != 0)
>       padLength = blockSize - length % blockSize;
>
> sched-deps.c has (I didn't check if it was actually this piece of code that
> triggered or another one in the same file):
>
>       if ((ds1 & t) && !(ds2 & t))
>         ds |= ds1 & t;
>       else if (!(ds1 & t) && (ds2 & t))
>         ds |= ds2 & t;
>
> and we end up seeing (phiopt3) if(a!=0)ds|=a.
>
> And mostly bitmap.h where we see quite a bit of:
>
>   if (_867 != 0)
>     goto <bb 55>;
>   else
>     goto <bb 56>;
>
>   <bb 55>:
>   start_bit_869 = _867 * 128;
>
>   <bb 56>:
>   # start_bit_870 = PHI <0(54), start_bit_869(55)>
>
>
> Bootstrap+testsuite on x86_64-linux-gnu (modulo a minor reformatting, I'm
> retesting anyway).

Few comments below

> 2014-04-14  Marc Glisse  <marc.gli...@inria.fr>
>
>         PR tree-optimization/59100
> gcc/
>         * tree-ssa-phiopt.c (neutral_element_p, absorbing_element_p,
>         is_cheap_stmt): New functions.
>
>         (value_replacement): Handle conditional binary operations with a
>         neutral or absorbing element.
> gcc/testsuite/
>         * gcc.dg/tree-ssa/phi-opt-12.c: New file.
>         * gcc.dg/tree-ssa/phi-opt-13.c: Likewise.
>
> --
> Marc Glisse
> Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c  (working copy)
> @@ -0,0 +1,23 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-phiopt1" } */
> +
> +int f(int a, int b, int c) {
> +  if (c > 5) return c;
> +  if (a == 0) return b;
> +  return a + b;
> +}
> +
> +unsigned rot(unsigned x, int n) {
> +  const int bits = __CHAR_BIT__ * __SIZEOF_INT__;
> +  return (n == 0) ? x : ((x << n) | (x >> (bits - n)));
> +}
> +
> +unsigned m(unsigned a, unsigned b) {
> +  if (a == 0)
> +    return 0;
> +  else
> +    return a & b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt1" } } */
> +/* { dg-final { cleanup-tree-dump "phiopt1" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c
> ___________________________________________________________________
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property

Try to avoid this

> Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c  (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c  (working copy)
> @@ -0,0 +1,19 @@
> +/* { dg-do compile { target x86_64-*-* } } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +
> +int f(int a, int b) {
> +  if (__builtin_expect(a == 0, 1)) return b;
> +  return a + b;
> +}
> +
> +// optab_handler can handle if(b==1) but not a/b
> +// so we consider a/b too expensive.
> +unsigned __int128 g(unsigned __int128 a, unsigned __int128 b) {
> +  if (b == 1)
> +    return a;
> +  else
> +    return a / b;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "goto " 4 "optimized" } } */
> +/* { dg-final { cleanup-tree-dump "optimized" } } */
>
> Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c
> ___________________________________________________________________
> Added: svn:keywords
> ## -0,0 +1 ##
> +Author Date Id Revision URL
> \ No newline at end of property
> Added: svn:eol-style
> ## -0,0 +1 ##
> +native
> \ No newline at end of property
> Index: gcc/tree-ssa-phiopt.c
> ===================================================================
> --- gcc/tree-ssa-phiopt.c       (revision 209347)
> +++ gcc/tree-ssa-phiopt.c       (working copy)
> @@ -140,20 +140,37 @@ static bool gate_hoist_loads (void);
>         x = PHI (CONST, a)
>
>     Gets replaced with:
>       bb0:
>       bb2:
>         t1 = a == CONST;
>         t2 = b > c;
>         t3 = t1 & t2;
>         x = a;
>
> +
> +   It also replaces
> +
> +     bb0:
> +       if (a != 0) goto bb1; else goto bb2;
> +     bb1:
> +       c = a + b;
> +     bb2:
> +       x = PHI <c (bb1), b (bb0), ...>;
> +
> +   with
> +
> +     bb0:
> +       c = a + b;
> +     bb2:
> +       x = PHI <c (bb0), ...>;
> +
>     ABS Replacement
>     ---------------
>
>     This transformation, implemented in abs_replacement, replaces
>
>       bb0:
>         if (a >= 0) goto bb2; else goto bb1;
>       bb1:
>         x = -a;
>       bb2:
> @@ -809,20 +826,104 @@ operand_equal_for_value_replacement (con
>    if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
>      return true;
>
>    tmp = gimple_assign_rhs2 (def);
>    if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp))
>      return true;
>
>    return false;
>  }
>
> +/* Returns true if ARG is a neutral element for operation CODE
> +   on the RIGHT side.  */
> +
> +static bool
> +neutral_element_p (tree_code code, tree arg, bool right)
> +{
> +  switch (code)
> +    {
> +    case PLUS_EXPR:
> +    case BIT_IOR_EXPR:
> +    case BIT_XOR_EXPR:
> +      return integer_zerop (arg);
> +
> +    case LROTATE_EXPR:
> +    case RROTATE_EXPR:
> +    case LSHIFT_EXPR:
> +    case RSHIFT_EXPR:
> +    case MINUS_EXPR:
> +    case POINTER_PLUS_EXPR:
> +      return right && integer_zerop (arg);
> +
> +    case MULT_EXPR:
> +      return integer_onep (arg);
> +
> +    // Are those too expensive? Should we check edge probabilities?

Comment obsolete now

> +    case TRUNC_DIV_EXPR:
> +    case CEIL_DIV_EXPR:
> +    case FLOOR_DIV_EXPR:
> +    case ROUND_DIV_EXPR:
> +    case EXACT_DIV_EXPR:
> +      return right && integer_onep (arg);
> +
> +    case BIT_AND_EXPR:
> +      return integer_all_onesp (arg);
> +
> +    default:
> +      return false;
> +    }
> +}
> +
> +/* Returns true if ARG is an absorbing element for operation CODE.  */
> +
> +static bool
> +absorbing_element_p (tree_code code, tree arg)
> +{
> +  switch (code)
> +    {
> +    case BIT_IOR_EXPR:
> +      return integer_all_onesp (arg);
> +
> +    case MULT_EXPR:
> +    case BIT_AND_EXPR:
> +      return integer_zerop (arg);
> +
> +    default:
> +      return false;
> +    }
> +}
> +
> +/* Returns true if the statement is cheap. The simple heuristic used here
> +   is that anything the optab knows is cheap.  */
> +
> +static bool
> +is_cheap_stmt (gimple stmt)
> +{
> +  tree type;
> +  optab op;
> +  if (is_gimple_assign (stmt))
> +    {
> +      type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> +      enum tree_code code = gimple_assign_rhs_code (stmt);
> +      op = optab_for_tree_code (code, type, optab_scalar);
> +    }
> +  else if (gimple_code (stmt) == GIMPLE_COND)
> +    {
> +      type = TREE_TYPE (gimple_cond_lhs (stmt));
> +      op = cbranch_optab;

Hum.  cmp_optab/ucmp_optab?  But it seems those are only used
to query libfunc names ...  Maybe try using can_compare_p instead
with ccp_jump, that looks like a suitable abstraction for is_cheap_stmt.

> +    }
> +  else
> +    gcc_unreachable ();
> +  return (op != unknown_optab
> +         && optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing);
> +}
> +
>  /*  The function value_replacement does the main work of doing the value
>      replacement.  Return non-zero if the replacement is done.  Otherwise
> return
>      0.  If we remove the middle basic block, return 2.
>      BB is the basic block where the replacement is going to be done on.
> ARG0
>      is argument 0 from the PHI.  Likewise for ARG1.  */
>
>  static int
>  value_replacement (basic_block cond_bb, basic_block middle_bb,
>                    edge e0, edge e1, gimple phi,
>                    tree arg0, tree arg1)
> @@ -833,23 +934,21 @@ value_replacement (basic_block cond_bb,
>    enum tree_code code;
>    bool emtpy_or_with_defined_p = true;
>
>    /* If the type says honor signed zeros we cannot do this
>       optimization.  */
>    if (HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (arg1))))
>      return 0;
>
>    /* If there is a statement in MIDDLE_BB that defines one of the PHI
>       arguments, then adjust arg0 or arg1.  */
> -  gsi = gsi_after_labels (middle_bb);
> -  if (!gsi_end_p (gsi) && is_gimple_debug (gsi_stmt (gsi)))
> -    gsi_next_nondebug (&gsi);
> +  gsi = gsi_start_nondebug_after_labels_bb (middle_bb);
>    while (!gsi_end_p (gsi))
>      {
>        gimple stmt = gsi_stmt (gsi);
>        tree lhs;
>        gsi_next_nondebug (&gsi);
>        if (!is_gimple_assign (stmt))
>         {
>           emtpy_or_with_defined_p = false;
>           continue;
>         }
> @@ -931,20 +1030,66 @@ value_replacement (basic_block cond_bb,
>               print_generic_expr (dump_file, gimple_phi_result (phi), 0);
>               fprintf (dump_file, " reduced for COND_EXPR in block %d to ",
>                        cond_bb->index);
>               print_generic_expr (dump_file, arg, 0);
>               fprintf (dump_file, ".\n");
>              }
>            return 1;
>         }
>
>      }
> +
> +  /* Now optimize (x != 0) ? x + y : y to just y.
> +     The following condition is too restrictive, there can easily be
> another
> +     stmt in middle_bb, for instance a CONVERT_EXPR for the second
> argument.  */
> +  gimple assign = last_and_only_stmt (middle_bb);
> +  if (!assign || gimple_code (assign) != GIMPLE_ASSIGN
> +      || gimple_assign_rhs_class (assign) != GIMPLE_BINARY_RHS
> +      || !INTEGRAL_TYPE_P (TREE_TYPE (arg0)))

That will make POINTER_PLUS_EXPR unhandled (pointer arg0).
BIT_AND_EXPR also is valid for pointers (both arguments are pointer type).

> +    return 0;
> +
> +  /* assign may call a libgcc routine, which is slow.  */
> +  if (!is_cheap_stmt (assign) && is_cheap_stmt (cond))
> +    return 0;
> +
> +  /* If the special case has a high probability, keep it.  */
> +  if (EDGE_PRED (middle_bb, 0)->probability < PROB_EVEN)

I suppose Honza has a comment on how to test this properly
(not sure if ->probability or ->frequency is always initialized properly).
for example single_likely_edge tests profile_status_for_fn !=
PROFILE_ABSENT (and uses a fixed probability value ...).
Anyway, the comparison looks backwards to me, but maybe I'm
missing sth - I'd use >= PROB_LIKELY ;)

Otherwise the patch looks ok to me.

Thanks,
Richard.

> +    return 0;
> +
> +  tree lhs = gimple_assign_lhs (assign);
> +  tree rhs1 = gimple_assign_rhs1 (assign);
> +  tree rhs2 = gimple_assign_rhs2 (assign);
> +  enum tree_code code_def = gimple_assign_rhs_code (assign);
> +  tree cond_lhs = gimple_cond_lhs (cond);
> +  tree cond_rhs = gimple_cond_rhs (cond);
> +
> +  if (((code == NE_EXPR && e1 == false_edge)
> +       || (code == EQ_EXPR && e1 == true_edge))
> +      && arg0 == lhs
> +      && ((arg1 == rhs1
> +          && operand_equal_for_phi_arg_p (rhs2, cond_lhs)
> +          && neutral_element_p (code_def, cond_rhs, true))
> +         || (arg1 == rhs2
> +             && operand_equal_for_phi_arg_p (rhs1, cond_lhs)
> +             && neutral_element_p (code_def, cond_rhs, false))
> +         || (operand_equal_for_phi_arg_p (arg1, cond_rhs)
> +             && (operand_equal_for_phi_arg_p (rhs2, cond_lhs)
> +                 || operand_equal_for_phi_arg_p (rhs1, cond_lhs))
> +             && absorbing_element_p (code_def, cond_rhs))))
> +    {
> +      gsi = gsi_for_stmt (cond);
> +      gimple_stmt_iterator gsi_from = gsi_for_stmt (assign);
> +      gsi_move_before (&gsi_from, &gsi);
> +      replace_phi_edge_with_variable (cond_bb, e1, phi, lhs);
> +      return 2;
> +    }
> +
>    return 0;
>  }
>
>  /*  The function minmax_replacement does the main work of doing the minmax
>      replacement.  Return true if the replacement is done.  Otherwise return
>      false.
>      BB is the basic block where the replacement is going to be done on.
> ARG0
>      is argument 0 from the PHI.  Likewise for ARG1.  */
>
>  static bool
>

Reply via email to