On Sun, Apr 13, 2014 at 10:58 PM, Marc Glisse <marc.gli...@inria.fr> wrote: > On Mon, 3 Mar 2014, Richard Biener wrote: > >> On Sat, Mar 1, 2014 at 3:33 PM, Marc Glisse <marc.gli...@inria.fr> wrote: >>> >>> Hello, >>> >>> again, a stage 1 patch that I will ping then, but early comments are >>> welcome. >>> >>> PR 59100 was asking to transform n?rotate(x,n):x to rotate(x,n) (because >>> it >>> can be hard to write a strictly valid rotate in plain C). The operation >>> is >>> really: >>> (x != neutral) ? x op y : y >>> where neutral is such that (neutral op y) is always y, so that's what I >>> implemented (and absorbing elements while I was at it). >>> >>> For some operations on some platforms, the transformation may not be such >>> a >>> good idea, in particular if division is very slow and b is 1 most of the >>> time, then computing a/b may be slower than (b!=1)?a/b:a. The easiest >>> might >>> be to comment out those operations in the switch for now. I think >>> divisions >>> are the only ones slow enough to deserve this, though there certainly are >>> CPUs where multiplication is not so fast, and even for rotate it may not >>> always be a win if the processor doesn't have a rotate instruction and >>> the >>> shift amount is almost always 0. >> >> >> You only handle integer operations, so checking for INTEGER_TYPE_P >> early on would make sense. > > > Ah, I wasn't planning on restricting this to integers but yes, indeed, since > it ended up that way... > > >> Note that some archs may dispatch to libgcc for integer operations >> so it may make sense to check whether that is the case (you can >> query optabs to check that) - if the comparison also dispatches to >> libgcc then of course the transform would be a win again. > > > Ok. I am not sure about using cbranch_optab, other ideas for the comparison? > > >> Even on x86 TImode division uses libgcc. > > > This reminds me of PR 58897 (unrelated). > > >> Note also that value-profiling may have created this special case in the >> first place! (gimple_divmod_fixed_value_transform) > > > Test coverage must be limited if I didn't break the testsuite :-( > > value-profiling seems to set edge probabilities when it does the > transformation, so I am checking only that. > > >> Otherwise I think this is a good transform. Did you check if it triggers >> during GCC bootstrap? > > > It does, a few times. Java has the obvious: > > int padLength = blockSize; > if (length % blockSize != 0) > padLength = blockSize - length % blockSize; > > sched-deps.c has (I didn't check if it was actually this piece of code that > triggered or another one in the same file): > > if ((ds1 & t) && !(ds2 & t)) > ds |= ds1 & t; > else if (!(ds1 & t) && (ds2 & t)) > ds |= ds2 & t; > > and we end up seeing (phiopt3) if(a!=0)ds|=a. > > And mostly bitmap.h where we see quite a bit of: > > if (_867 != 0) > goto <bb 55>; > else > goto <bb 56>; > > <bb 55>: > start_bit_869 = _867 * 128; > > <bb 56>: > # start_bit_870 = PHI <0(54), start_bit_869(55)> > > > Bootstrap+testsuite on x86_64-linux-gnu (modulo a minor reformatting, I'm > retesting anyway).
Few comments below > 2014-04-14 Marc Glisse <marc.gli...@inria.fr> > > PR tree-optimization/59100 > gcc/ > * tree-ssa-phiopt.c (neutral_element_p, absorbing_element_p, > is_cheap_stmt): New functions. > > (value_replacement): Handle conditional binary operations with a > neutral or absorbing element. > gcc/testsuite/ > * gcc.dg/tree-ssa/phi-opt-12.c: New file. > * gcc.dg/tree-ssa/phi-opt-13.c: Likewise. > > -- > Marc Glisse > Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c (working copy) > @@ -0,0 +1,23 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O -fdump-tree-phiopt1" } */ > + > +int f(int a, int b, int c) { > + if (c > 5) return c; > + if (a == 0) return b; > + return a + b; > +} > + > +unsigned rot(unsigned x, int n) { > + const int bits = __CHAR_BIT__ * __SIZEOF_INT__; > + return (n == 0) ? x : ((x << n) | (x >> (bits - n))); > +} > + > +unsigned m(unsigned a, unsigned b) { > + if (a == 0) > + return 0; > + else > + return a & b; > +} > + > +/* { dg-final { scan-tree-dump-times "goto" 2 "phiopt1" } } */ > +/* { dg-final { cleanup-tree-dump "phiopt1" } } */ > > Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-12.c > ___________________________________________________________________ > Added: svn:keywords > ## -0,0 +1 ## > +Author Date Id Revision URL > \ No newline at end of property > Added: svn:eol-style > ## -0,0 +1 ## > +native > \ No newline at end of property Try to avoid this > Index: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c > =================================================================== > --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c (revision 0) > +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c (working copy) > @@ -0,0 +1,19 @@ > +/* { dg-do compile { target x86_64-*-* } } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > + > +int f(int a, int b) { > + if (__builtin_expect(a == 0, 1)) return b; > + return a + b; > +} > + > +// optab_handler can handle if(b==1) but not a/b > +// so we consider a/b too expensive. > +unsigned __int128 g(unsigned __int128 a, unsigned __int128 b) { > + if (b == 1) > + return a; > + else > + return a / b; > +} > + > +/* { dg-final { scan-tree-dump-times "goto " 4 "optimized" } } */ > +/* { dg-final { cleanup-tree-dump "optimized" } } */ > > Property changes on: gcc/testsuite/gcc.dg/tree-ssa/phi-opt-13.c > ___________________________________________________________________ > Added: svn:keywords > ## -0,0 +1 ## > +Author Date Id Revision URL > \ No newline at end of property > Added: svn:eol-style > ## -0,0 +1 ## > +native > \ No newline at end of property > Index: gcc/tree-ssa-phiopt.c > =================================================================== > --- gcc/tree-ssa-phiopt.c (revision 209347) > +++ gcc/tree-ssa-phiopt.c (working copy) > @@ -140,20 +140,37 @@ static bool gate_hoist_loads (void); > x = PHI (CONST, a) > > Gets replaced with: > bb0: > bb2: > t1 = a == CONST; > t2 = b > c; > t3 = t1 & t2; > x = a; > > + > + It also replaces > + > + bb0: > + if (a != 0) goto bb1; else goto bb2; > + bb1: > + c = a + b; > + bb2: > + x = PHI <c (bb1), b (bb0), ...>; > + > + with > + > + bb0: > + c = a + b; > + bb2: > + x = PHI <c (bb0), ...>; > + > ABS Replacement > --------------- > > This transformation, implemented in abs_replacement, replaces > > bb0: > if (a >= 0) goto bb2; else goto bb1; > bb1: > x = -a; > bb2: > @@ -809,20 +826,104 @@ operand_equal_for_value_replacement (con > if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp)) > return true; > > tmp = gimple_assign_rhs2 (def); > if (rhs_is_fed_for_value_replacement (arg0, arg1, code, tmp)) > return true; > > return false; > } > > +/* Returns true if ARG is a neutral element for operation CODE > + on the RIGHT side. */ > + > +static bool > +neutral_element_p (tree_code code, tree arg, bool right) > +{ > + switch (code) > + { > + case PLUS_EXPR: > + case BIT_IOR_EXPR: > + case BIT_XOR_EXPR: > + return integer_zerop (arg); > + > + case LROTATE_EXPR: > + case RROTATE_EXPR: > + case LSHIFT_EXPR: > + case RSHIFT_EXPR: > + case MINUS_EXPR: > + case POINTER_PLUS_EXPR: > + return right && integer_zerop (arg); > + > + case MULT_EXPR: > + return integer_onep (arg); > + > + // Are those too expensive? Should we check edge probabilities? Comment obsolete now > + case TRUNC_DIV_EXPR: > + case CEIL_DIV_EXPR: > + case FLOOR_DIV_EXPR: > + case ROUND_DIV_EXPR: > + case EXACT_DIV_EXPR: > + return right && integer_onep (arg); > + > + case BIT_AND_EXPR: > + return integer_all_onesp (arg); > + > + default: > + return false; > + } > +} > + > +/* Returns true if ARG is an absorbing element for operation CODE. */ > + > +static bool > +absorbing_element_p (tree_code code, tree arg) > +{ > + switch (code) > + { > + case BIT_IOR_EXPR: > + return integer_all_onesp (arg); > + > + case MULT_EXPR: > + case BIT_AND_EXPR: > + return integer_zerop (arg); > + > + default: > + return false; > + } > +} > + > +/* Returns true if the statement is cheap. The simple heuristic used here > + is that anything the optab knows is cheap. */ > + > +static bool > +is_cheap_stmt (gimple stmt) > +{ > + tree type; > + optab op; > + if (is_gimple_assign (stmt)) > + { > + type = TREE_TYPE (gimple_assign_rhs1 (stmt)); > + enum tree_code code = gimple_assign_rhs_code (stmt); > + op = optab_for_tree_code (code, type, optab_scalar); > + } > + else if (gimple_code (stmt) == GIMPLE_COND) > + { > + type = TREE_TYPE (gimple_cond_lhs (stmt)); > + op = cbranch_optab; Hum. cmp_optab/ucmp_optab? But it seems those are only used to query libfunc names ... Maybe try using can_compare_p instead with ccp_jump, that looks like a suitable abstraction for is_cheap_stmt. > + } > + else > + gcc_unreachable (); > + return (op != unknown_optab > + && optab_handler (op, TYPE_MODE (type)) != CODE_FOR_nothing); > +} > + > /* The function value_replacement does the main work of doing the value > replacement. Return non-zero if the replacement is done. Otherwise > return > 0. If we remove the middle basic block, return 2. > BB is the basic block where the replacement is going to be done on. > ARG0 > is argument 0 from the PHI. Likewise for ARG1. */ > > static int > value_replacement (basic_block cond_bb, basic_block middle_bb, > edge e0, edge e1, gimple phi, > tree arg0, tree arg1) > @@ -833,23 +934,21 @@ value_replacement (basic_block cond_bb, > enum tree_code code; > bool emtpy_or_with_defined_p = true; > > /* If the type says honor signed zeros we cannot do this > optimization. */ > if (HONOR_SIGNED_ZEROS (TYPE_MODE (TREE_TYPE (arg1)))) > return 0; > > /* If there is a statement in MIDDLE_BB that defines one of the PHI > arguments, then adjust arg0 or arg1. */ > - gsi = gsi_after_labels (middle_bb); > - if (!gsi_end_p (gsi) && is_gimple_debug (gsi_stmt (gsi))) > - gsi_next_nondebug (&gsi); > + gsi = gsi_start_nondebug_after_labels_bb (middle_bb); > while (!gsi_end_p (gsi)) > { > gimple stmt = gsi_stmt (gsi); > tree lhs; > gsi_next_nondebug (&gsi); > if (!is_gimple_assign (stmt)) > { > emtpy_or_with_defined_p = false; > continue; > } > @@ -931,20 +1030,66 @@ value_replacement (basic_block cond_bb, > print_generic_expr (dump_file, gimple_phi_result (phi), 0); > fprintf (dump_file, " reduced for COND_EXPR in block %d to ", > cond_bb->index); > print_generic_expr (dump_file, arg, 0); > fprintf (dump_file, ".\n"); > } > return 1; > } > > } > + > + /* Now optimize (x != 0) ? x + y : y to just y. > + The following condition is too restrictive, there can easily be > another > + stmt in middle_bb, for instance a CONVERT_EXPR for the second > argument. */ > + gimple assign = last_and_only_stmt (middle_bb); > + if (!assign || gimple_code (assign) != GIMPLE_ASSIGN > + || gimple_assign_rhs_class (assign) != GIMPLE_BINARY_RHS > + || !INTEGRAL_TYPE_P (TREE_TYPE (arg0))) That will make POINTER_PLUS_EXPR unhandled (pointer arg0). BIT_AND_EXPR also is valid for pointers (both arguments are pointer type). > + return 0; > + > + /* assign may call a libgcc routine, which is slow. */ > + if (!is_cheap_stmt (assign) && is_cheap_stmt (cond)) > + return 0; > + > + /* If the special case has a high probability, keep it. */ > + if (EDGE_PRED (middle_bb, 0)->probability < PROB_EVEN) I suppose Honza has a comment on how to test this properly (not sure if ->probability or ->frequency is always initialized properly). for example single_likely_edge tests profile_status_for_fn != PROFILE_ABSENT (and uses a fixed probability value ...). Anyway, the comparison looks backwards to me, but maybe I'm missing sth - I'd use >= PROB_LIKELY ;) Otherwise the patch looks ok to me. Thanks, Richard. > + return 0; > + > + tree lhs = gimple_assign_lhs (assign); > + tree rhs1 = gimple_assign_rhs1 (assign); > + tree rhs2 = gimple_assign_rhs2 (assign); > + enum tree_code code_def = gimple_assign_rhs_code (assign); > + tree cond_lhs = gimple_cond_lhs (cond); > + tree cond_rhs = gimple_cond_rhs (cond); > + > + if (((code == NE_EXPR && e1 == false_edge) > + || (code == EQ_EXPR && e1 == true_edge)) > + && arg0 == lhs > + && ((arg1 == rhs1 > + && operand_equal_for_phi_arg_p (rhs2, cond_lhs) > + && neutral_element_p (code_def, cond_rhs, true)) > + || (arg1 == rhs2 > + && operand_equal_for_phi_arg_p (rhs1, cond_lhs) > + && neutral_element_p (code_def, cond_rhs, false)) > + || (operand_equal_for_phi_arg_p (arg1, cond_rhs) > + && (operand_equal_for_phi_arg_p (rhs2, cond_lhs) > + || operand_equal_for_phi_arg_p (rhs1, cond_lhs)) > + && absorbing_element_p (code_def, cond_rhs)))) > + { > + gsi = gsi_for_stmt (cond); > + gimple_stmt_iterator gsi_from = gsi_for_stmt (assign); > + gsi_move_before (&gsi_from, &gsi); > + replace_phi_edge_with_variable (cond_bb, e1, phi, lhs); > + return 2; > + } > + > return 0; > } > > /* The function minmax_replacement does the main work of doing the minmax > replacement. Return true if the replacement is done. Otherwise return > false. > BB is the basic block where the replacement is going to be done on. > ARG0 > is argument 0 from the PHI. Likewise for ARG1. */ > > static bool >