On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pins...@gmail.com> wrote: > On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pins...@gmail.com> wrote: >> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen >> <zhenqiang.c...@linaro.org> wrote: >>> On 18 October 2013 00:58, Jeff Law <l...@redhat.com> wrote: >>>> On 10/17/13 05:03, Richard Biener wrote: >>>>>>> >>>>>>> Is it OK for trunk? >>>>>> >>>>>> >>>>>> I had a much simpler change which did basically the same from 4.7 (I >>>>>> can update it if people think this is a better approach). >>>>> >>>>> >>>>> I like that more (note you can now use is_gimple_condexpr as predicate >>>>> for force_gimple_operand). >>>> >>>> The obvious question is whether or not Andrew's simpler change picks up as >>>> many transformations as Zhenqiang's change. If not are the things missed >>>> important. >>>> >>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change? >>> >>> Here is a rough compare: >>> >>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included >>> in my patch). Root cause is that it does not skip "LABEL". The guard >>> to do this opt should be the same the bb_has_overhead_p in my patch. >> >> This should be an easy change, I am working on this right now. >> >>> >>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not >>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate >>> >>> _3 = a_2(D) > 0; >>> _5 = b_4(D) > 0; >>> _6 = _3 | _5; >>> _9 = c_7(D) <= 0; >>> _10 = ~_6; >>> _11 = _9 & _10; >>> if (_11 == 0) >>> >>> With my patch, it will generate >>> >>> _3 = a_2(D) > 0; >>> _5 = b_4(D) > 0; >>> _6 = _3 | _5; >>> _9 = c_7(D) > 0; >>> _10 = _6 | _9; >>> if (_10 != 0) >> >> As mentioned otherwise, this seems like a missed optimization inside >> forwprop. When I originally wrote this code there used to be two >> cases one for & and one for |, but this was removed sometime and I >> just made the code evolve with that. > > Actually I can make a small patch (3 lines) to my current patch which > causes tree-ssa-ifcombine.c to produce the behavior of your patch. > > if (result_inv) > { > t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); > result_inv = false; > } > > I will submit a new patch after some testing of my current patch which > fixes items 1 and 2.
Here is my latest patch which adds the testcases from Zhenqiang's patch and fixes item 1 and 2. OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions. Thanks, Andrew Pinski ChangeLog: * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons fails, combining the branches anyways. (tree_ssa_ifcombine): Inverse the order of the basic block walk, increases the number of combinings. * gimple.h (gsi_start_nondebug_after_labels_bb): New function. testsuite/ChangeLog: * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent conditional move to be used. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more intermediate". > > Thanks, > Andrew Pinski > > >> >>> >>> 3) The good thing of Andrew P.'s change is that "Inverse the order of >>> the basic block walk" so it can do combine recursively. >>> >>> But I think we need some heuristic to control the number of ifs. Move >>> too much compares from >>> the inner_bb to outer_bb is not good. >> >> >> I think this depends on the target. For MIPS we don't want an upper >> bound as integer comparisons go directly to GPRs. I wrote this patch >> with MIPS in mind as that was the target I was working on. >> >>> >>> 4) Another good thing of Andrew P.'s change is that it reuses some >>> existing functions. So it looks much simple. >> >> >> Thanks, >> Andrew Pinski >> >>> >>>>> >>>>> With that we should be able to kill the fold-const.c transform? >>>> >>>> That would certainly be nice and an excellent follow-up for Zhenqiang. >>> >>> That's my final goal to "kill the fold-const.c transform". I think we >>> may combine the two changes to make a "simple" and "good" patch. >>> >>> Thanks! >>> -Zhenqiang
Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c (revision 0) @@ -0,0 +1,14 @@ +/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ + +/* { dg-options "-O2 -g -fdump-tree-optimized" } */ +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ + +int t (int a, int b) +{ + if (a > 0) + if (b > 0) + return 0; + return 1; +} +/* { dg-final { scan-tree-dump "\&" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c (revision 204095) +++ testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c (working copy) @@ -42,7 +42,7 @@ expand_one_var (tree var, unsigned char abort (); } /* We should thread the jump, through an intermediate block. */ -/* { dg-final { scan-tree-dump-times "Threaded" 2 "dom1"} } */ -/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) incoming edge; \\(.*\\) joiner; \\(.*\\) nocopy;" 1 "dom1"} } */ +/* { dg-final { scan-tree-dump "Threaded" "dom1"} } */ +/* { dg-final { scan-tree-dump-times "Registering jump thread: \\(.*\\) incoming edge;" 1 "dom1"} } */ /* { dg-final { cleanup-tree-dump "dom1" } } */ Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c (revision 0) @@ -0,0 +1,17 @@ +/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ + +/* { dg-options "-O2 -g -fdump-tree-optimized" } */ +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ + +int t (int a, int b) +{ + if (a > 0) + goto L1; + if (b > 0) + goto L1; + return 0; +L1: + return 1; +} +/* { dg-final { scan-tree-dump "\|" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c (revision 0) @@ -0,0 +1,20 @@ +/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ + +/* { dg-options "-O2 -g -fdump-tree-optimized" } */ +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ + +int t (int a, int b) +{ + if (a > 0) + goto L1; + else + goto L2; +L1: + if (b > 0) + goto L2; + return 5; +L2: + return 6; +} +/* { dg-final { scan-tree-dump "\|" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c (revision 0) @@ -0,0 +1,18 @@ +/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ + +/* { dg-options "-O2 -g -fdump-tree-optimized" } */ +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ + +int t (int a, int b) +{ + if (a > 0) + goto L1; + if (b > 0) + goto L2; +L1: + return 0; +L2: + return 1; +} +/* { dg-final { scan-tree-dump "\&" "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: testsuite/gcc.dg/tree-ssa/phi-opt-9.c =================================================================== --- testsuite/gcc.dg/tree-ssa/phi-opt-9.c (revision 204095) +++ testsuite/gcc.dg/tree-ssa/phi-opt-9.c (working copy) @@ -2,13 +2,14 @@ /* { dg-options "-O -fdump-tree-optimized" } */ int g(int,int); +int h(int); int f(int t, int c) { int d = 0; int e = 0; if (t) { - d = c+1; + d = h(c); e = t; } else d = 0, e = 0; Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ + +/* { dg-options "-O2 -g -fdump-tree-optimized" } */ +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ + +int t (int a, int b, int c) +{ + if (a > 0 && b > 0 && c > 0) + return 0; + return 1; +} +/* { dg-final { scan-tree-dump-times "\&" 2 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c =================================================================== --- testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c (revision 0) +++ testsuite/gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c (revision 0) @@ -0,0 +1,13 @@ +/* { dg-do compile { target { ! "m68k*-*-* mmix*-*-* mep*-*-* bfin*-*-* v850*-*-* picochip*-*-* moxie*-*-* cris*-*-* m32c*-*-* fr30*-*-* mcore*-*-* powerpc*-*-* xtensa*-*-*"} } } */ + +/* { dg-options "-O2 -g -fdump-tree-optimized" } */ +/* { dg-additional-options "-mbranch-cost=2" { target avr-*-* } } */ + +int t (int a, int b, int c) +{ + if (a > 0 || b > 0 || c > 0) + return 0; + return 1; +} +/* { dg-final { scan-tree-dump-times "\\|" 2 "optimized" } } */ +/* { dg-final { cleanup-tree-dump "optimized" } } */ Index: tree-ssa-ifcombine.c =================================================================== --- tree-ssa-ifcombine.c (revision 204095) +++ tree-ssa-ifcombine.c (working copy) @@ -22,6 +22,10 @@ along with GCC; see the file COPYING3. #include "system.h" #include "coretypes.h" #include "tm.h" +/* rtl is needed only because arm back-end requires it for + BRANCH_COST. */ +#include "rtl.h" +#include "tm_p.h" #include "tree.h" #include "basic-block.h" #include "tree-pretty-print.h" @@ -32,6 +36,12 @@ along with GCC; see the file COPYING3. #include "ssa-iterators.h" #include "tree-pass.h" +#ifndef LOGICAL_OP_NON_SHORT_CIRCUIT +#define LOGICAL_OP_NON_SHORT_CIRCUIT \ + (BRANCH_COST (optimize_function_for_speed_p (cfun), \ + false) >= 2) +#endif + /* This pass combines COND_EXPRs to simplify control flow. It currently recognizes bit tests and comparisons in chains that represent logical and or logical or of two COND_EXPRs. @@ -488,7 +498,35 @@ ifcombine_ifandif (basic_block inner_con outer_cond_code, gimple_cond_lhs (outer_cond), gimple_cond_rhs (outer_cond)))) - return false; + { + tree t1, t2; + gimple_stmt_iterator gsi; + if (!LOGICAL_OP_NON_SHORT_CIRCUIT) + return false; + /* Only do this optimization if the inner bb contains only the conditional. */ + if (!gsi_one_before_end_p (gsi_start_nondebug_after_labels_bb (inner_cond_bb))) + return false; + t1 = fold_build2_loc (gimple_location (inner_cond), + inner_cond_code, + boolean_type_node, + gimple_cond_lhs (inner_cond), + gimple_cond_rhs (inner_cond)); + t2 = fold_build2_loc (gimple_location (outer_cond), + outer_cond_code, + boolean_type_node, + gimple_cond_lhs (outer_cond), + gimple_cond_rhs (outer_cond)); + t = fold_build2_loc (gimple_location (inner_cond), + TRUTH_AND_EXPR, boolean_type_node, t1, t2); + if (result_inv) + { + t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); + result_inv = false; + } + gsi = gsi_for_stmt (inner_cond); + t = force_gimple_operand_gsi_1 (&gsi, t, is_gimple_condexpr, NULL, true, + GSI_SAME_STMT); + } if (result_inv) t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); t = canonicalize_cond_expr_cond (t); @@ -631,7 +669,7 @@ tree_ssa_ifcombine (void) bbs = single_pred_before_succ_order (); calculate_dominance_info (CDI_DOMINATORS); - for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i) + for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--) { basic_block bb = bbs[i]; gimple stmt = last_stmt (bb); Index: gimple.h =================================================================== --- gimple.h (revision 204095) +++ gimple.h (working copy) @@ -5533,6 +5533,19 @@ gsi_start_nondebug_bb (basic_block bb) return i; } +/* Return a new iterator pointing to the first non-debug non-label statement in + basic block BB. */ + +static inline gimple_stmt_iterator +gsi_start_nondebug_after_labels_bb (basic_block bb) +{ + gimple_stmt_iterator i = gsi_after_labels (bb); + + if (!gsi_end_p (i) && is_gimple_debug (gsi_stmt (i))) + gsi_next_nondebug (&i); + + return i; +} /* Return a new iterator pointing to the last non-debug statement in basic block BB. */