On Sun, Oct 27, 2013 at 7:55 PM, Andrew Pinski <pins...@gmail.com> wrote: > On Sat, Oct 26, 2013 at 4:49 PM, Andrew Pinski <pins...@gmail.com> wrote: >> On Sat, Oct 26, 2013 at 2:30 PM, Andrew Pinski <pins...@gmail.com> wrote: >>> On Fri, Oct 18, 2013 at 2:21 AM, Zhenqiang Chen >>> <zhenqiang.c...@linaro.org> wrote: >>>> On 18 October 2013 00:58, Jeff Law <l...@redhat.com> wrote: >>>>> On 10/17/13 05:03, Richard Biener wrote: >>>>>>>> >>>>>>>> Is it OK for trunk? >>>>>>> >>>>>>> >>>>>>> I had a much simpler change which did basically the same from 4.7 (I >>>>>>> can update it if people think this is a better approach). >>>>>> >>>>>> >>>>>> I like that more (note you can now use is_gimple_condexpr as predicate >>>>>> for force_gimple_operand). >>>>> >>>>> The obvious question is whether or not Andrew's simpler change picks up as >>>>> many transformations as Zhenqiang's change. If not are the things missed >>>>> important. >>>>> >>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change? >>>> >>>> Here is a rough compare: >>>> >>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included >>>> in my patch). Root cause is that it does not skip "LABEL". The guard >>>> to do this opt should be the same the bb_has_overhead_p in my patch. >>> >>> This should be an easy change, I am working on this right now. >>> >>>> >>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not >>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) <= 0; >>>> _10 = ~_6; >>>> _11 = _9 & _10; >>>> if (_11 == 0) >>>> >>>> With my patch, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) > 0; >>>> _10 = _6 | _9; >>>> if (_10 != 0) >>> >>> As mentioned otherwise, this seems like a missed optimization inside >>> forwprop. When I originally wrote this code there used to be two >>> cases one for & and one for |, but this was removed sometime and I >>> just made the code evolve with that. >> >> Actually I can make a small patch (3 lines) to my current patch which >> causes tree-ssa-ifcombine.c to produce the behavior of your patch. >> >> if (result_inv) >> { >> t = fold_build1 (TRUTH_NOT_EXPR, TREE_TYPE (t), t); >> result_inv = false; >> } >> >> I will submit a new patch after some testing of my current patch which >> fixes items 1 and 2. > > > Here is my latest patch which adds the testcases from Zhenqiang's > patch and fixes item 1 and 2. > > OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
return i; } +/* Return a new iterator pointing to the first non-debug non-label statement in + basic block BB. */ + +static inline gimple_stmt_iterator +gsi_start_nondebug_after_labels_bb (basic_block bb) vertical space before the comment missing. @@ -631,7 +669,7 @@ tree_ssa_ifcombine (void) bbs = single_pred_before_succ_order (); calculate_dominance_info (CDI_DOMINATORS); - for (i = 0; i < n_basic_blocks - NUM_FIXED_BLOCKS; ++i) + for (i = n_basic_blocks - NUM_FIXED_BLOCKS -1; i >= 0; i--) { basic_block bb = bbs[i]; gimple stmt = last_stmt (bb); The original code matches what phiopt does which even has a comment: /* Search every basic block for COND_EXPR we may be able to optimize. We walk the blocks in order that guarantees that a block with a single predecessor is processed before the predecessor. This ensures that we collapse inner ifs before visiting the outer ones, and also that we do not try to visit a removed block. */ bb_order = single_pred_before_succ_order (); n = n_basic_blocks - NUM_FIXED_BLOCKS; for (i = 0; i < n; i++) so if the reverse order is better here (and in phiopt?) then it at least needs a comment explaining why. Otherwise the patch looks good, but please iterate with Jeff over the dom testcase issue. Thanks, Richard. > Thanks, > Andrew Pinski > > ChangeLog: > * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. > (ifcombine_ifandif): Handle cases where > maybe_fold_and_comparisons fails, combining the branches > anyways. > (tree_ssa_ifcombine): Inverse the order of > the basic block walk, increases the number of combinings. > * gimple.h (gsi_start_nondebug_after_labels_bb): New function. > > > testsuite/ChangeLog: > > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. > * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. > > * gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to prevent > conditional move to be used. > * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for "one or more > intermediate". > > > > >> >> Thanks, >> Andrew Pinski >> >> >>> >>>> >>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of >>>> the basic block walk" so it can do combine recursively. >>>> >>>> But I think we need some heuristic to control the number of ifs. Move >>>> too much compares from >>>> the inner_bb to outer_bb is not good. >>> >>> >>> I think this depends on the target. For MIPS we don't want an upper >>> bound as integer comparisons go directly to GPRs. I wrote this patch >>> with MIPS in mind as that was the target I was working on. >>> >>>> >>>> 4) Another good thing of Andrew P.'s change is that it reuses some >>>> existing functions. So it looks much simple. >>> >>> >>> Thanks, >>> Andrew Pinski >>> >>>> >>>>>> >>>>>> With that we should be able to kill the fold-const.c transform? >>>>> >>>>> That would certainly be nice and an excellent follow-up for Zhenqiang. >>>> >>>> That's my final goal to "kill the fold-const.c transform". I think we >>>> may combine the two changes to make a "simple" and "good" patch. >>>> >>>> Thanks! >>>> -Zhenqiang