On 18 October 2013 18:15, Richard Biener <richard.guent...@gmail.com> wrote: > On Fri, Oct 18, 2013 at 12:06 PM, Zhenqiang Chen > <zhenqiang.c...@linaro.org> wrote: >> On 18 October 2013 17:53, Richard Biener <richard.guent...@gmail.com> wrote: >>> On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen >>> <zhenqiang.c...@linaro.org> wrote: >>>> On 18 October 2013 00:58, Jeff Law <l...@redhat.com> wrote: >>>>> On 10/17/13 05:03, Richard Biener wrote: >>>>>>>> >>>>>>>> Is it OK for trunk? >>>>>>> >>>>>>> >>>>>>> I had a much simpler change which did basically the same from 4.7 (I >>>>>>> can update it if people think this is a better approach). >>>>>> >>>>>> >>>>>> I like that more (note you can now use is_gimple_condexpr as predicate >>>>>> for force_gimple_operand). >>>>> >>>>> The obvious question is whether or not Andrew's simpler change picks up as >>>>> many transformations as Zhenqiang's change. If not are the things missed >>>>> important. >>>>> >>>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change? >>>> >>>> Here is a rough compare: >>>> >>>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included >>>> in my patch). Root cause is that it does not skip "LABEL". The guard >>>> to do this opt should be the same the bb_has_overhead_p in my patch. >>> >>> I think we want a "proper" predicate in tree-cfg.c for this, like maybe >>> a subset of tree_forwarder_block_p or whatever it will end up looking >>> like (we need "effectively empty BB" elsewhere, for example in >>> vectorization, >>> add a flag to allow a condition ending the BB and the predicate is done). >>> >>>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not >>>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) <= 0; >>>> _10 = ~_6; >>>> _11 = _9 & _10; >>>> if (_11 == 0) >>>> >>>> With my patch, it will generate >>>> >>>> _3 = a_2(D) > 0; >>>> _5 = b_4(D) > 0; >>>> _6 = _3 | _5; >>>> _9 = c_7(D) > 0; >>>> _10 = _6 | _9; >>>> if (_10 != 0) >>> >>> But that seems like a missed simplification in predicate combining >>> which should be fixed more generally. >>> >>>> 3) The good thing of Andrew P.'s change is that "Inverse the order of >>>> the basic block walk" so it can do combine recursively. >>>> >>>> But I think we need some heuristic to control the number of ifs. Move >>>> too much compares from >>>> the inner_bb to outer_bb is not good. >>> >>> True, but that's what fold-const.c does, no? >> >> Based on current fold-const, we can not generate more than "two" >> compares in a basic block. > > I think you are wrong as fold is invoked recursively on a && b && c && d.
Please try to compile it with -fdump-tree-gimple. It will be broken into if (a && b) if (c && d) >> But if ifs can be combined recursively in ifcombine, we might generate >> many compares in a basic block. > > Yes, we can end up with that for the other simplifications done in > ifcombine as well. Eventually there needs to be a cost model for this > (use edge probabilities for example, so that if you have profile data > you never combine cold blocks). > > Any takers? I will take it. Thanks! -Zhenqiang > Richard. > >>>> 4) Another good thing of Andrew P.'s change is that it reuses some >>>> existing functions. So it looks much simple. >>> >>> Indeed - that's what I like about it. >>> >>>>>> >>>>>> With that we should be able to kill the fold-const.c transform? >>>>> >>>>> That would certainly be nice and an excellent follow-up for Zhenqiang. >>>> >>>> That's my final goal to "kill the fold-const.c transform". I think we >>>> may combine the two changes to make a "simple" and "good" patch. >>> >>> Thanks, >>> Richard. >>> >>>> Thanks! >>>> -Zhenqiang