On Fri, Oct 18, 2013 at 12:06 PM, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote: > On 18 October 2013 17:53, Richard Biener <richard.guent...@gmail.com> wrote: >> On Fri, Oct 18, 2013 at 11:21 AM, Zhenqiang Chen >> <zhenqiang.c...@linaro.org> wrote: >>> On 18 October 2013 00:58, Jeff Law <l...@redhat.com> wrote: >>>> On 10/17/13 05:03, Richard Biener wrote: >>>>>>> >>>>>>> Is it OK for trunk? >>>>>> >>>>>> >>>>>> I had a much simpler change which did basically the same from 4.7 (I >>>>>> can update it if people think this is a better approach). >>>>> >>>>> >>>>> I like that more (note you can now use is_gimple_condexpr as predicate >>>>> for force_gimple_operand). >>>> >>>> The obvious question is whether or not Andrew's simpler change picks up as >>>> many transformations as Zhenqiang's change. If not are the things missed >>>> important. >>>> >>>> Zhenqiang, can you do some testing of your change vs Andrew P.'s change? >>> >>> Here is a rough compare: >>> >>> 1) Andrew P.'s change can not handle ssa-ifcombine-ccmp-3.c (included >>> in my patch). Root cause is that it does not skip "LABEL". The guard >>> to do this opt should be the same the bb_has_overhead_p in my patch. >> >> I think we want a "proper" predicate in tree-cfg.c for this, like maybe >> a subset of tree_forwarder_block_p or whatever it will end up looking >> like (we need "effectively empty BB" elsewhere, for example in vectorization, >> add a flag to allow a condition ending the BB and the predicate is done). >> >>> 2) Andrew P.'s change always generate TRUTH_AND_EXPR, which is not >>> efficient for "||". e.g. For ssa-ifcombine-ccmp-6.c, it will generate >>> >>> _3 = a_2(D) > 0; >>> _5 = b_4(D) > 0; >>> _6 = _3 | _5; >>> _9 = c_7(D) <= 0; >>> _10 = ~_6; >>> _11 = _9 & _10; >>> if (_11 == 0) >>> >>> With my patch, it will generate >>> >>> _3 = a_2(D) > 0; >>> _5 = b_4(D) > 0; >>> _6 = _3 | _5; >>> _9 = c_7(D) > 0; >>> _10 = _6 | _9; >>> if (_10 != 0) >> >> But that seems like a missed simplification in predicate combining >> which should be fixed more generally. >> >>> 3) The good thing of Andrew P.'s change is that "Inverse the order of >>> the basic block walk" so it can do combine recursively. >>> >>> But I think we need some heuristic to control the number of ifs. Move >>> too much compares from >>> the inner_bb to outer_bb is not good. >> >> True, but that's what fold-const.c does, no? > > Based on current fold-const, we can not generate more than "two" > compares in a basic block.
I think you are wrong as fold is invoked recursively on a && b && c && d. > But if ifs can be combined recursively in ifcombine, we might generate > many compares in a basic block. Yes, we can end up with that for the other simplifications done in ifcombine as well. Eventually there needs to be a cost model for this (use edge probabilities for example, so that if you have profile data you never combine cold blocks). Any takers? Richard. >>> 4) Another good thing of Andrew P.'s change is that it reuses some >>> existing functions. So it looks much simple. >> >> Indeed - that's what I like about it. >> >>>>> >>>>> With that we should be able to kill the fold-const.c transform? >>>> >>>> That would certainly be nice and an excellent follow-up for Zhenqiang. >>> >>> That's my final goal to "kill the fold-const.c transform". I think we >>> may combine the two changes to make a "simple" and "good" patch. >> >> Thanks, >> Richard. >> >>> Thanks! >>> -Zhenqiang