> -----Original Message----- > From: Richard Earnshaw > Sent: Thursday, September 19, 2013 5:13 PM > To: Zhenqiang Chen > Cc: 'Richard Biener'; GCC Patches > Subject: Re: [PATCH 1/n] Add conditional compare support > > On 18/09/13 10:45, Zhenqiang Chen wrote: > > > >> -----Original Message----- > >> From: Richard Biener [mailto:richard.guent...@gmail.com] > >> Sent: Tuesday, August 27, 2013 8:18 PM > >> To: Richard Earnshaw > >> Cc: Zhenqiang Chen; GCC Patches > >> Subject: Re: [PATCH 1/n] Add conditional compare support > >> > >> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw <rearn...@arm.com> > >> wrote: > >>> On 27/08/13 12:10, Richard Biener wrote: > >>>> What's this for and what's the desired semantics? I don't like > >>>> having extra tree codes for this. Is this for a specific > >>>> instruction set feature? > >>> > >>> The background is to support the conditional compare instructions in > >>> ARM (more effectively) and AArch64 at all. > >>> > >>> The current method used in ARM is to expand into a series of > >>> store-flag instructions and then hope that combine can optimize them > >>> away (though that fails far too often, particularly when the first > >>> instruction in the sequence is combined into another pattern). To > >>> make it work at all the compiler has to lie about the costs of > >>> various store-flag type operations which overall risks producing > >>> worse code and means we also have to support many more complex > >>> multi-instruction patterns than is desirable. I really don't want > >>> to go down the same > > route > >> for AArch64. > >>> > >>> The idea behind all this is to capture potential conditional compare > >>> operations early enough in the mid end that we can keep track of > >>> them until RTL expand time and then to emit the correct logic on all > >>> targets depending on what is the best thing for that target. The > >>> current method of lowering into store-flag sequences doesn't really > >>> cut > > it. > >> > >> It seems to me that then the initial instruction selection process > >> (aka > > RTL > >> expansion) needs to be improved. As we are expanding with having the > >> CFG around it should be easy enough to detect AND/ORIF cases and do > >> better here. Yeah, I suppose this asks to turn existing jump > >> expansion > > optimizations > >> up-side-down to optimize with the GIMPLE CFG in mind. > >> > >> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus > - > >> fold-const.c is way too early to decide this. Similar to the ongoing > >> work > > of > >> expanding / building-up switch expressions in a GIMPLE pass, moving > >> expand complexity up the pipeline this asks for a GIMPLE phase that > >> moves this decision down closer to RTL expansion. > >> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform > >> pass) > >> > > > > The patch is updated according to your comments. It is a basic > > support, which does not touch ifcombine and jump related optimizations > yet. > > > > Current method is: > > 1) In fold-const, if HAVE_conditional_compare, set > > LOGICAL_OP_NON_SHORT_CIRCUIT > > to optimize_function_for_speed_p. So we do not depend on > BRANCH_COST. > > 2) Identify CCMP during expanding. A CCMP candidate is a BIT_AND_EXPR > > or BIT_IOR_EXPR, whose operators are compares. > > 3) Add a new op in optab to expand the CCMP to optimized RTL, > > e.g. and_scc_scc/ior_scc_scc in ARM. > > > > Bootstrap on ARM Chrome book. > > No make check regression on Pandaboard. > > > > Thanks! > > -Zhenqiang > > > > ChangeLog: > > 2013-09-18 Zhenqiang Chen <zhenqiang.c...@linaro.org> > > > > * config/arm/arm.md (conditional_compare): New. > > * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr): > > New. > > (expand_expr_real_1): Identify conditional compare. > > * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update. > > * optabs.c (expand_ccmp_op): New. > > (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR. > > * optabs.def (ccmp_optab): New. > > * optabs.h (expand_ccmp_op): New. > > > > > > basic-conditional-compare-support2.patch > > > > > > N¬n‡r¥ªíÂ)emçhÂyhi×¢w^™©Ý > > > > Some general comments. > > 1) How do we get to a conditional branch from this code. It seems your new > pattern generates a store-flag value rather than a branch expansion. > Are you expecting combine or some other later pass to clean that up?
The patch still depend on combine pass to optimize the cmp to generate a "cmp_ior/cmp_and". It is not necessary a branch. e.g. "return a && b;" is not a branch when expanding. The attached patch is updated to distinguish the two cases. If the CCMP is used in a GIMPLE_COND statement, it generates instruction to match "cmp_and/cmp_ior". An new expand "cbranchcc4" is added to match the branch instruction. Then, it does not depend on later passes to optimize the CCMP. > 2) Assuming we do end up with branches, why would we not want to do this > optimization when optimzing for space? I mis-checked http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43920, which was only for THUMB mode. The attached patch removed it. > cmp r0, r1 > cmpne r2, r3 > beq L1 > > is shorter than > > cmp r0, r1 > beq L1 > cmp r2, r3 > beq L1 > 3) Is there any way to generalize this for more complex expressions? Eg > > if (a == b || a == c || a == d) > > should become > > cmp a, b > cmpne a, c > cmpne a, d > ... > > Obviously, when optimizing for speed there will probably be a limit on the > number of compares that are desirable, but I don't see why we should > arbitrarily cap it at 2. Based on current gcc, there are "no more than 2" cmps. Need update front-end or ifcombine to recover it. I planned to enhance the ifcombine to identify more CCMP opportunity. When we have more than 2 cmps, it is not hard to support it based on current patch. E.g. (a == b || a == c || a == d) will create two ccmp: CC_DEQ = CCMP (ior (EQ (a, b), EQ (a, c))) CC_DEQ = CCMP (ior (EQ (CC_DEQ, 0), EQ (a, d))) The first CCMP will generate cmp a, b cmpne a, c The second CCMP will only generate cmpne a, d CC_DEQ is used for determine the condition (ne). And from the MODE of CC_DEQ, we can distinguish the two CCMP. BTW: How to determine the max number? It seams not necessary better with more conditional compare. Thanks! -Zhenqiang
basic-conditional-compare-support4.patch
Description: Binary data