> -----Original Message-----
> From: Richard Earnshaw
> Sent: Thursday, September 19, 2013 5:13 PM
> To: Zhenqiang Chen
> Cc: 'Richard Biener'; GCC Patches
> Subject: Re: [PATCH 1/n] Add conditional compare support
> 
> On 18/09/13 10:45, Zhenqiang Chen wrote:
> >
> >> -----Original Message-----
> >> From: Richard Biener [mailto:richard.guent...@gmail.com]
> >> Sent: Tuesday, August 27, 2013 8:18 PM
> >> To: Richard Earnshaw
> >> Cc: Zhenqiang Chen; GCC Patches
> >> Subject: Re: [PATCH 1/n] Add conditional compare support
> >>
> >> On Tue, Aug 27, 2013 at 1:56 PM, Richard Earnshaw <rearn...@arm.com>
> >> wrote:
> >>> On 27/08/13 12:10, Richard Biener wrote:
> >>>> What's this for and what's the desired semantics? I don't like
> >>>> having extra tree codes for this.  Is this for a specific
> >>>> instruction set feature?
> >>>
> >>> The background is to support the conditional compare instructions in
> >>> ARM (more effectively) and AArch64 at all.
> >>>
> >>> The current method used in ARM is to expand into a series of
> >>> store-flag instructions and then hope that combine can optimize them
> >>> away (though that fails far too often, particularly when the first
> >>> instruction in the sequence is combined into another pattern).  To
> >>> make it work at all the compiler has to lie about the costs of
> >>> various store-flag type operations which overall risks producing
> >>> worse code and means we also have to support many more complex
> >>> multi-instruction patterns than is desirable.  I really don't want
> >>> to go down the same
> > route
> >> for AArch64.
> >>>
> >>> The idea behind all this is to capture potential conditional compare
> >>> operations early enough in the mid end that we can keep track of
> >>> them until RTL expand time and then to emit the correct logic on all
> >>> targets depending on what is the best thing for that target.  The
> >>> current method of lowering into store-flag sequences doesn't really
> >>> cut
> > it.
> >>
> >> It seems to me that then the initial instruction selection process
> >> (aka
> > RTL
> >> expansion) needs to be improved.  As we are expanding with having the
> >> CFG around it should be easy enough to detect AND/ORIF cases and do
> >> better here.  Yeah, I suppose this asks to turn existing jump
> >> expansion
> > optimizations
> >> up-side-down to optimize with the GIMPLE CFG in mind.
> >>
> >> The current way of LOGICAL_OP_NON_SHORT_CIRCUIT is certainly bogus
> -
> >> fold-const.c is way too early to decide this.  Similar to the ongoing
> >> work
> > of
> >> expanding / building-up switch expressions in a GIMPLE pass, moving
> >> expand complexity up the pipeline this asks for a GIMPLE phase that
> >> moves this decision down closer to RTL expansion.
> >> (We have tree-ssa-ifcombine.c that is a related GIMPLE transform
> >> pass)
> >>
> >
> > The patch is updated according to your comments. It is a basic
> > support, which does not touch ifcombine and jump related optimizations
> yet.
> >
> > Current method is:
> > 1) In fold-const, if HAVE_conditional_compare, set
> > LOGICAL_OP_NON_SHORT_CIRCUIT
> >    to optimize_function_for_speed_p. So we do not depend on
> BRANCH_COST.
> > 2) Identify CCMP during expanding. A CCMP candidate is a BIT_AND_EXPR
> >    or BIT_IOR_EXPR, whose operators are compares.
> > 3) Add a new op in optab to expand the CCMP to optimized RTL,
> >     e.g. and_scc_scc/ior_scc_scc in ARM.
> >
> > Bootstrap on ARM Chrome book.
> > No make check regression on Pandaboard.
> >
> > Thanks!
> > -Zhenqiang
> >
> > ChangeLog:
> > 2013-09-18  Zhenqiang Chen  <zhenqiang.c...@linaro.org>
> >
> >     * config/arm/arm.md (conditional_compare): New.
> >     * expr.c (is_conditional_compare_candidate_p, expand_ccmp_expr):
> > New.
> >     (expand_expr_real_1): Identify conditional compare.
> >     * fold-const.c (LOGICAL_OP_NON_SHORT_CIRCUIT): Update.
> >     * optabs.c (expand_ccmp_op): New.
> >     (get_rtx_code): Handle BIT_AND_EXPR and BIT_IOR_EXPR.
> >     * optabs.def (ccmp_optab): New.
> >     * optabs.h (expand_ccmp_op): New.
> >
> >
> > basic-conditional-compare-support2.patch
> >
> >
> > N¬n‡r¥ªíÂ)emçhÂyhiם¢w^™©Ý
> >
> 
> Some general comments.
> 
> 1) How do we get to a conditional branch from this code.  It seems your new
> pattern generates a store-flag value rather than a branch expansion.
>  Are you expecting combine or some other later pass to clean that up?

The patch still depend on combine pass to optimize the cmp to generate a 
"cmp_ior/cmp_and".

It is not necessary a branch. e.g. "return a && b;" is not a branch when 
expanding.

The attached patch is updated to distinguish the two cases. If the CCMP is used 
in a GIMPLE_COND statement, it generates instruction to match 
"cmp_and/cmp_ior". An new expand "cbranchcc4" is added to match the branch 
instruction.  Then, it does not depend on later passes to optimize the CCMP.

> 2) Assuming we do end up with branches, why would we not want to do this
> optimization when optimzing for space?

I mis-checked http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43920, which was only 
for THUMB mode. The attached patch removed it.
 
>       cmp     r0, r1
>       cmpne   r2, r3
>       beq     L1
> 
> is shorter than
> 
>       cmp     r0, r1
>       beq     L1
>       cmp     r2, r3
>       beq     L1

> 3) Is there any way to generalize this for more complex expressions?  Eg
> 
>       if (a == b || a == c || a == d)
> 
> should become
> 
>       cmp     a, b
>       cmpne   a, c
>       cmpne   a, d
>       ...
> 
> Obviously, when optimizing for speed there will probably be a limit on the
> number of compares that are desirable, but I don't see why we should
> arbitrarily cap it at 2.

Based on current gcc, there are "no more than 2" cmps. Need update front-end or 
ifcombine to recover it. I planned to enhance the ifcombine to identify more 
CCMP opportunity. When we have more than 2 cmps, it is not hard to support it 
based on current patch. E.g.

 (a == b || a == c || a == d)

will create two ccmp:

CC_DEQ = CCMP (ior (EQ (a, b), EQ (a, c)))
CC_DEQ = CCMP (ior (EQ (CC_DEQ, 0), EQ (a, d)))

The first CCMP will generate
cmp    a, b
cmpne a, c

The second CCMP will only generate
cmpne a, d
CC_DEQ is used for determine the condition (ne).

And from the MODE of CC_DEQ, we can distinguish the two CCMP.

BTW: 
How to determine the max number? It seams not necessary better with more 
conditional compare.

Thanks!
-Zhenqiang

Attachment: basic-conditional-compare-support4.patch
Description: Binary data

Reply via email to