Ping.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02080.html

Thanks,
Kyrill

On 26/05/16 10:53, Kyrill Tkachov wrote:
Hi all,

In this PR we want to optimise:
int foo (int i)
{
  return (i == 0) ? N : __builtin_clz (i);
}

on targets where CLZ is defined at zero to the constant 'N'.
This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO macro.
The obvious place to implement this would be in combine through simplify-rtx,
where we'd recognise an IF_THEN_ELSE of the form:
(set (reg:SI r1)
     (if_then_else:SI (ne (reg:SI r2)
                          (const_int 0 [0]))
       (clz:SI (reg:SI r2))
       (const_int 32)))

and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify it
into just (clz:SI (reg:SI r2)).
However, I found this doesn't quite happen for a couple of reasons:
1) This depends on ifcvt or some other pass having created a conditional move
of the two branches, which provides the IF_THEN_ELSE to propagate the const_int
and clz operation into.

2) Combine will refuse to propagate r2 from the above example into both the
condition and the CLZ at the same time, so the most we see is:
(set (reg:SI r1)
     (if_then_else:SI (ne (reg:CC cc)
            (const_int 0))
       (clz:SI (reg:SI r2))
       (const_int 32)))

which is not enough information to perform the simplification.
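
To make the identity behind the simplification concrete, here is a minimal
sketch in plain C. It is not part of the patch: clz32_defined_at_zero is a
hypothetical software stand-in for a target CLZ instruction that returns 32
for a zero input (the CLZ_DEFINED_VALUE_AT_ZERO == 32 case for SImode), and
clz32_guarded is the conditional form we want to collapse.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a CLZ instruction that is defined at zero:
   counts leading zero bits and yields 32 when X is 0.  */
int
clz32_defined_at_zero (uint32_t x)
{
  int n = 0;
  uint32_t mask;

  for (mask = UINT32_C (1) << 31; mask != 0 && (x & mask) == 0; mask >>= 1)
    n++;
  return n;
}

/* The guarded form from the source-level example, with N == 32.  */
int
clz32_guarded (uint32_t x)
{
  return (x == 0) ? 32 : clz32_defined_at_zero (x);
}

/* Check on a few inputs that the guard is redundant under this model.  */
void
check_clz_identity (void)
{
  static const uint32_t inputs[] = { 0, 1, 2, 0x00010000u, 0x80000000u,
                                     0xffffffffu };
  unsigned int i;

  for (i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
    assert (clz32_guarded (inputs[i]) == clz32_defined_at_zero (inputs[i]));
}
```

Under this assumption the two functions agree for every input, which is why
the whole IF_THEN_ELSE can be replaced by the CLZ alone once the condition is
known to test the CLZ operand against zero.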

This patch implements the optimisation in ce1 using the noce ifcvt framework.
During ifcvt, noce_process_if_block can see that we're trying to optimise
something of the form (x == 0 ? const_int : CLZ (x)), so it has visibility of
all the information needed to perform the transformation.

The transformation is performed by adding a new noce_try* function that puts
the condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and tries
to simplify that using the simplify-rtx machinery. That way, we can implement
the simplification logic in simplify-rtx.c where it belongs.

A similar transformation for CTZ is implemented as well.
So for code:
int foo (int i)
{
  return (i == 0) ? 32 : __builtin_clz (i);
}

On aarch64 we now emit:
foo:
        clz     w0, w0
        ret

instead of:
foo:
        mov     w1, 32
        clz     w2, w0
        cmp     w0, 0
        csel    w0, w2, w1, ne
        ret

and for arm similarly we generate:
foo:
        clz     r0, r0
        bx      lr

instead of:
foo:
        cmp     r0, #0
        clzne   r0, r0
        moveq   r0, #32
        bx      lr


and for x86_64 with -O2 -mlzcnt we generate:
foo:
        xorl    %eax, %eax
        lzcntl  %edi, %eax
        ret

instead of:
foo:
        xorl    %eax, %eax
        movl    $32, %edx
        lzcntl  %edi, %eax
        testl   %edi, %edi
        cmove   %edx, %eax
        ret
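
For the CTZ variant mentioned above, a source-level analogue might look like
the sketch below. The function name is made up for illustration, and this is
not one of the testcases listed in the ChangeLog. Whether it collapses to a
single instruction depends on the target defining CTZ at zero to 32 for
SImode, which I have not checked for each target here.

```c
#include <assert.h>

/* Hypothetical CTZ counterpart of foo: the guard keeps __builtin_ctz away
   from a zero argument, for which the builtin's result is undefined.  */
int
ctz_or_32 (unsigned int i)
{
  return (i == 0) ? 32 : __builtin_ctz (i);
}

/* Runtime sanity check that holds whether or not the guard is optimised
   away.  */
void
check_ctz_or_32 (void)
{
  assert (ctz_or_32 (0) == 32);
  assert (ctz_or_32 (1) == 0);
  assert (ctz_or_32 (8) == 3);
  assert (ctz_or_32 (0x80000000u) == 31);
}
```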


I tried getting this to work on other targets as well, but encountered
difficulties.
For example on powerpc the two arms of the condition seen during ifcvt are:

(insn 4 22 11 4 (set (reg:DI 156 [ <retval> ])
        (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64}
     (nil))
and
(insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0)
        (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2}
     (expr_list:REG_DEAD (reg/v:DI 157 [ i ])
        (nil)))

So the setup code in noce_process_if_block sees that the set destinations are
not the same
((reg:DI 156 [ <retval> ]) and (subreg/s/u:SI (reg:DI 156 [ <retval> ]) 0))
so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check.
I suppose that's a consequence of how SImode operations are represented in
early RTL on powerpc; I don't know what to do there. Perhaps that part of ifcvt
can be taught to handle destinations that are subregs of one another, but that
would be a separate patch.

Anyway, is this patch ok for trunk?

Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu,
x86_64-pc-linux-gnu.

Thanks,
Kyrill

2016-05-26  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR middle-end/37780
    * ifcvt.c (noce_try_ifelse_collapse): New function.
    Declare prototype.
    (noce_process_if_block): Call noce_try_ifelse_collapse.
    * simplify-rtx.c (simplify_cond_clz_ctz): New function.
    (simplify_ternary_operation): Use the above to simplify
    conditional CLZ/CTZ expressions.

2016-05-26  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

    PR middle-end/37780
    * gcc.c-torture/execute/pr37780.c: New test.
    * gcc.target/aarch64/pr37780_1.c: Likewise.
    * gcc.target/arm/pr37780_1.c: Likewise.
