Hi,

Thank you for looking into it.

On Fri, Sep 06, 2019 at 12:13:34PM +0000, Wilco Dijkstra wrote:
> Hi,
> 
> +(simplify
> +  (convert
> +    (rshift
> +      (mult
> 
> > is the outer convert really necessary?  That is, if we change
> > the simplification result to
> 
> Indeed that should be "convert?" to make it optional.
> 

I removed this one as Richard suggested in the new patch version.

> > Is the Hamming weight popcount
> > faster than the libgcc table-based approach?  I wonder if we really
> > need to restrict this conversion to the case where the target
> > has an expander.
> 
> Well libgcc uses the exact same sequence (not a table):
> 
> objdump -d ./aarch64-unknown-linux-gnu/libgcc/_popcountsi2.o
> 
> 0000000000000000 <__popcountdi2>:
>    0: d341fc01        lsr     x1, x0, #1
>    4: b200c3e3        mov     x3, #0x101010101010101          // #72340172838076673
>    8: 9200f021        and     x1, x1, #0x5555555555555555
>    c: cb010001        sub     x1, x0, x1
>   10: 9200e422        and     x2, x1, #0x3333333333333333
>   14: d342fc21        lsr     x1, x1, #2
>   18: 9200e421        and     x1, x1, #0x3333333333333333
>   1c: 8b010041        add     x1, x2, x1
>   20: 8b411021        add     x1, x1, x1, lsr #4
>   24: 9200cc20        and     x0, x1, #0xf0f0f0f0f0f0f0f
>   28: 9b037c00        mul     x0, x0, x3
>   2c: d378fc00        lsr     x0, x0, #56
>   30: d65f03c0        ret
> 
> So if you don't check for an expander you get an endless loop in libgcc since
> the makefile doesn't appear to use -fno-builtin anywhere...

The patch is designed to avoid such an endless loop: on supported platforms a
popcount call is expanded into popcount CPU instruction(s) instead of a call
into libgcc, and the patch only allows the simplification on such platforms.
This is implemented via the
"optab_handler (popcount_optab, TYPE_MODE (argtype)) != CODE_FOR_nothing" check.
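
For reference, here is a minimal C sketch of the idiom under discussion,
mirroring the quoted libgcc sequence step by step. The function name
popcount64_swar is only illustrative; it is not taken from the patch or from
libgcc.

```c
#include <stdint.h>

/* Branch-free 64-bit population count (Hamming weight).
   Hypothetical helper name, used here for illustration only.  */
static int
popcount64_swar (uint64_t x)
{
  /* Each 2-bit field now holds the number of set bits it contained.  */
  x = x - ((x >> 1) & 0x5555555555555555ULL);
  /* Add neighbouring 2-bit counts into 4-bit fields.  */
  x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
  /* Add neighbouring 4-bit counts into per-byte counts.  */
  x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;
  /* The multiply sums all byte counts into the top byte.  */
  return (int) ((x * 0x0101010101010101ULL) >> 56);
}
```

For example, popcount64_swar (0xff) returns 8. If this body were rewritten
into a popcount call on a target without an expander, that call would end up
in libgcc again, which is the loop the optab check is meant to prevent.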

Thanks,
Dmitrij

> 
> Wilco
> 
