Hi, thank you for looking into it.
On Fri, Sep 06, 2019 at 12:13:34PM +0000, Wilco Dijkstra wrote:
> Hi,
>
> +(simplify
> +  (convert
> +    (rshift
> +      (mult
>
> > is the outer convert really necessary?  That is, if we change
> > the simplification result to
>
> Indeed that should be "convert?" to make it optional.
>

I removed this one, as Richard suggested, in the new patch version.

> > Is the Hamming weight popcount
> > faster than the libgcc table-based approach?  I wonder if we really
> > need to restrict this conversion to the case where the target
> > has an expander.
>
> Well libgcc uses the exact same sequence (not a table):
>
> objdump -d ./aarch64-unknown-linux-gnu/libgcc/_popcountsi2.o
>
> 0000000000000000 <__popcountdi2>:
>    0:   d341fc01    lsr   x1, x0, #1
>    4:   b200c3e3    mov   x3, #0x101010101010101    // #72340172838076673
>    8:   9200f021    and   x1, x1, #0x5555555555555555
>    c:   cb010001    sub   x1, x0, x1
>   10:   9200e422    and   x2, x1, #0x3333333333333333
>   14:   d342fc21    lsr   x1, x1, #2
>   18:   9200e421    and   x1, x1, #0x3333333333333333
>   1c:   8b010041    add   x1, x2, x1
>   20:   8b411021    add   x1, x1, x1, lsr #4
>   24:   9200cc20    and   x0, x1, #0xf0f0f0f0f0f0f0f
>   28:   9b037c00    mul   x0, x0, x3
>   2c:   d378fc00    lsr   x0, x0, #56
>   30:   d65f03c0    ret
>
> So if you don't check for an expander you get an endless loop in libgcc since
> the makefile doesn't appear to use -fno-builtin anywhere...

The patch is designed to avoid such an endless loop: the libgcc popcount
call is compiled into popcount CPU instruction(s) on supported platforms,
and the patch only allows the simplification on such platforms. This is
implemented via the
"optab_handler (popcount_optab, TYPE_MODE (argtype)) != CODE_FOR_nothing"
check.

Thanks,
Dmitrij

>
> Wilco
>
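
For reference, the instruction sequence in the objdump above corresponds roughly to the C below. This is only a sketch for readers of the thread, not the libgcc source: the names popcount64_sketch, popcount64_naive and popcount64_selfcheck are made up for illustration, and the naive loop is there purely to cross-check the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. Each step mirrors one group of instructions in the
   disassembly quoted in the thread. */
int popcount64_sketch(uint64_t x)
{
    /* lsr #1, and 0x55.., sub: each 2-bit field now holds its own bit count. */
    x = x - ((x >> 1) & 0x5555555555555555ULL);

    /* and 0x33.., lsr #2, and 0x33.., add: each 4-bit field holds its count. */
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);

    /* add with lsr #4, and 0x0f..: each byte holds its count (0..8). */
    x = (x + (x >> 4)) & 0x0f0f0f0f0f0f0f0fULL;

    /* mul by 0x0101.., lsr #56: the top byte accumulates the sum of all bytes. */
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* Bit-at-a-time reference, used only to cross-check the sketch. */
int popcount64_naive(uint64_t x)
{
    int n = 0;
    for (; x != 0; x >>= 1)
        n += (int)(x & 1);
    return n;
}

/* Compare both versions on a few sample values. */
void popcount64_selfcheck(void)
{
    const uint64_t samples[] = {
        0, 1, 0xffULL, 0x8000000000000001ULL,
        0x0123456789abcdefULL, 0xffffffffffffffffULL
    };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(popcount64_sketch(samples[i]) == popcount64_naive(samples[i]));
}
```

This is the kind of shift/mask/multiply sequence the thread is about. If the compiler rewrote it into a popcount call on a target with no expander, the call would presumably resolve to __popcountdi2, whose body is this same sequence, which is the endless loop Wilco describes and the reason for the optab_handler check.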