[llvm-bugs] [Bug 47433] New: bad code for __builtin_parity on x86 and ARM

via llvm-bugs Sat, 05 Sep 2020 11:34:19 -0700

https://bugs.llvm.org/show_bug.cgi?id=47433


            Bug ID: 47433
           Summary: bad code for __builtin_parity on x86 and ARM
           Product: new-bugs
           Version: 10.0
          Hardware: Other
                OS: FreeBSD
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected]

I have found that __builtin_parity generates poor code on ARM (soft float).

    extern int foo(int x) {
        return __builtin_parity(x);
    }

compiles to

        mov     r1, #85
        and     r1, r1, r0, lsr #1
        sub     r0, r0, r1
        movw    r1, #13107
        movt    r1, #13107
        and     r2, r1, r0, lsr #2
        and     r0, r0, r1
        add     r0, r0, r2
        movw    r1, #3855
        movt    r1, #271
        add     r0, r0, r0, lsr #4
        and     r0, r0, r1
        movw    r1, #257
        movt    r1, #257
        mul     r0, r0, r1
        ubfx    r0, r0, #24, #1
        bx      lr

which essentially performs a full population count and then keeps only the
low bit of the result.  This seems awfully suboptimal.  Why not a series of
shifts and xors?

        eor r0, r0, r0, lsr #16
        eor r0, r0, r0, lsr #8
        eor r0, r0, r0, lsr #4
        eor r0, r0, r0, lsr #2
        eor r0, r0, r0, lsr #1
        and r0, r0, #1

This seems a lot better.  The builtin could also recognise when the input is
narrower than 32 bits and perform fewer reduction steps where possible.
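
For reference, the shift-and-xor reduction can be sketched in C as follows.
The names parity_fold32 and parity_fold8 are made up for illustration and are
not part of any compiler or library.  The 8-bit variant assumes the upper 24
bits are known to be zero, so the first two folding steps can be dropped.

```c
#include <stdint.h>

/* Fold all 32 bits down into bit 0 with shifts and xors, mirroring the
 * eor sequence above.  Hypothetical helper, for illustration only. */
static inline unsigned parity_fold32(uint32_t x)
{
    x ^= x >> 16;   /* combine upper and lower halfwords */
    x ^= x >> 8;    /* combine the two remaining bytes */
    x ^= x >> 4;    /* combine nibbles */
    x ^= x >> 2;    /* combine bit pairs */
    x ^= x >> 1;    /* combine the last two bits */
    return x & 1u;  /* bit 0 now holds the parity of the input */
}

/* 8-bit input: the upper 24 bits are zero, so the >> 16 and >> 8
 * steps would contribute nothing and are skipped. */
static inline unsigned parity_fold8(uint8_t x)
{
    unsigned v = x;
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1u;
}
```

Both functions should return the same value as __builtin_parity for the same
input, which makes them easy to check against the builtin.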

