https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119468
--- Comment #2 from Jens Seifert <jens.seifert at de dot ibm.com> ---
popcntb + prtyd is slower than a single 64-bit popcount (popcntd)
followed by extracting the lowest bit.
This missed optimization applies to big endian as well.
Optimal code:
popcntd 3, 3
clrldi 3, 3, 63
blr
Current code:
popcntb 3,3
prtyd 3,3
extsw 3,3
blr
prtyd has a longer latency than clrldi, and the optimal sequence is also one
instruction shorter.
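
For reference, a minimal C sketch of the equivalence behind the proposed sequence: the parity of a 64-bit value is the lowest bit of its population count. The function names are hypothetical, and the use of __builtin_parityll as the source-level input is an assumption, since the test case is not restated in this comment.

```c
#include <stdint.h>

/* Hypothetical name: parity via the GCC builtin (assumed to be the
   kind of input that produces the "current code" above). */
int parity_builtin(uint64_t x)
{
    return __builtin_parityll(x);
}

/* Hypothetical name: parity as the lowest bit of the 64-bit popcount,
   mirroring the popcntd + clrldi sequence. */
int parity_popcount(uint64_t x)
{
    return __builtin_popcountll(x) & 1;
}
```

Both functions return 1 when an odd number of bits is set and 0 otherwise, so the compiler is free to emit the shorter sequence for either form.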
