--- .c ---
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}
--- EOF ---
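For reference, the function can be checked against the classic bit trick.
This is only a minimal sketch: ispowerof2_portable() and count_mismatches()
are names made up for this illustration and are not part of the test case
above.

```c
/* The function under test, as in the test case (GCC/Clang builtin). */
int ispowerof2(unsigned long long argument) {
    return __builtin_popcountll(argument) == 1;
}

/* Hypothetical reference: a power of two is nonzero and has no bit
   left after its lowest set bit is cleared. */
int ispowerof2_portable(unsigned long long argument) {
    return argument != 0 && (argument & (argument - 1)) == 0;
}

/* Compare both versions on 0, on all 64 single-bit values and on
   their direct neighbours; returns the number of mismatches. */
int count_mismatches(void) {
    int mismatches = 0;
    mismatches += ispowerof2(0) != ispowerof2_portable(0);
    for (int bit = 0; bit < 64; ++bit) {
        unsigned long long value = 1ULL << bit;
        mismatches += ispowerof2(value)     != ispowerof2_portable(value);
        mismatches += ispowerof2(value - 1) != ispowerof2_portable(value - 1);
        mismatches += ispowerof2(value + 1) != ispowerof2_portable(value + 1);
    }
    return mismatches;
}
```

Whatever instruction sequence the compiler picks, count_mismatches() should
return 0.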
GCC 13.3 gcc -m32 -march=alderlake -O3
         gcc -m32 -march=sapphirerapids -O3
         gcc -m32 -mpopcnt -mtune=sapphirerapids -O3
https://gcc.godbolt.org/z/cToYrrYPq
ispowerof2(unsigned long long):
        xor     eax, eax                # superfluous
        xor     edx, edx                # superfluous
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        cmp     eax, 1                  # dec eax is shorter, but then the
                                        # movzx below is required
        sete    al
        movzx   eax, al                 # superfluous after cmp
        ret
9 instructions in 28 bytes              # 6 instructions in 21 bytes
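OUCH: popcnt writes the WHOLE destination register, so there is ABSOLUTELY
no need to clear it beforehand; and since the sum is at most 64, bits 8 to
31 of eax are already 0 when sete writes al, so there is no need to clear
the higher 24 bits afterwards either!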
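JFTR: before anyone points to the false output dependency of popcnt on some
older Intel processors: see the -march= or -mtune= options above, which
select processors that (as far as I know) don't have it!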
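To see which of the trailing instructions really matter, the end of the
sequence can be modelled in plain C. This is only a sketch under stated
assumptions: sete_al(), tail_cmp(), tail_dec() and wrong_results() are
hypothetical helpers that mimic the register semantics, and the popcnt sum
arriving in eax is assumed to be at most 64.

```c
#include <stdint.h>

/* "sete al": only the low 8 bits of eax are replaced by ZF. */
static uint32_t sete_al(uint32_t eax, int zf) {
    return (eax & 0xFFFFFF00u) | (zf ? 1u : 0u);
}

/* cmp eax, 1 ; sete al   (cmp leaves eax unchanged) */
uint32_t tail_cmp(uint32_t count) {
    return sete_al(count, count == 1u);
}

/* dec eax ; sete al      (dec replaces eax by count - 1) */
uint32_t tail_dec(uint32_t count) {
    uint32_t eax = count - 1u;
    return sete_al(eax, eax == 0u);
}

/* Number of popcnt sums 0..64 for which a tail does not leave
   exactly 0 or 1 in eax, as the C function requires. */
int wrong_results(uint32_t (*tail)(uint32_t)) {
    int wrong = 0;
    for (uint32_t count = 0; count <= 64u; ++count)
        wrong += tail(count) != (uint32_t)(count == 1u);
    return wrong;
}
```

Under this model wrong_results(tail_cmp) is 0, so no movzx is needed after
cmp. tail_dec goes wrong for exactly one input, the sum 0 (argument 0),
where eax ends up as 0xFFFFFF00; after dec the zero extension (or a
preceding xor of another result register) is therefore required.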
GCC 13.3 gcc -mpopcnt -mtune=barcelona -O3
https://gcc.godbolt.org/z/3Ks8vh7a6
ispowerof2(unsigned long long):
        popcnt  rdi, rdi        ->      popcnt  rax, rdi
        xor     eax, eax                # superfluous with cmp below
        dec     edi             ->      cmp     eax, 1
                                        # not dec eax: argument 0 would
                                        # leave 0xFFFFFF00 in eax
        sete    al
        ret
GCC 13.3 gcc -m32 -mpopcnt -mtune=barcelona -O3
https://gcc.godbolt.org/z/s5s5KTGnv
ispowerof2(unsigned long long):
        popcnt  eax, [esp+4]
        popcnt  edx, [esp+8]
        add     eax, edx
        dec     eax             ->      cmp     eax, 1
        sete    al
        movzx   eax, al                 # superfluous with cmp, but
                                        # required after dec!
        ret
Will GCC eventually generate properly optimised code instead of bloat?
Stefan