MaxGraey wrote:
You can also try the branchless version:
```cpp
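uint64_t NextPowerOf2_New_Branchless(uint64_t A) {
  // Shift is in [0, 64]; it is 64 when A >= 2^63 (the overflow case).
  uint64_t Shift = Log2_64_Ceil(A + 1);
  // Mask the count so the shift stays well defined when Shift == 64.
  uint64_t Res = UINT64_C(1) << (Shift & 63);
  // All-ones mask when Shift < 64, zero otherwise, so overflow returns 0.
  return Res & -!(Shift >> 6);
}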
```
https://godbolt.org/z/1Pe3Kz5Mc
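For anyone who wants to check the idea outside the LLVM tree, here is a rough standalone sketch. It assumes `Log2_64_Ceil(V)` behaves like `64 - countl_zero(V - 1)`; `clz64`, `log2_64_ceil_sketch`, `next_pow2_branchless`, `next_pow2_reference` and `check_branchless_matches_reference` are hypothetical names made up for this example, not LLVM APIs. It also masks the shift count, since shifting a 64-bit value by 64 is undefined in C++.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a leading-zero count; returns 64 for v == 0.
inline unsigned clz64(uint64_t v) {
  unsigned n = 0;
  for (uint64_t bit = UINT64_C(1) << 63; bit != 0 && (v & bit) == 0; bit >>= 1)
    ++n;
  return n;
}

// Assumed behaviour of Log2_64_Ceil: 64 - countl_zero(v - 1).
inline unsigned log2_64_ceil_sketch(uint64_t v) { return 64 - clz64(v - 1); }

// Branchless variant: power of two strictly greater than a, 0 on overflow.
inline uint64_t next_pow2_branchless(uint64_t a) {
  unsigned shift = log2_64_ceil_sketch(a + 1);   // in [0, 64]
  uint64_t res = UINT64_C(1) << (shift & 63);    // keep the shift count < 64
  // Mask is all ones when shift < 64 and zero when shift == 64.
  return res & (UINT64_C(0) - static_cast<uint64_t>(!(shift >> 6)));
}

// Simple loop used only as a reference for comparison.
inline uint64_t next_pow2_reference(uint64_t a) {
  uint64_t p = 1;
  while (p != 0 && p <= a)
    p <<= 1;  // becomes 0 once 2^63 is shifted out
  return p;
}

// Compares both versions on a few edge cases.
inline void check_branchless_matches_reference() {
  const uint64_t samples[] = {0, 1, 2, 3, 4, 5, 7, 8, 255, 256,
                              (UINT64_C(1) << 63) - 1, UINT64_C(1) << 63,
                              UINT64_MAX};
  for (uint64_t a : samples)
    assert(next_pow2_branchless(a) == next_pow2_reference(a));
}
```

Calling `check_branchless_matches_reference()` from a test or `main` exercises the zero input and the overflow boundary at 2^63, which are the two cases the mask is there for.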
With clang the generated code should be even tighter:
```asm
mov edx, 127
bsr rdx, rdi
xor rdx, 63
mov ecx, edx
neg cl
mov eax, 1
shl rax, cl
test rdx, rdx
cmove rax, rdx
ret
```
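To make the instruction sequence easier to follow, here is a rough C++ model of it, one step per instruction. This is my reading of the listing, not compiler output: it assumes `bsr` leaves its destination unchanged for a zero source (which the preloaded 127 appears to rely on) and that `shl` uses only the low 6 bits of `cl` for a 64-bit operand. `clz_via_bsr_model` and `next_pow2_asm_model` are hypothetical names for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Models `mov edx, 127; bsr rdx, rdi; xor rdx, 63`:
// a leading-zero count that yields 64 for a zero input (127 ^ 63 == 64).
inline uint64_t clz_via_bsr_model(uint64_t rdi) {
  uint64_t rdx = 127;  // assumed to survive bsr when rdi == 0
  if (rdi != 0) {
    rdx = 63;
    while (((rdi >> rdx) & 1) == 0)
      --rdx;           // index of the highest set bit, as bsr would give
  }
  return rdx ^ 63;
}

inline uint64_t next_pow2_asm_model(uint64_t rdi) {
  uint64_t rdx = clz_via_bsr_model(rdi);
  uint8_t cl = static_cast<uint8_t>(0 - rdx);  // mov ecx, edx; neg cl
  uint64_t rax = UINT64_C(1) << (cl & 63);     // mov eax, 1; shl rax, cl
  if (rdx == 0)                                // test rdx, rdx
    rax = rdx;                                 // cmove rax, rdx (overflow gives 0)
  return rax;
}
```

Under those assumptions the negated count is `(64 - clz) mod 64`, so the shift produces the next power of two directly, and the `cmove` only fires when the top bit of the input is already set.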
https://github.com/llvm/llvm-project/pull/189160