Hi Jeff,

We are aware that the function is essentially an integer log2.
The C-based variant we chose is actually faster and more general than
the one you included (it needs at most two shift operations for
the relevant range), but the assembler-based variant is hard to beat
and yields another 3% in benchmark performance
on top of the fastest C version. Thanks for that!
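
For readers following along, here is a minimal sketch of the two kinds of
integer log2 discussed in this thread. It is an illustration only: it is
neither the C variant we chose nor the tcmalloc code, and the names
ilog2_portable and ilog2_builtin are hypothetical. It assumes a 32-bit
argument greater than zero and, for the second function, GCC or Clang,
whose __builtin_clz typically compiles to bsr or lzcnt on x86.

```c
#include <stdint.h>

/* Portable variant: floor(log2(v)) for v > 0, found by a binary
 * search over the bit positions using shifts. Returns 0 for v == 0. */
static unsigned ilog2_portable(uint32_t v)
{
    unsigned r = 0;
    if (v >> 16) { v >>= 16; r += 16; }
    if (v >> 8)  { v >>= 8;  r += 8;  }
    if (v >> 4)  { v >>= 4;  r += 4;  }
    if (v >> 2)  { v >>= 2;  r += 2;  }
    if (v >> 1)  {           r += 1;  }
    return r;
}

/* Builtin variant: 31 minus the number of leading zero bits.
 * Assumes a 32-bit unsigned int; the result is undefined for v == 0. */
static unsigned ilog2_builtin(uint32_t v)
{
#if defined(__GNUC__)
    return 31u - (unsigned)__builtin_clz(v);
#else
    return ilog2_portable(v);   /* fallback for other compilers */
#endif
}
```

Both functions should agree for every nonzero input, for example
ilog2_portable(1024) and ilog2_builtin(1024) both give 10.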

-gustaf

Jeff Rogers wrote:
I don't think anyone has pointed this out yet, but this is a logarithm
in base 2 (log2), and there are a fair number of implementations of this
available; for maximum performance there are assembly implementations
using 'bsr' on x86 architectures, such as this one from Google's tcmalloc:


