I've just noticed that the x86 implementations (32-bit and 64-bit) of this
function are suboptimal: they loop over the bits to count them, which is very
slow. x86 has a dedicated BSF (bit scan forward) instruction, which executes in
3-4 cycles. The difference is significant and measurable in algorithms that
depend on this particular operation (e.g. the van Emde Boas tree).

I am willing to contribute an assembly-optimized version for the x86, if there 
is any interest.
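
To illustrate the idea, here is a minimal sketch, not the actual patch. The
names lowbit_loop, lowbit_bsf and lowbit_check are hypothetical, and the return
convention (1-based index of the lowest set bit, 0 for a zero argument) is an
assumption made for the example. It relies on GCC/Clang inline assembly and
falls back to __builtin_ctzl on non-x86 targets.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical loop-based version: shifts right until bit 0 is set. */
static int lowbit_loop(unsigned long v)
{
    int pos;

    if (v == 0)
        return 0;               /* assumed convention: 0 means "no bit set" */
    for (pos = 1; (v & 1) == 0; pos++)
        v >>= 1;
    return pos;
}

/* Same assumed contract, using a single BSF instruction on x86. */
static int lowbit_bsf(unsigned long v)
{
    unsigned long idx;

    if (v == 0)                 /* BSF leaves its destination undefined for 0 */
        return 0;
#if defined(__i386__) || defined(__x86_64__)
    __asm__("bsf %1, %0" : "=r" (idx) : "r" (v) : "cc");
#else
    idx = (unsigned long)__builtin_ctzl(v);  /* fallback on other targets */
#endif
    return (int)idx + 1;        /* BSF yields a 0-based index */
}

/* Checks that both versions agree for every bit position. */
static void lowbit_check(void)
{
    int nbits = (int)(sizeof (unsigned long) * CHAR_BIT);
    int i;

    assert(lowbit_loop(0) == lowbit_bsf(0));
    for (i = 0; i < nbits; i++) {
        unsigned long v = 1UL << i;

        assert(lowbit_loop(v) == i + 1);
        assert(lowbit_bsf(v) == i + 1);
        /* Higher bits must not affect the result. */
        assert(lowbit_bsf(v | (1UL << (nbits - 1))) == i + 1);
    }
}
```

The loop version takes time proportional to the position of the lowest set
bit, while the BSF version costs one instruction plus the zero check.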
 
 
This message posted from opensolaris.org