Adam Ruppe Wrote:

> Jerome's highbit function is the same as std.intrinsic.bsr. I wonder
> which is faster?
> 
> I ran a test, and for 100 million iterations (1..10000000), the
> intrinsic beat out his function by a mere 300 milliseconds on my box.
> - highbit ran in an average of 1897 ms and bsr did the same in an
> average of 1534.
> 
> Recompiling with -inline -O -release cuts the raw numbers about in
> half, but keeps about the same difference, leading me to think
> overhead accounts for a fair share of the time, rather than the actual
> implementation. The new averages are 1134 and 853.

That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd 
sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I 
guess is the reason it's an intrinsic in the first place). I would have 
expected this to be much, much faster than a user function. Anyone care enough 
to check the generated assembly?
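
For anyone who wants to look at the generated code outside of D, here is a
rough C sketch of the two approaches being compared. The names highbit_loop
and highbit_builtin are made up for illustration; Jerome's actual highbit is
not reproduced here, so the loop is only a guess at what a hand-written
version might look like. __builtin_clz is a GCC/Clang builtin (an assumption
about the compiler), and on x86 it is typically lowered to a bsr or lzcnt
instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hand-written version: shift right until nothing is left,
 * counting how many shifts were needed. Returns the index of the highest
 * set bit (0 = least significant). The result is undefined for x == 0,
 * just as it is for the bsr opcode, so that case is rejected. */
static int highbit_loop(uint32_t x)
{
    assert(x != 0);
    int pos = 0;
    while (x >>= 1)
        ++pos;
    return pos;
}

/* Builtin version. Assumes GCC or Clang and a 32-bit unsigned int:
 * __builtin_clz counts leading zero bits, so 31 minus that count is the
 * index of the highest set bit. */
static int highbit_builtin(uint32_t x)
{
    assert(x != 0);
    return 31 - __builtin_clz(x);
}
```

Compiling this with something like gcc -O2 -S and comparing the two function
bodies should show whether the builtin really collapses to a single
instruction while the loop does not.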

[1] http://www.itis.mn.it/linux/quarta/x86/bsr.htm

-- Clemens
