Re: value range propagation for _bitwise_ OR

Don Tue, 13 Apr 2010 09:20:13 -0700

Don wrote:

Adam D. Ruppe wrote:
On Tue, Apr 13, 2010 at 11:10:24AM -0400, Clemens wrote:
That's strange. Looking at src/backend/cod4.c, function cdbscan, in the dmd sources, bsr seems to be implemented in terms of the bsr opcode [1] (which I guess is the reason it's an intrinsic in the first place). I would have expected this to be much, much faster than a user function. Anyone care enough to check the generated assembly?
The opcode is fairly slow anyway (as far as opcodes go) - odds are the
implementation inside the processor is similar to Jerome's method, and
the main savings come from it loading fewer bytes into the pipeline.

I remember a line from a blog, IIRC it was the author of the C++ FQA
writing it, saying hardware and software are pretty much the same thing -
moving an instruction to hardware doesn't mean it will be any faster,
since it is the same algorithm, just done in processor microcode instead of
user opcodes.

It's fast on Intel, slow on AMD. I bet the speed difference comes from inlining max().

Specifically, bsr is 7 uops on AMD, 1 uop on Intel since the original Pentium. AMD's performance is shameful.

And bsr() is supported in the compiler; in fact DMC uses it extensively, which is why it's included in DMD!
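
For readers who want something concrete to benchmark against the intrinsic, here is a minimal sketch in C of two software ways to find the index of the highest set bit, which is what the bsr opcode computes. The names bsr_loop and bsr_binary are hypothetical, made up for this illustration; neither is claimed to be Jerome's method or the code in the dmd backend. Like the opcode, both leave the result for an input of 0 undefined, which the sketch enforces with an assert.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit, found by shifting
   right one bit at a time. Takes up to 31 iterations for a 32-bit value. */
static int bsr_loop(uint32_t x)
{
    assert(x != 0);          /* like the opcode, undefined for 0 */
    int index = 0;
    while (x >>= 1)
        ++index;
    return index;
}

/* Hypothetical helper: the same result via a binary search over
   progressively smaller halves of the word. Always five tests. */
static int bsr_binary(uint32_t x)
{
    assert(x != 0);          /* like the opcode, undefined for 0 */
    int index = 0;
    if (x & 0xFFFF0000u) { x >>= 16; index += 16; }
    if (x & 0x0000FF00u) { x >>= 8;  index += 8;  }
    if (x & 0x000000F0u) { x >>= 4;  index += 4;  }
    if (x & 0x0000000Cu) { x >>= 2;  index += 2;  }
    if (x & 0x00000002u) {           index += 1;  }
    return index;
}
```

Whether either of these beats the opcode depends on the processor, as noted above, and on whether the surrounding code gets inlined, so any comparison should be made on the generated assembly and on the target hardware rather than assumed.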
