Re: Does dmd have SSE intrinsics?

Jeremie Pelletier Tue, 22 Sep 2009 08:05:12 -0700

Don wrote:

bearophile wrote:
Robert Jacques:
Yes, but the unaligned version is slower, even for aligned data.
This is true today, but in the future it may become a little less true, thanks to improvements in the CPUs.
The problem is that the difference today is so extreme. On Core2:
 movaps [mem128], xmm0; // aligned,   1 micro-op
 movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!
In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access.
It all depends on how important you think performance on Core2 and earlier Intel processors is.

I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my Core2 quad. I now recall using a lot of movups instructions. Thanks for the tip.
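
As a rough illustration of the point above, here is a minimal sketch in C (not D, and not code from this thread) that checks a pointer for 16-byte alignment and only then takes the aligned store path. The helper names is_aligned16 and fill_floats are hypothetical. _mm_store_ps and _mm_storeu_ps are the standard SSE intrinsics that typically compile to movaps and movups respectively, though the compiler is free to choose otherwise. On targets without SSE the sketch falls back to a plain scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: nonzero if p sits on a 16-byte boundary,
   which is what the aligned store (movaps) requires. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: set n floats at dst to value, using the
   aligned store when dst allows it and the unaligned one otherwise. */
static void fill_floats(float *dst, size_t n, float value) {
    size_t i = 0;
    assert(dst != NULL);
#if defined(__SSE__)
    __m128 v = _mm_set1_ps(value);
    if (is_aligned16(dst)) {
        for (; i + 4 <= n; i += 4)
            _mm_store_ps(dst + i, v);   /* typically movaps */
    } else {
        for (; i + 4 <= n; i += 4)
            _mm_storeu_ps(dst + i, v);  /* typically movups */
    }
#endif
    /* Scalar tail; also the whole loop on non-SSE targets. */
    for (; i < n; i++)
        dst[i] = value;
}
```

Going by the numbers quoted above, the branch is mainly worth having on Core2 and earlier Intel parts. On i7 the unaligned store is reportedly just as fast when the data happens to be aligned, so the check buys little there.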
