Re: Does dmd have SSE intrinsics?

Jeremie Pelletier Tue, 22 Sep 2009 09:10:12 -0700

#ponce wrote:

In practice it's about an 8X speed difference!
On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still slower if it's an unaligned access.
It all depends on how important you think performance on Core2 and earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was slower than the FPU in certain places on my Core 2 Quad. I now recall using a lot of movups instructions. Thanks for the tip.
Indeed SSE is known to be overkill when dealing with unaligned data.
In C++, writing SSE code is so painful that you either have to use intrinsics or use 
libraries like Eigen (a SIMD vectorization library based on expression 
templates, which can generate SSE, AVX or FPU code). But using such a library 
is often way too intrusive, and alignment is not in standard C++.

D already understands array operations, as Eigen does, in order to increase 
cacheability. It would be great if it could statically detect 16-byte aligned 
data and use SSE when possible (though there must be many other things to 
do :) ).

The D memory manager already aligns data on 16-byte boundaries. The only case I can think of right now where data ends up unaligned is when it is in a struct or class:


struct S {
        float[4] vec1; // offset 0: aligned!
        int a;         // offset 16, pushes the next field to offset 20
        float[4] vec2; // offset 20: unaligned!
}
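
To make the layout issue concrete, here is a minimal sketch that checks the field offsets at compile time. It assumes a current D compiler, where `align(16)` on a field pads it to a 16-byte offset (older compilers may behave differently). The struct names `Packed` and `Padded` are made up for illustration.

```d
import std.stdio;

// Hypothetical struct names, for illustration only.
struct Packed
{
    float[4] vec1; // offset 0
    int a;         // offset 16
    float[4] vec2; // offset 20: not a multiple of 16
}

struct Padded
{
    float[4] vec1;           // offset 0
    int a;                   // offset 16
    align(16) float[4] vec2; // padding inserted so the field starts at offset 32
}

// Checked at compile time, so a wrong assumption fails the build.
static assert(Packed.vec2.offsetof == 20);
static assert(Padded.vec2.offsetof == 32);

void main()
{
    writefln("Packed.vec2 offset: %s", Packed.vec2.offsetof);
    writefln("Padded.vec2 offset: %s", Padded.vec2.offsetof);
}
```

Note that this only fixes the offset inside the struct. Whether the second vector is really 16-byte aligned in memory still depends on the instance itself starting on a 16-byte boundary.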
