On Friday, 2 November 2012 at 14:22:34 UTC, Jens Mueller wrote:
But the compiler knows about the alignment, doesn't it?

align(16) float[4] a;
vs
float[4] a;

In the former case the compiler can generate better code, and it should. The above syntax is not supported. But my point is that all the compiler cares about is the alignment, which can be specified in the code somehow.
Sorry for being stubborn.

Jens

Note: My knowledge of SIMD/SSE is fairly limited, and may be somewhat out of date. In other words, some of this may be flat out wrong.

First, just because you have something that SIMD operations can be performed on doesn't mean you actually want to perform them. SSE instructions, for example, have to keep their operands in the XMM registers, and reading or writing an individual element of the vector is expensive; when using SSE you want to avoid touching individual elements as much as possible, and not following that tends to hurt performance quite badly. But with a plain float[4], the compiler can't know whether you use it mostly as a single SIMD vector or mostly as a container for four separate floats. You could be aligning it for any number of reasons, so alignment alone isn't a fair way to decide.
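
As a rough illustration of the difference (a minimal sketch using core.simd's float4; the function names here are made up for the example): whole-vector arithmetic can stay in the XMM registers, while per-element access forces the data out of them.

import core.simd;

// Whole-vector math: the operands can stay in XMM registers
// (mulps on x86), no element extraction needed.
float4 scale(float4 v, float4 k)
{
    return v * k;
}

// Per-element access: every .array[i] read pulls a single lane out of
// the vector register, which is exactly what you want to avoid in hot code.
float sumLanes(float4 v)
{
    return v.array[0] + v.array[1] + v.array[2] + v.array[3];
}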

Secondly, you can't really know which SIMD instructions are supported by your target CPU. It's safe to say SSE2 is supported on pretty much all x86 CPUs at this point, but something like SSE4.2 may not be. Just because the compiler knows the CPU it's compiling on supports an instruction set doesn't mean the CPU running the program will.
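
For what it's worth, the runtime side of that can be handled today; a minimal sketch, assuming core.cpuid's sse2/sse42 queries (which I believe druntime provides, but worth double-checking):

import core.cpuid;
import std.stdio;

void main()
{
    // The build machine's capabilities say nothing about the machine
    // the binary eventually runs on, so query the CPU at runtime.
    if (sse42)
        writeln("SSE4.2 present: dispatch to the fast path");
    else if (sse2)
        writeln("Only the SSE2 baseline: use the fallback path");
}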

Lastly, we'd still need SIMD intrinsics. It may be simple to tell that an element-wise float[4] + float[4] operation could use addps, but it would be much harder to determine when to use something like dpps (SSE4.1's packed dot product) or various other instructions. Not to mention non-x86 architectures. See the sketch below.
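
To make that concrete, here's a rough sketch using core.simd (the dot function is purely illustrative): the element-wise add maps naturally onto addps, but a dot product is a horizontal reduction across lanes, and there's no array-operation syntax that obviously implies dpps, so it takes either an intrinsic or explicit lane arithmetic.

import core.simd;

// Easy case: element-wise add, which a compiler could lower to addps.
float4 add(float4 a, float4 b)
{
    return a + b;
}

// Harder case: a dot product reduces across lanes. Without an intrinsic
// (or SSE4.1's dpps), you end up spelling it out by hand like this.
float dot(float4 a, float4 b)
{
    float4 p = a * b;   // mulps
    return p.array[0] + p.array[1] + p.array[2] + p.array[3];
}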
