On Friday, 2 November 2012 at 14:22:34 UTC, Jens Mueller wrote:
But the compiler knows about the alignment, doesn't it?

align(16) float[4] a;
vs
float[4] a;

In the former case the compiler can generate better code, and it should. The above syntax is not supported. But my point is that all the compiler cares about is the alignment, which can be specified in the code somehow.
Sorry for being stubborn.

Jens

Note: My knowledge of SIMD/SSE is fairly limited, and may be somewhat out of date. In other words, some of this may be flat out wrong.

First, just because you have something that SIMD operations can be performed on doesn't mean you actually want to perform them. SSE instructions, for example, have to keep their operands in the XMM registers, and reading or writing an individual element of the vector is expensive; when using SSE you want to avoid touching individual elements as much as possible, and not following that tends to hurt performance quite badly. But with a plain float[4], the compiler can't know whether you use it mostly as a single SIMD vector or mostly as a container for four separate floats. You could be aligning it for any number of reasons, so alignment alone isn't a fair way to decide.
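
As a rough illustration of the difference (a minimal sketch using core.simd's float4; the function names here are made up for the example): whole-vector arithmetic can stay in the XMM registers, while per-element access forces the data out of them.

import core.simd;

// Whole-vector math: the operands can stay in XMM registers
// (mulps on x86), no element extraction needed.
float4 scale(float4 v, float4 k)
{
    return v * k;
}

// Per-element access: every .array[i] read pulls a single lane out of
// the vector register, which is exactly what you want to avoid in hot code.
float sumLanes(float4 v)
{
    return v.array[0] + v.array[1] + v.array[2] + v.array[3];
}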

Secondly, you can't really know which SIMD instructions are supported by your target CPU. It's safe to say SSE2 is supported on pretty much all x86 CPUs at this point, but something like SSE4.2 may not be. Just because the compiler knows the CPU it's compiling on supports an instruction set doesn't mean the CPU running the program will.
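
For what it's worth, the runtime side of that can be handled today; a minimal sketch, assuming core.cpuid's sse2/sse42 queries (which I believe druntime provides, but worth double-checking):

import core.cpuid;
import std.stdio;

void main()
{
    // The build machine's capabilities say nothing about the machine
    // the binary eventually runs on, so query the CPU at runtime.
    if (sse42)
        writeln("SSE4.2 present: dispatch to the fast path");
    else if (sse2)
        writeln("Only the SSE2 baseline: use the fallback path");
}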

Lastly, we'd still need SIMD intrinsics. It may be simple to tell that an element-wise float[4] + float[4] operation could use addps, but it would be much harder to determine when to use something like dpps (SSE4.1's packed dot product) or various other instructions. Not to mention non-x86 architectures. See the sketch below.
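
To make that concrete, here's a rough sketch using core.simd (the dot function is purely illustrative): the element-wise add maps naturally onto addps, but a dot product is a horizontal reduction across lanes, and there's no array-operation syntax that obviously implies dpps, so it takes either an intrinsic or explicit lane arithmetic.

import core.simd;

// Easy case: element-wise add, which a compiler could lower to addps.
float4 add(float4 a, float4 b)
{
    return a + b;
}

// Harder case: a dot product reduces across lanes. Without an intrinsic
// (or SSE4.1's dpps), you end up spelling it out by hand like this.
float dot(float4 a, float4 b)
{
    float4 p = a * b;   // mulps
    return p.array[0] + p.array[1] + p.array[2] + p.array[3];
}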
