On 11/2/2012 3:50 AM, Jens Mueller wrote:
> Okay. For me they look the same. Can you elaborate, please? Assume I
> want to add two float vectors, which is common in both games and
> scientific computing. The only difference is that in games their length
> is usually 3 or 4, whereas in scientific computing they are of arbitrary
> length. Why do I need intrinsics to support the game setting?

Another excellent question.

Most languages have taken the "auto-vectorization" approach: the compiler reverse engineers loops back into high level constructs, and then compiles those into special SIMD instructions.
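
To make that concrete (my illustration, not part of the original post), here is the kind of scalar loop an auto-vectorizer has to recognize and rewrite:

    // A plain scalar loop. Before it can use packed SIMD instructions, an
    // auto-vectorizing compiler has to prove the iterations are independent
    // (e.g. that c doesn't overlap a or b).
    void addLoop(float[] c, const(float)[] a, const(float)[] b)
    {
        foreach (i; 0 .. c.length)
            c[i] = a[i] + b[i];
    }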

How to do this is explained in detail in the (rare) book "The Software Vectorization Handbook" by Bik, which I fortunately was able to obtain a copy of.

This struck me as a terrible approach, however. It just seemed stupid to try to teach the compiler to reverse engineer low level code into high level code. A better design would be to start with high level code. Hence, the appearance of D vector operations.
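
For comparison (again my sketch, not Walter's), the same computation with D's array operations, which is what "D vector operations" refers to here:

    // The intent is stated directly, so there is no loop for the compiler
    // to reverse engineer.
    void addArrays(float[] c, const(float)[] a, const(float)[] b)
    {
        c[] = a[] + b[];
    }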

The trouble with D vector operations, however, is that they are too general purpose. SIMD instructions are quirky, and it's easy to unwittingly and silently cause the compiler to generate terribly slow code. The reasons are the alignment requirements, coupled with the SIMD instructions not being orthogonal - some operations exist for some types and not for others, in ways that are unintuitive unless you read the SIMD specs carefully.

Just saying align(16) isn't good enough, as the vector ops work on slices and those slices aren't always aligned. So each one has to check alignment at runtime, which is murder on performance.
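
A sketch of the problem (illustrative only, not from the post) - the backing storage can be aligned while the slices handed to the array op are not:

    align(16) float[64] data;        // storage is 16-byte aligned

    void addSlices()
    {
        float[] a = data[0 .. 16];   // starts at byte offset 0: aligned
        float[] b = data[17 .. 33];  // byte offset 68: not 16-byte aligned
        float[] c = data[34 .. 50];  // byte offset 136: not 16-byte aligned
        c[] = a[] + b[];             // so the generated code must test the
                                     // alignment of each slice at run time
    }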

If a particular vector op for a particular type has no SIMD support, then the compiler has to generate workaround code. This can also have terrible performance consequences.
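
One hedged example of such a hole (mine, not the post's): SSE/AVX have no packed integer divide at all, so this array op can only become ordinary scalar code:

    void divArrays(int[] c, const(int)[] a, const(int)[] b)
    {
        // No packed integer division instruction exists, so this compiles
        // to an element-by-element loop - same syntax, none of the speedup.
        c[] = a[] / b[];
    }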

So the user writes vector code, benchmarks it, finds zero improvement, and the reasons why remain elusive to anyone but an expert SIMD programmer.

(Auto-vectorizing technology has similar issues, pretty much meaning you won't get fast code out of it unless you've got a habit of examining the assembler output and tweaking as necessary.)

Enter Manu, who has a lot of experience making SIMD work for games. His proposal was:

1. Have native SIMD types. This will guarantee alignment, and will guarantee a compile time error for SIMD types that are not supported by the CPU.

2. Have the compiler issue an error for SIMD operations that are not supported by the CPU, rather than silently generating inefficient workaround code.

3. There are all kinds of weird but highly useful SIMD instructions that don't have a straightforward representation in high level code, such as saturated arithmetic. Manu's answer was to expose these instructions via intrinsics, so the user can string them together and be sure they will generate real SIMD instructions, while the compiler deals with register allocation (a short sketch follows this list).
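
Here's a minimal sketch of what that looks like with DMD's core.simd (my example; LDC and GDC share the vector types but use different intrinsics):

    import core.simd;

    version (D_SIMD)
    {
        // Native SIMD types: guaranteed alignment, and using a type the
        // target CPU doesn't support is a compile-time error instead of
        // silent fallback code.
        float4 add4(float4 a, float4 b)
        {
            return a + b;               // a single ADDPS
        }

        // Quirky instructions with no high-level spelling are reached
        // through the __simd intrinsic; the compiler still handles
        // register allocation.
        short8 saturatedAdd(short8 a, short8 b)
        {
            return cast(short8) __simd(XMM.PADDSW, a, b);  // add with saturation
        }
    }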

This approach works, is inlineable, generates code as good as hand-built assembler, and is usable by regular programmers.

I won't say there aren't better approaches, but this one we know works.
