Walter Bright wrote:
> On 11/2/2012 3:50 AM, Jens Mueller wrote:
> > Okay. For me they look the same. Can you elaborate, please? Assume I
> > want to add two float vectors which is common in both games and
> > scientific computing. The only difference is in games their length is
> > usually 3 or 4 whereas in scientific computing they are of arbitrary
> > length. Why do I need intrinsics to support the game setting?
>
> Another excellent question.
>
> Most languages have taken the "auto-vectorization" approach of
> reverse engineering loops to turn them into high level constructs,
> and then compiling the code into special SIMD instructions.
>
> How to do this is explained in detail in the (rare) book "The
> Software Vectorization Handbook" by Bik, which I fortunately was
> able to obtain a copy of.
>
> This struck me as a terrible approach, however. It just seemed
> stupid to try to teach the compiler to reverse engineer low level
> code into high level code. A better design would be to start with
> high level code. Hence, the appearance of D vector operations.
>
> The trouble with D vector operations, however, is that they are too
> general purpose. The SIMD instructions are very quirky, and it's
> easy to unwittingly and silently cause the compiler to generate
> absolutely terribly slow code. The reasons for that are the
> alignment requirements, coupled with the SIMD instructions not being
> orthogonal - some operations work for some types and not for others,
> in a way that is unintuitive unless you're carefully reading the
> SIMD specs.
>
> Just saying align(16) isn't good enough, as the vector ops work on
> slices and those slices aren't always aligned. So each one has to
> check alignment at runtime, which is murder on performance.
>
> If a particular vector op for a particular type has no SIMD support,
> then the compiler has to generate workaround code. This can also
> have terrible performance consequences.
>
> So the user writes vector code, benchmarks it, finds zero
> improvement, and the reasons why will be elusive to anyone but an
> expert SIMD programmer.
>
> (Auto-vectorizing technology has similar issues, pretty much meaning
> you won't get fast code out of it unless you've got a habit of
> examining the assembler output and tweaking as necessary.)
>
> Enter Manu, who has a lot of experience making SIMD work for games.
> His proposal was:
>
> 1. Have native SIMD types. This will guarantee alignment, and will
> guarantee a compile time error for SIMD types that are not supported
> by the CPU.
>
> 2. Have the compiler issue an error for SIMD operations that are not
> supported by the CPU, rather than silently generating inefficient
> workaround code.
>
> 3. There are all kinds of weird but highly useful SIMD instructions
> that don't have a straightforward representation in high level code,
> such as saturated arithmetic. Manu's answer was to expose these
> instructions via intrinsics, so the user can string them together,
> be sure that they will generate real SIMD instructions, while the
> compiler can deal with register allocation.
>
> This approach works, is inlineable, generates code as good as
> hand-built assembler, and is useable by regular programmers.
>
> I won't say there aren't better approaches, but this one we know works.

I see. Thanks for clarifying. If I want fast vector operations I have to
use core.simd. The built-in vector operations won't fit the bill. I was
of the opinion that a vector operation in D should (at some point)
generate vectorized code.

Jens
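
To make the distinction discussed in this thread concrete, here is a
minimal sketch contrasting the two approaches: a built-in array operation
on slices of arbitrary length, and a fixed-size float4 from core.simd.
The helper names addSlices and addVec4 are made up for illustration, and
the sketch assumes a compiler and target where core.simd defines float4
(for example DMD on x86-64). It says nothing about which instructions a
given compiler actually emits for either version.

```d
import core.simd;

// Hypothetical helper: element-wise add using D's built-in array
// operation. It accepts slices of any length and any alignment.
float[] addSlices(const(float)[] a, const(float)[] b)
{
    assert(a.length == b.length);
    auto result = new float[a.length];
    result[] = a[] + b[];
    return result;
}

// Hypothetical helper: the same addition on a native SIMD type.
// float4 always holds exactly four floats; the type itself carries
// the size and alignment information.
float4 addVec4(float4 a, float4 b)
{
    return a + b;
}

void main()
{
    // Arbitrary length, as in the scientific computing case.
    auto c = addSlices([1.0f, 2, 3, 4, 5], [10.0f, 20, 30, 40, 50]);
    assert(c == [11.0f, 22, 33, 44, 55]);

    // Fixed length of 4, as in the game case.
    float4 x = [1.0f, 2.0f, 3.0f, 4.0f];
    float4 y = [10.0f, 20.0f, 30.0f, 40.0f];
    float4 z = addVec4(x, y);
    assert(z.array == [11.0f, 22.0f, 33.0f, 44.0f]);
}
```

Both versions compute the same sums. The difference raised above is what
the compiler is allowed to assume: the slice version knows neither length
nor alignment at compile time, while the float4 version does.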