On 2 November 2012 20:02, Walter Bright <newshou...@digitalmars.com> wrote:

> On 11/2/2012 3:50 AM, Jens Mueller wrote:
> > Okay. For me they look the same. Can you elaborate, please? Assume I
> > want to add two float vectors which is common in both games and
> > scientific computing. The only difference is in games their length is
> > usually 3 or 4 whereas in scientific computing they are of arbitrary
> > length. Why do I need intrinsics to support the game setting?
>
> Another excellent question.
>
> Most languages have taken the "auto-vectorization" approach of reverse
> engineering loops to turn them into high level constructs, and then
> compiling the code into special SIMD instructions.
>
> How to do this is explained in detail in the (rare) book "The Software
> Vectorization Handbook" by Bik, which I fortunately was able to obtain a
> copy of.
>
> This struck me as a terrible approach, however. It just seemed stupid to
> try to teach the compiler to reverse engineer low level code into high
> level code. A better design would be to start with high level code. Hence,
> the appearance of D vector operations.
>
> The trouble with D vector operations, however, is that they are too
> general purpose. The SIMD instructions are very quirky, and it's easy to
> unwittingly and silently cause the compiler to generate absolutely terribly
> slow code. The reasons for that are the alignment requirements, coupled
> with the SIMD instructions not being orthogonal - some operations work for
> some types and not for others, in a way that is unintuitive unless you're
> carefully reading the SIMD specs.
>
> Just saying align(16) isn't good enough, as the vector ops work on slices
> and those slices aren't always aligned. So each one has to check alignment
> at runtime, which is murder on performance.
>
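To make the alignment point concrete, here is a minimal sketch in C. It does not show D's vector operations or the proposed D SIMD types. It is a hypothetical element-wise float add, `add_floats`, that has to decide at runtime whether its three slices can use a 16-byte-aligned path. The 4-wide middle loop is written as plain scalar code and only marks where a compiler would emit aligned SIMD loads, adds and stores. The function returns the number of elements that took that path.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte offset of p within its 16-byte block (0 means 16-byte aligned). */
static size_t misalign16(const void *p) {
    return (size_t)((uintptr_t)p % 16);
}

/* dst[i] = a[i] + b[i] for i in [0, n).
   Returns how many elements went through the "aligned" 4-wide body. */
size_t add_floats(float *dst, const float *a, const float *b, size_t n) {
    size_t i = 0, fast = 0;
    size_t m = misalign16(dst);

    /* The fast path is only possible if all three slices share the same
       offset within a 16-byte block; this check runs on every call. */
    if (m == misalign16(a) && m == misalign16(b) && m % sizeof(float) == 0) {
        /* Scalar prologue: step until dst reaches a 16-byte boundary. */
        while (i < n && misalign16(dst + i) != 0) {
            dst[i] = a[i] + b[i];
            i++;
        }
        /* Aligned body: 4 floats (16 bytes) per step. A SIMD compiler
           would emit aligned vector loads/adds/stores here. */
        for (; i + 4 <= n; i += 4) {
            dst[i]     = a[i]     + b[i];
            dst[i + 1] = a[i + 1] + b[i + 1];
            dst[i + 2] = a[i + 2] + b[i + 2];
            dst[i + 3] = a[i + 3] + b[i + 3];
            fast += 4;
        }
    }
    /* Scalar epilogue, or the whole loop if the slices disagree on alignment. */
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
    return fast;
}
```

If the slices start at different offsets within a 16-byte block, the whole loop falls back to scalar code, and every call pays for the checks either way.
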
> If a particular vector op for a particular type has no SIMD support, then
> the compiler has to generate workaround code. This can also have terrible
> performance consequences.
>
> So the user writes vector code, benchmarks it, finds zero improvement, and
> the reasons why will be elusive to anyone but an expert SIMD programmer.
>
> (Auto-vectorizing technology has similar issues, pretty much meaning you
> won't get fast code out of it unless you've got a habit of examining the
> assembler output and tweaking as necessary.)
>
> Enter Manu, who has a lot of experience making SIMD work for games. His
> proposal was:
>
> 1. Have native SIMD types. This will guarantee alignment, and will
> guarantee a compile time error for SIMD types that are not supported by the
> CPU.
>
> 2. Have the compiler issue an error for SIMD operations that are not
> supported by the CPU, rather than silently generating inefficient
> workaround code.
>
> 3. There are all kinds of weird but highly useful SIMD instructions that
> don't have a straightforward representation in high level code, such as
> saturated arithmetic. Manu's answer was to expose these instructions via
> intrinsics, so the user can string them together, be sure that they will
> generate real SIMD instructions, while the compiler can deal with register
> allocation.
>
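As an illustration of point 3, here is a scalar C sketch of signed 16-bit saturating addition. The names `adds_i16` and `adds_i16x8` are hypothetical. In plain C the clamp has to be spelled out for every element, whereas on x86 the SSE2 intrinsic `_mm_adds_epi16` performs the same operation on the eight 16-bit lanes of a 128-bit register in a single instruction.

```c
#include <stdint.h>

/* Signed 16-bit saturating add for one lane: the result is clamped to
   [INT16_MIN, INT16_MAX] instead of wrapping around. */
int16_t adds_i16(int16_t x, int16_t y) {
    int32_t s = (int32_t)x + (int32_t)y;  /* widen so the sum cannot overflow */
    if (s > INT16_MAX) return INT16_MAX;
    if (s < INT16_MIN) return INT16_MIN;
    return (int16_t)s;
}

/* Apply it across 8 lanes, the number of int16 values in one 128-bit register. */
void adds_i16x8(int16_t out[8], const int16_t a[8], const int16_t b[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = adds_i16(a[i], b[i]);
}
```

For example, 30000 + 10000 yields 32767 rather than wrapping around to a negative value.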

Well, I wouldn't claim any credit for the approach ;) .. I think this is
the standard for maximum performance, and also very well understood.
But the thing that excites me most is the potential quality of libraries
that can be built on top. D has so much potential to extend on this SIMD
foundation, with its templates being able to intelligently handle far more
context-specific situations.
What we do already in other languages will be far more convenient, more
portable, and possibly even produce better code in D. And the biggest
bonus, it will be readable! :)

I think it's a low risk investment, and it doesn't prohibit higher level
support in the future.


> This approach works, is inlineable, generates code as good as hand-built
> assembler, and is useable by regular programmers.
>
> I won't say there aren't better approaches, but this one we know works.
>

Aye, and it's relatively un-intrusive too. Some new types and a few
intrinsics, build useful libraries on top. It shouldn't have complex side
effects, and it offers something that is sorely missing from the language
today.
