Re: SIMD on Windows

jerro Sat, 29 Jun 2013 11:35:47 -0700

On Saturday, 29 June 2013 at 17:57:20 UTC, Jonathan Dunlap wrote:

I've updated the project with your suggestions at http://dpaste.dzfl.pl/fce2d93b but still get the same performance. Vectors defined in the benchmark function body, no function calling overhead, etc. See some of my comments below btw:
First of all, calcSIMD and calcScalar are virtual functions so they can't be inlined, which prevents any further optimization.
From the dlang docs: Member functions which are private or package are never virtual, and hence cannot be overridden.
So my guess is that the first four multiplications and the second four multiplications in calcScalar are done in parallel. ... The reason it's faster is that gdc replaces multiplication by 2 with addition and omits multiplication by 1.
I've changed the multiplies of 2 and 1 to 2.1 and 1.01 respectively. Still no performance difference between the two for me.

The multiplications by 2 and 1 were the reason why the scalar code performed a little bit better than the SIMD code when compiled with GDC. The main reason why the scalar code isn't much slower than the SIMD code is instruction-level parallelism. Because the first four operations in calcScalar are independent (none of them depends on the result of any of the other three), modern x86-64 processors can execute them in parallel. Because of that, the speed of your program is limited by instruction latency and not throughput. That's why it doesn't really make a difference that the scalar version does four times as many operations.
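
To make the latency-versus-throughput point concrete, here is a minimal scalar sketch. It is not the code from the dpaste; the function names oneChain and fourChains, the multiplier values, and the iteration count are all made up for illustration. Both functions run the same number of loop iterations, but the second does four times as many multiplications, spread over four chains that don't depend on each other. On a CPU that can overlap independent multiplications, I would expect the two timings to come out close, but that depends on the compiler, the flags, and the hardware.

```d
import std.stdio : writefln;
import std.datetime.stopwatch : StopWatch, AutoStart;

// One dependency chain: each multiplication needs the previous result,
// so the loop is bound by the latency of a multiply.
float oneChain(float s, float up, float down, size_t n)
{
    foreach (iter; 0 .. n)
    {
        s = s * up;
        s = s * down;
    }
    return s;
}

// Four independent chains: four times the multiplications per iteration,
// but no chain waits on another, so the CPU is free to overlap them.
// The chains start from different values so they cannot be merged into one.
float fourChains(float s, float up, float down, size_t n)
{
    float c0 = s, c1 = 2 * s, c2 = 3 * s, c3 = 4 * s;
    foreach (iter; 0 .. n)
    {
        c0 = c0 * up;
        c0 = c0 * down;
        c1 = c1 * up;
        c1 = c1 * down;
        c2 = c2 * up;
        c2 = c2 * down;
        c3 = c3 * up;
        c3 = c3 * down;
    }
    return (c0 + c1) + (c2 + c3);
}

void main()
{
    // Hypothetical parameters: the two multipliers roughly cancel out,
    // so the values stay near their starting point.
    enum size_t n = 100_000_000;
    float up = 1.0001f, down = 0.9999f;

    auto sw = StopWatch(AutoStart.yes);
    float r1 = oneChain(1.0f, up, down, n);
    auto t1 = sw.peek.total!"msecs";

    sw.reset();
    float r4 = fourChains(1.0f, up, down, n);
    auto t4 = sw.peek.total!"msecs";

    // Printing the results keeps the compiler from discarding the loops.
    writefln("one chain:   %s ms (result %s)", t1, r1);
    writefln("four chains: %s ms (result %s)", t4, r4);
}
```

Build it with optimizations turned on, otherwise the loop overhead will hide the effect.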

You can also take advantage of instruction-level parallelism when using SIMD. For example, I get about the same number of iterations per second for the following two functions (when using GDC):


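        import gcc.attribute;

        // s0..s3, i0..i3 and d0..d3 are the vectors from the benchmark (not shown here).

        // Four independent dependency chains, two multiplications each.
        @attribute("forceinline") void calcSIMD1() {
                s0 = s0 * i0;
                s0 = s0 * d0;
                s1 = s1 * i1;
                s1 = s1 * d1;
                s2 = s2 * i2;
                s2 = s2 * d2;
                s3 = s3 * i3;
                s3 = s3 * d3;
        }

        // A single dependency chain: a quarter of the multiplications of calcSIMD1.
        @attribute("forceinline") void calcSIMD2() {
                s0 = s0 * i0;
                s0 = s0 * d0;
        }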

By the way, if performance is very important to you, you should try GDC (or LDC, but I don't think LDC is currently fully usable on Windows).
