On Sunday, 7 March 2021 at 14:15:58 UTC, z wrote:
On Thursday, 25 February 2021 at 14:28:40 UTC, Guillaume Piolat
wrote:
On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote:
How does one optimize code to make full use of the CPU's SIMD
capabilities? Is there any way to guarantee that "packed" versions
of SIMD instructions will be used? (e.g. vmulps, vsqrtps, etc.)
https://code.dlang.org/packages/intel-intrinsics
I'd try to use it, but the platform I'm building on requires AVX
to get the most performance.
The code below might be worth a try on your AVX512 machine.
Unless you're looking for a combined result, you might need to
separate out the memory access overhead by running multiple
passes over a "known optimal for L2" data set.
Also note that I compiled with -preview=in. I don't know if that
matters.
import std.math : sqrt;

enum SIMDBits = 512; // 256 was tested, 512 was not
alias A = float[SIMDBits / (float.sizeof * 8)];

pragma(inline, true)
void soaEuclidean(ref A a0, in A a1, in A a2, in A a3,
                  in A b1, in A b2, in A b3)
{
    alias V = __vector(A);

    // Element-wise square root; unrolled across the lanes since
    // there is no vector sqrt primitive here.
    static V vsqrt(V v)
    {
        A a = cast(A) v;
        static foreach (i; 0 .. A.length)
            a[i] = sqrt(a[i]);
        return cast(V) a;
    }

    // Squared difference of one coordinate, all lanes at once.
    static V sd(in A a, in A b)
    {
        V v = cast(V) b - cast(V) a;
        return v * v;
    }

    auto v = sd(a1, b1) + sd(a2, b2) + sd(a3, b3);
    a0[] = vsqrt(v)[];
}