Re: Optimizing for SIMD: best practices?(i.e. what features are allowed?)

tsbockman via Digitalmars-d-learn Thu, 25 Feb 2021 20:00:33 -0800

On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote:

float euclideanDistanceFixedSizeArray(float[3] a, float[3] b) {


Use __vector(float[4]), not float[3].

  float distance;

The default value for float is float.nan. You need to explicitly initialize it to 0.0f or something if you want this function to actually do anything useful.
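To illustrate the point, here is a minimal sketch (the function names sumDefault and sumZeroed are made up for this example and are not from the original code). Adding anything to nan yields nan, so an accumulator left at its default never produces a usable result:

```d
import std.math : isNaN;

// Hypothetical helper: the accumulator keeps its default value, float.nan.
float sumDefault(const float[] xs)
{
    float total;            // implicitly float.nan
    foreach (x; xs)
        total += x;         // nan + x is still nan
    return total;
}

// Hypothetical helper: same loop, but the accumulator starts at zero.
float sumZeroed(const float[] xs)
{
    float total = 0.0f;     // explicit initialization
    foreach (x; xs)
        total += x;
    return total;
}
```

For any input, sumDefault returns nan, while sumZeroed returns the actual sum.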

  a[] -= b[];
  a[] *= a[];

With __vector types, this can be simplified (not optimized) to just:

    a -= b;
    a *= a;

  static foreach(size_t i; 0 .. 3/+typeof(a).length+/){
      distance += a[i].abs;//abs required by the caller

(a * a) above is never negative for real numbers. You don't need the call to abs unless you're trying to guarantee that even nan values will have a clear sign bit.

Also, there is no point to adding the first component to zero, and copying element [0] from a SIMD register into a scalar is free, so this can become:


    float distance = a[0];
    static foreach(size_t i; 1 .. 3)
        distance += a[i];

  }
  return sqrt(distance);
}
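Putting the suggestions above together, the whole function might look roughly like the sketch below. The name euclideanDistance and the use of the float4 alias from core.simd (which stands for __vector(float[4])) are choices made for this example. It reads elements through the .array property, which core.simd documents; LDC also accepts direct indexing such as a[0], as written above. The fourth lane is assumed to be padding and is never summed.

```d
import core.simd : float4;   // alias for __vector(float[4])
import std.math : sqrt;

// Sketch only: distance between two 3-component points stored in the
// first three lanes of a 4-lane vector. The last lane is ignored.
float euclideanDistance(float4 a, float4 b)
{
    a -= b;                  // per-lane difference
    a *= a;                  // per-lane square, never negative

    // Start from lane 0 instead of adding it to zero.
    float distance = a.array[0];
    static foreach (size_t i; 1 .. 3)
        distance += a.array[i];

    return sqrt(distance);
}
```

For example, with a = [1, 2, 3, 0] and b = [4, 6, 3, 0] the lane differences are -3, -4 and 0, so the sum of squares is 25 and the function returns 5.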

Final assembly output (ldc 1.24.0 with -release -O3 -preview=intpromote -preview=dip1000 -m64 -mcpu=haswell -fp-contract=fast -enable-cross-module-inlining):


    vsubps  xmm0, xmm1, xmm0
    vmulps  xmm0, xmm0, xmm0
    vmovshdup       xmm1, xmm0
    vaddss  xmm1, xmm0, xmm1
    vpermilpd       xmm0, xmm0, 1
    vaddss  xmm0, xmm0, xmm1
    vsqrtss xmm0, xmm0, xmm0
    ret
