On Thursday, 25 February 2021 at 11:28:14 UTC, z wrote:
> float euclideanDistanceFixedSizeArray(float[3] a, float[3] b) {
Use __vector(float[4]) (available as float4 from core.simd), not float[3].
> float distance;
The default value of a float is float.nan, and nan plus anything is
still nan. You need to explicitly initialize it (to 0.0f, for
example) if you want this function to return anything useful.
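To make the nan problem concrete, here is a small sketch with two hypothetical helpers (sumDefaultInit and sumZeroInit are my names, not from the code above) that run the same accumulation loop, one relying on the default initializer and one starting from 0.0f:

```d
import std.math : isNaN;

// Hypothetical helper: relies on the default initializer (float.init is float.nan).
float sumDefaultInit(const(float)[] xs)
{
    float total; // implicitly float.nan
    foreach (x; xs)
        total += x; // nan + x is still nan, so the result stays nan
    return total;
}

// Hypothetical helper: identical loop, but the accumulator starts at zero.
float sumZeroInit(const(float)[] xs)
{
    float total = 0.0f;
    foreach (x; xs)
        total += x;
    return total;
}
```

sumDefaultInit([1.0f, 2.0f]) returns nan, while sumZeroInit([1.0f, 2.0f]) returns 3.0f.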
> a[] -= b[];
> a[] *= a[];
With __vector types, this can be simplified (not optimized) to
just:
a -= b;
a *= a;
> static foreach(size_t i; 0 .. 3/+typeof(a).length+/){
> distance += a[i].abs;//abs required by the caller
(a * a) above is never negative for real numbers. You don't need
the call to abs unless you're trying to guarantee that even nan
values will have a clear sign bit.
Also, there is no point in adding the first component to zero,
and copying element [0] from a SIMD register into a scalar is
free, so this can become:
float distance = a[0];
static foreach(size_t i; 1 .. 3)
    distance += a[i];
> }
> return sqrt(distance);
> }
Final assembly output (ldc 1.24.0 with -release -O3
-preview=intpromote -preview=dip1000 -m64 -mcpu=haswell
-fp-contract=fast -enable-cross-module-inlining):
vsubps xmm0, xmm1, xmm0
vmulps xmm0, xmm0, xmm0
vmovshdup xmm1, xmm0
vaddss xmm1, xmm0, xmm1
vpermilpd xmm0, xmm0, 1
vaddss xmm0, xmm0, xmm1
vsqrtss xmm0, xmm0, xmm0
ret
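For reference, here is a sketch of the whole function with the suggestions above applied. Assumptions: the name euclideanDistance is mine, the target is one where core.simd supports float4 (e.g. x86_64), lane 3 is treated as padding, and lanes are read through the vector's .array property rather than by indexing the vector directly as in the snippets above:

```d
import core.simd : float4;
import std.math : sqrt;

// Sketch: Euclidean distance between two 3-D points stored in the first
// three lanes of a float4 (an alias for __vector(float[4])).
// Lane 3 is padding and is never read, so its contents do not matter.
float euclideanDistance(float4 a, float4 b)
{
    a -= b; // lane-wise difference
    a *= a; // lane-wise square: non-negative for real inputs, so no abs

    // Start from lane 0 instead of adding it to an explicit 0.0f.
    float distance = a.array[0];
    static foreach (size_t i; 1 .. 3)
        distance += a.array[i];
    return sqrt(distance);
}
```

For example, the points (1, 2, 3) and (4, 6, 3) differ by (-3, -4, 0), so the squared lanes sum to 25 and the function returns 5.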