Re: Rather Bizarre slow downs using Complex!float with avx (ldc).

james.p.leblanc via Digitalmars-d-learn Fri, 01 Oct 2021 01:38:01 -0700

On Thursday, 30 September 2021 at 16:52:57 UTC, Johan wrote:

On Thursday, 30 September 2021 at 16:40:03 UTC, james.p.leblanc

Generally, for performance issues like this you need to study assembly output (`--output-s`) or LLVM IR (`--output-ll`).
First thing I would look out for is function inlining yes/no.

cheers,
  Johan


Johan,

Thanks kindly for your reply. As suggested, I have looked at the assembly output.

Strangely, the fused multiply-adds are indeed there in the AVX version, but the example still runs slower for the **Complex!float** data type.

I have stripped the code down to a minimum, which demonstrates the weird result:




```d

import ldc.attributes; // with or without this line makes no difference

import std.stdio;
import std.datetime.stopwatch;
import std.complex;

alias T = Complex!float;
auto typestr = "COMPLEX FLOAT";
/* alias T = Complex!double; */
/* auto typestr = "COMPLEX DOUBLE"; */

auto alpha = cast(T) complex(0.1, -0.2); // dummy values to fill arrays
auto beta = cast(T) complex(-0.7, 0.6);

auto dotprod( T[] x, T[] y)
{
   auto sum = cast(T) 0;
   foreach( size_t i ; 0 .. x.length)
      sum += x[i] * conj(y[i]);
   return sum;
}

void main()
{
   int nEle = 1000;
   int nIter = 2000;

   auto startTime = MonoTime.currTime;
   auto dur = cast(double)(MonoTime.currTime-startTime).total!"usecs";

   T[] x, y;
   x.length = nEle;
   y.length = nEle;
   T z;
   x[] = alpha;
   y[] = beta;

   startTime = MonoTime.currTime;
   foreach( i ; 0 .. nIter){
      foreach( j ; 0 .. nIter){
            z = dotprod(x,y);
      }
   }

   auto etime = cast(double)(MonoTime.currTime-startTime).total!"msecs" / 1.0e3;
   writef(" result: % 5.2f%+5.2fi comp time: %5.2f \n", z.re, z.im, etime);

}
```
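As a sanity check on what this program should print (a worked calculation from the `alpha` and `beta` values above, ignoring floating-point rounding): every element of `x` is `alpha` and every element of `y` is `beta`, so each term of the sum is identical and the dot product over `nEle = 1000` elements is

$$
\alpha\,\overline{\beta} = (0.1 - 0.2i)(-0.7 - 0.6i) = -0.07 - 0.06i + 0.14i + 0.12i^{2} = -0.19 + 0.08i
$$

$$
\sum_{k=0}^{999} x_k\,\overline{y_k} = 1000\,(-0.19 + 0.08i) = -190 + 80i
$$

So a correct build, with or without AVX, should report approximately `-190.00+80.00i`.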

For convenience, I include the bash script used to compile, run, generate the assembly code, and grep it:


```bash
echo
echo "With AVX:"
ldc2 -O3 -release question.d --ffast-math -mcpu=haswell
question
ldc2 -output-s -O3 -release question.d --ffast-math -mcpu=haswell
mv question.s question_with_avx.s

echo
echo "Without AVX"
ldc2 -O3 -release question.d
question
ldc2 -output-s -O3 -release question.d
mv question.s question_without_avx.s

echo
echo "fused multiply adds are found in avx code (as desired)"
grep vfmadd *.s /dev/null
```

Here is output when run on my machine:

```console
With AVX:
 result:  -190.00+80.00i  comp time:   6.45

Without AVX
 result:  -190.00+80.00i  comp time:   5.74

fused multiply adds are found in avx code (as desired)
question_with_avx.s:    vfmadd231ss     %xmm2, %xmm5, %xmm3
question_with_avx.s:    vfmadd231ss     %xmm0, %xmm2, %xmm3
question_with_avx.s:    vfmadd231ss     %xmm2, %xmm4, %xmm1
question_with_avx.s:    vfmadd231ss     %xmm3, %xmm5, %xmm1
question_with_avx.s:    vfmadd231ss     %xmm3, %xmm1, %xmm0

```
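All five matches in the listing above carry the `ss` suffix (scalar single precision), not the `ps` suffix (packed single precision). A quick way to check this for both files might look like the sketch below. The helper name `count_fma` is hypothetical and not part of the original script, and the file names are assumed to be the ones produced by the script above. It only counts instructions; it does not by itself explain the timing difference.

```bash
# Hypothetical helper (not part of the original script): count scalar vs
# packed single-precision vfmadd instructions in an assembly listing.
count_fma() {
    local file="$1"
    local scalar packed
    # "ss" suffix = scalar single precision, "ps" suffix = packed single precision.
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
    scalar=$(grep -c -E 'vfmadd[0-9]+ss' "$file" || true)
    packed=$(grep -c -E 'vfmadd[0-9]+ps' "$file" || true)
    echo "$file: scalar=$scalar packed=$packed"
}

# File names assumed from the compile script; missing files are skipped.
for f in question_with_avx.s question_without_avx.s; do
    if [ -f "$f" ]; then
        count_fma "$f"
    fi
done
```

If the packed count is zero in the AVX build, the FMAs are being applied one element at a time rather than across a vector register, which may be worth comparing against the `Complex!double` build.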

Repeating the experiment after changing the data type to Complex!double shows the AVX code to be twice as fast (perhaps more in line with expectations).

**I admit my confusion as to why the Complex!float is misbehaving.**


Does anyone have insight into what is happening?

Thanks,
James
