On 23 March 2012 18:57, Comrad <comrad.karlov...@googlemail.com> wrote:
> On Thursday, 22 March 2012 at 10:43:35 UTC, Trass3r wrote:
>>>
>>> What is the status at the moment? What compiler and with which compiler
>>> flags I should use to achieve maximum performance?
>>
>> In general gdc or ldc. Not sure how good vectorization is though, esp.
>> auto-vectorization.
>> On the other hand the so called vector operations like a[] = b[] + c[];
>> are lowered to hand-written SSE assembly even in dmd.
>
> I had such a snippet to test:
>
> import std.stdio;
> void main()
> {
>     double[2] a=[1.,0.];
>     double[2] a1=[1.,0.];
>     double[2] a2=[1.,0.];
>     double[2] a3=[0.,0.];
>     foreach(i;0..1000000000)
>         a3[]+=a[]+a1[]*a2[];
>     writeln(a3);
> }
>
> And I compared with the following d code:
>
> import std.stdio;
> void main()
> {
>     double[2] a=[1.,0.];
>     double[2] a1=[1.,0.];
>     double[2] a2=[1.,0.];
>     double[2] a3=[0.,0.];
>     foreach(i;0..1000000000)
>     {
>         a3[0]+=a[0]+a1[0]*a2[0];
>         a3[1]+=a[1]+a1[1]*a2[1];
>     }
>     writeln(a3);
> }
>
> And with the following c code:
>
> #include <stdio.h>
> int main()
> {
>     double a[2]={1.,0.};
>     double a1[2]={1.,0.};
>     double a2[2]={1.,0.};
>     double a3[2];
>     unsigned i;
>     for(i=0;i<1000000000;++i)
>     {
>         a3[0]+=a[0]+a1[0]*a2[0];
>         a3[1]+=a[1]+a1[1]*a2[1];
>     }
>     printf("%f %f\n",a3[0],a3[1]);
>     return 0;
> }
>
> The last one I compiled with gcc two previous with dmd and ldc. C code with
> -O2 was the fastest and as fast as d without slicing compiled with ldc. d
> code with slicing was 3 times slower (ldc compiler). I tried to compile with
> different optimization flags, that didn't help. Maybe I used the wrong ones.
> Can someone comment on this?
The flags you want are -O2, -inline and -release (dmd spells the optimization
flag -O). If you don't have those, that might explain some of the slowdown on
slicing, since -release drops a lot of runtime checks. Otherwise, I'm not sure
why it's so much slower: the druntime array ops are written using SIMD
instructions where available, so they should be fast.

--
James Miller
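
To check whether the flags account for the gap, something like the following could time both forms in one program. It is only a minimal sketch: the function names `sliced` and `indexed` and the iteration count are my own choices, not from the original snippets, and it assumes a current Phobos that provides `std.datetime.stopwatch`.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.exception : enforce;
import std.stdio : writefln;

// Array-op (slice) form, as in the first quoted snippet.
// `sliced` is a hypothetical helper name.
double[2] sliced(int n)
{
    double[2] a = [1.0, 0.0];
    double[2] a1 = [1.0, 0.0];
    double[2] a2 = [1.0, 0.0];
    double[2] acc = [0.0, 0.0];
    foreach (i; 0 .. n)
        acc[] += a[] + a1[] * a2[];
    return acc;
}

// Element-by-element form, as in the second quoted snippet.
// `indexed` is a hypothetical helper name.
double[2] indexed(int n)
{
    double[2] a = [1.0, 0.0];
    double[2] a1 = [1.0, 0.0];
    double[2] a2 = [1.0, 0.0];
    double[2] acc = [0.0, 0.0];
    foreach (i; 0 .. n)
    {
        acc[0] += a[0] + a1[0] * a2[0];
        acc[1] += a[1] + a1[1] * a2[1];
    }
    return acc;
}

void main()
{
    enum n = 10_000_000; // arbitrary count, smaller than in the quoted test

    auto sw = StopWatch(AutoStart.yes);
    auto r1 = sliced(n);
    auto t1 = sw.peek();

    sw.reset(); // restart the measurement for the second variant
    auto r2 = indexed(n);
    auto t2 = sw.peek();

    // enforce (unlike assert) is still checked in -release builds.
    enforce(r1 == r2, "the two variants disagree");
    writefln("sliced:  %s in %s", r1, t1);
    writefln("indexed: %s in %s", r2, t2);
}
```

Building it once with and once without the optimization and -release flags, for example `dmd -O -inline -release bench.d` (where bench.d is just a placeholder file name), should show how much of the difference comes from the runtime checks. An optimizer may simplify a loop this trivial, so the timings are only a rough indication.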