On Wednesday, 9 July 2014 at 13:18:00 UTC, Larry wrote:
On Wednesday, 9 July 2014 at 12:25:40 UTC, bearophile wrote:
Larry:

Now the performance :
D : 12 µs
C : < 1µs

Where does the difference come from? Is there a way to optimize the D version?

Again, I am absolutely new to D and these are my very first lines of code with it.

Your C code is not equivalent to the D code: there are small differences, and even the output is different. So I've cleaned up your C and D code:

------------------------

// C code.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <time.h>
#include <sys/time.h>
#include "jol.h"

int main() {
   struct timeval s, e;
   gettimeofday(&s, NULL);

   int pol = 5;
   tes(&pol);

   int arr[] = {9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215};
   int len = 13 - 1;
   int g = 0;

   for (int x = 36; x >= 0; --x) {
       for (int y = len; y >= 0; --y) {
           ++g;
           arr[y]++;
       }
   }

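   gettimeofday(&e, NULL);
   // Elapsed microseconds; including tv_sec keeps this correct across a second boundary.
   long us = (e.tv_sec - s.tv_sec) * 1000000L + (e.tv_usec - s.tv_usec);
   printf("C: %d %ld %d %d %d\n",
          g, us, arr[4], arr[9], pol);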

   return 0;
}

------------------------

D code ("final" functions don't have much meaning, but the D compiler is very sloppy and doesn't complain):


module jol;

void tes(ref int a) {
   a = 9;
}


---------

module maind;

void main() {
   import std.stdio;
   import std.datetime;
   import jol;

   StopWatch sw;
   sw.start;

   int pol = 5;
   tes(pol);

   int[] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
   int len = 13 - 1;
   int g = 0;

   for (int x = 36; x >= 0; --x) {
       // Some code here erased for the test.
       for (int y = len; y >= 0; --y) {
           // Some other code here.
           ++g;
           arr[y]++;
       }
   }

   sw.stop;
   writefln("D: %d %d %d %d %d",
            g, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

That D code is not fully idiomatic; this is closer to idiomatic D code:


module jol2;

void test(ref int x) pure nothrow @safe {
   x = 9;
}

---------

module maind;

void main() {
   import std.stdio, std.datetime;
   import jol2;

   StopWatch sw;
   sw.start;

   int pol = 5;
   test(pol);

   int[13] arr = [9, 16, 458, 2, 68, 5452, 98, 32, 4, 565, 78, 985, 3215];
   uint count = 0;

   foreach_reverse (immutable _; 0 .. 37) {
       foreach_reverse (ref ai; arr) {
           count++;
           ai++;
       }
   }

   sw.stop;
   writefln("D: %d %d %d %d %d",
            count, sw.peek.nsecs, arr[4], arr[9], pol);
}

----------------

In my benchmarks I haven't used the more idiomatic D code; I used the C-like code. The run time is essentially the same either way.

I compile the C and D code with (on 32-bit Windows):

gcc -march=native -std=c11 -O2 main.c jol.c -o main

ldmd2 -wi -O -release -inline -noboundscheck maind.d jol.d
strip maind.exe

For the D code I've used the latest ldc2 compiler (V. 0.13.0, based on DMD v2.064 and LLVM 3.4.2), GCC is V.4.8.0 (rubenvb-4.8.0).

----------------

The C code gives as output:

C: 481 0 105 602 9


The D code gives as output:

D: 481 6076 105 602 9

----------------------

If I slow down the CPU to half speed the C code runs in about 0.05 seconds, the D code runs in about 0.07 seconds.

Such run times are far too small for a meaningful comparison. You need a run time of about 2 seconds to get meaningful timings.

The difference between 0.05 and 0.07 is caused by initializing the D runtime (like the D GC). It takes about 0.015 seconds on my system at full CPU speed to initialize the D runtime, and it's a constant time.

Bye,
bearophile

You are definitely right, I did mess up while translating!

I ran the corrected code (the version I meant to provide :S) and on a slow MacBook I end up with:
C : 2
D : 15994

Of course, when run on very high-end machines this difference is almost nonexistent, but we want to run on very low-powered hardware.

OK, even with longer code there will always be a launch penalty for D. So I cannot use it for very high performance loops.

Shame for us..
:)

Thanks and bye

Could you provide the exact code you are using for that benchmark? Once the program has started up you should be able to obtain performance parity between C and D. Situations where this isn't true are problems we would like to know about.

For the amount of work you are doing in the test program (almost nothing), the total runtime is probably dominated by the program load time etc. even when using C.
