Re: Finding large difference b/w execution time of c++ and D codes for same problem

jerro Wed, 13 Feb 2013 11:10:23 -0800

When you are comparing LDC and GDC, you should either use -mcpu=generic for LDC or -march=native for GDC, because their default targets are different. By default, GDC produces code that works on most x86_64 CPUs (if you are on an x86_64 system), while LDC targets the host CPU. But this does not explain the difference in timings you are seeing here.

One reason why the code generated by GDC is slower is that squarePlusMag isn't inlined. It seems that the fact that its parameter is const is somehow preventing it from being inlined; I have no idea why. Removing const and adding -march=native to the gdc flags gives me:


gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 8.283 [sec]
  using doubles Total time: 6.827 [sec]
  using reals Total time: 6.795 [sec]

ldc2 -O3  -release -singleobj tmp.d -oftmp:
  using floats Total time: 3.348 [sec]
  using doubles Total time: 3.08 [sec]
  using reals Total time: 4.174 [sec]

The difference is smaller, but still pretty large.
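Inlining decisions are not visible in source code, so the following sketch cannot show why GDC declined to inline the const version. It only illustrates why dropping const is safe here: the struct argument is passed by value, so const merely restricts what the callee may do to its own private copy, and both signatures compute exactly the same thing. The names Complex, squarePlusMagConst and escapeCount are hypothetical, and the const signature is an assumed reconstruction of the one described above, not the original poster's code.

```d
// Hypothetical stand-in for ComplexStruct with both signatures side by side.
struct Complex(TReal)
{
    TReal r;
    TReal i;

    // Variant with a const by-value parameter (assumed original signature).
    TReal squarePlusMagConst(const Complex another)
    {
        TReal r1 = r*r - i*i + another.r;
        TReal i1 = cast(TReal)2.0*i*r + another.i;
        r = r1;
        i = i1;
        return r1*r1 + i1*i1;
    }

    // Same body with const removed. `another` is a copy either way, so
    // const only limited what this function could do to that copy.
    TReal squarePlusMag(Complex another)
    {
        TReal r1 = r*r - i*i + another.r;
        TReal i1 = cast(TReal)2.0*i*r + another.i;
        r = r1;
        i = i1;
        return r1*r1 + i1*i1;
    }
}

// Number of iterations before |z|^2 exceeds 1000 (200 if it never does),
// using the const variant when useConst is true, the non-const one otherwise.
int escapeCount(TReal, bool useConst)(int x, int y)
{
    auto c = Complex!TReal(0.8, 0.156);
    auto a = Complex!TReal(x, y);
    foreach (n; 0 .. 200)
    {
        // static if does not open a scope, so `mag` is visible below.
        static if (useConst)
            TReal mag = a.squarePlusMagConst(c);
        else
            TReal mag = a.squarePlusMag(c);
        if (mag > cast(TReal) 1000)
            return n;
    }
    return 200;
}
```

For example, escapeCount!(double, true)(0, 0) and escapeCount!(double, false)(0, 0) both return 4: the orbit of 0 passes |z|^2 = 1000 on the fifth step with either signature.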

I have noticed that there are needless conversions in this code that are slowing down both the GDC-generated and the LDC-generated code. This code is a bit faster:


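module main;

import std.datetime.stopwatch : AutoStart, StopWatch;
import std.meta : AliasSeq;
import std.stdio;


enum DIM = 32 * 1024;

// Global sink for the results of juliaFunction.
int juliaValue;

template Julia(TReal)
{
    struct ComplexStruct
    {
        TReal r;
        TReal i;

        // Sets this to this^2 + another and returns the new squared magnitude.
        // The parameter is taken by value and is not const.
        TReal squarePlusMag(ComplexStruct another)
        {
            TReal r1 = r*r - i*i + another.r;
            // cast(TReal) keeps the double literal from widening float math.
            TReal i1 = cast(TReal)2.0*i*r + another.i;

            r = r1;
            i = i1;

            return (r1*r1 + i1*i1);
        }
    }

    // Returns 0 if the orbit of (x, y) under z -> z^2 + c, with
    // c = 0.8 + 0.156i, exceeds |z|^2 = 1000 within 200 steps, else 1.
    int juliaFunction( int x, int y )
    {
        auto c = ComplexStruct(0.8, 0.156);
        auto a = ComplexStruct(x, y);

        foreach (i; 0 .. 200)
            if (a.squarePlusMag(c) > cast(TReal) 1000)
                return 0;
        return 1;
    }

    // Evaluates juliaFunction on every point of a DIM x DIM integer grid.
    void kernel()
    {
        foreach (x; 0 .. DIM) {
            foreach (y; 0 .. DIM) {
                juliaValue = juliaFunction( x, y );
            }
        }
    }
}

void main()
{
    writefln("D code serial with dimension %s...", DIM);

    auto sw = StopWatch(AutoStart.no);
    // Time the same kernel once per floating-point type.
    foreach (Math; AliasSeq!(float, double, real))
    {
        sw.start();
        Julia!(Math).kernel();
        sw.stop();
        writefln("  using %ss Total time: %s [sec]",
                 Math.stringof, (sw.peek().total!"msecs" * 0.001));
        sw.reset();
    }
}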

This gives me:

gdc -O3 -finline-functions -frelease tmp.d -o tmp -march=native:
  using floats Total time: 6.746 [sec]
  using doubles Total time: 6.872 [sec]
  using reals Total time: 5.226 [sec]

ldc2 -O3  -release -singleobj tmp.d -oftmp:
  using floats Total time: 2.36 [sec]
  using doubles Total time: 2.535 [sec]
  using reals Total time: 4.106 [sec]
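The reply does not quote the original code, so exactly which conversions it contained is an assumption; the cast(TReal) on the literal 2.0 above suggests that unsuffixed double literals were one source. The compile-time checks below show how D types such mixed expressions: an unsuffixed literal like 2.0 is a double, so combining it with a float makes the whole expression double, and assigning the result back to a float narrows it again. twoIR is a hypothetical helper mirroring the fixed expression. These checks concern static types only, not the machine code each compiler emits.

```d
// Module-level variables used only for the type queries below.
float f;
real  r;

// 2.0 is a double literal: float * double has type double.
static assert(is(typeof(2.0 * f) == double));
// Casting the literal, as cast(TReal)2.0 does, keeps the product float.
static assert(is(typeof(cast(float) 2.0 * f) == float));
// An f suffix on the literal has the same effect.
static assert(is(typeof(2.0f * f) == float));
// For real the literal is widened instead, so the cast changes nothing.
static assert(is(typeof(2.0 * r) == real));
// An int literal combined with a float converts to float...
static assert(is(typeof(f + 1000) == float));
// ...while a double literal widens the float operand to double.
static assert(is(typeof(f + 1000.0) == double));

// Hypothetical helper: the fixed expression, with every operand of type TReal.
TReal twoIR(TReal)(TReal im, TReal re)
{
    return cast(TReal) 2.0 * im * re;
}

static assert(is(typeof(twoIR(1.0f, 1.0f)) == float));
static assert(is(typeof(twoIR(1.0, 1.0)) == double));
```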

At least part of the difference is due to the fact that juliaFunction still isn't getting inlined (but squarePlusMag is). Making juliaFunction a static method of ComplexStruct causes it to get inlined (again, I have no idea why). Moving juliaFunction inside ComplexStruct does not affect the performance of LDC-generated code, but for GDC it gives me:


  using floats Total time: 4.262 [sec]
  using doubles Total time: 4.251 [sec]
  using reals Total time: 3.512 [sec]

There is still a large difference between LDC and GDC for floats and doubles, and I can't explain it. But at least it is much smaller than it was initially.
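The post does not show the modified code, so the layout below is an assumed reconstruction of making juliaFunction a static method of ComplexStruct. Unlike the program above, kernel here takes the grid size as a parameter and returns the last computed value instead of writing the global juliaValue, purely so that the sketch can be exercised on a small grid. Whether a particular GDC version then inlines the call still has to be checked in the generated code; the sketch only shows the structure.

```d
template Julia(TReal)
{
    struct ComplexStruct
    {
        TReal r;
        TReal i;

        // Sets this to this^2 + another and returns the new squared magnitude.
        TReal squarePlusMag(ComplexStruct another)
        {
            TReal r1 = r*r - i*i + another.r;
            TReal i1 = cast(TReal)2.0*i*r + another.i;
            r = r1;
            i = i1;
            return r1*r1 + i1*i1;
        }

        // Now a static member: it has no `this` and is called as
        // ComplexStruct.juliaFunction(x, y). The body is unchanged.
        static int juliaFunction(int x, int y)
        {
            auto c = ComplexStruct(0.8, 0.156);
            auto a = ComplexStruct(x, y);
            foreach (n; 0 .. 200)
                if (a.squarePlusMag(c) > cast(TReal) 1000)
                    return 0;
            return 1;
        }
    }

    // Hypothetical test-friendly kernel: calls the static member over a
    // dim x dim grid and returns the last value instead of storing it.
    int kernel(int dim)
    {
        int value;
        foreach (x; 0 .. dim)
            foreach (y; 0 .. dim)
                value = ComplexStruct.juliaFunction(x, y);
        return value;
    }
}
```

Calling Julia!(float).kernel(DIM) would correspond to the original benchmark loop; a small argument such as 4 is enough to check that it runs.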


I ran all the benchmarks on 64-bit Linux, using a Core i5 2500K.
