On Saturday, 31 May 2014 at 05:12:54 UTC, Marco Leise wrote:
Run this with: -O3 -frelease -fno-assert -fno-bounds-check -march=native
This way GCC and LLVM will recognize that you alternately add
p0 and p1 to the sum and partially unroll the loop, thereby
removing the condition. It takes 1.4xxxx nanoseconds per step
on my not-so-new 2.0 GHz notebook, so I assume your PC will
easily reach parity with your original C++ version.



import std.stdio;
import core.time;

alias ℕ = size_t;

void main()
{
        run!plus(1_000_000_000);
}

double plus(ℕ steps)
{
        enum p0 = 0.0045;
        enum p1 = 1.00045452 - p0;

        double sum = 1.346346;
        foreach (i; 0 .. steps)
                sum += i%2 ? p1 : p0;
        return sum;
}

void run(alias func)(ℕ steps)
{
        auto t1 = TickDuration.currSystemTick;
        auto output = func(steps);
        auto t2 = TickDuration.currSystemTick;
        auto nanotime = 1_000_000_000.0 / steps * (t2 - t1).length / TickDuration.ticksPerSec;
        writefln("Last: %s", output);
        writefln("Time per op: %s", nanotime);
        writeln();
}
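
As a rough illustration of the unrolling described above, the loop in plus() can be written out by hand so that no per-iteration condition remains. This is only a sketch of the idea, not the code GCC or LLVM actually generates. The name plusUnrolled is hypothetical and does not appear in the original post. It performs the additions in the same order as plus(), so it should produce the same sum.

```d
import std.stdio;

// Hypothetical hand-unrolled variant of plus(); the name plusUnrolled is
// illustrative only and is not part of the original post.
double plusUnrolled(size_t steps)
{
        enum p0 = 0.0045;
        enum p1 = 1.00045452 - p0;

        double sum = 1.346346;
        // In plus(), even i adds p0 and odd i adds p1, so every pair of
        // consecutive steps adds p0 and then p1 without any branch.
        foreach (pair; 0 .. steps / 2)
        {
                sum += p0;
                sum += p1;
        }
        // An odd step count leaves one last even-indexed step, which adds p0.
        if (steps % 2)
                sum += p0;
        return sum;
}

void main()
{
        writefln("Last: %s", plusUnrolled(1_000_000));
}
```

Passing plusUnrolled to the run template in place of plus would show whether a given compiler already performs this transformation itself: if the two timings are close, it probably does.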


Thank you for the help. Which OS is running on your notebook? I compiled your source code with your settings using the GCC compiler, and the run took 3.1xxxx nanoseconds per step. With the DMD compiler the run took 5.xxxx nanoseconds per step. So I think the problem could be specific to the Linux versions of the GCC and DMD compilers.


Thomas
