Hello,

I am summing the first 1 billion integers in parallel and in a single thread, and I am observing some curious results:

parallel sum : 499999999500000000, elapsed 102833 ms
single thread sum : 499999999500000000, elapsed 1667 ms

The parallel version is 60+ times slower on my i7-3770K CPU. I think that may be because every iteration does an atomic update on the same shared variable, so the cores are constantly flushing and reloading the cache line that holds it, but I don't know for sure.

Here is the D code:

import std.stdio;
import std.range;
import std.parallelism;
import std.datetime;    // StopWatch and peek().msecs (older std.datetime API)
import core.atomic;

void main()
{
        shared ulong sum = 0;
        ulong iter = 1_000_000_000UL;

        StopWatch sw;

        sw.start();

        // every iteration does an atomic read-modify-write on the same shared variable
        foreach(i; parallel(iota(0, iter)))
        {
                atomicOp!"+="(sum, i);
        }

        sw.stop();

        writefln("parallel sum : %s, elapsed %s ms", sum, sw.peek().msecs);

        sum = 0;

        sw.reset();

        sw.start();

        // plain single-threaded accumulation, no atomics
        for (ulong i = 0; i < iter; ++i)
        {
                sum += i;
        }

        sw.stop();

        writefln("single thread sum : %s, elapsed %s ms", sum, sw.peek().msecs);
}

Out of curiosity, I tried the equivalent code in C# and got this:

parallel sum : 499999999500000000, elapsed 20320 ms
single thread sum : 499999999500000000, elapsed 1901 ms

The C# parallel version is about five times faster than the D parallel version, which seems strange given that it is the exact same CPU.
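
One knob I still need to check on the D side is the work unit size argument to parallel(). I don't know what the default works out to for a range this long, but a bigger chunk means each task does more work per trip through the task queue. Something like this, replacing the parallel foreach above (the 10_000_000 is just a guess):

        // second argument is the work unit size: how many consecutive
        // elements each task processes before going back to the pool
        foreach(i; parallel(iota(0, iter), 10_000_000))
        {
                atomicOp!"+="(sum, i);
        }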

And here is the C# code:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

long sum = 0;
long iter = 1000000000L;

var sw = Stopwatch.StartNew();

// every iteration does an interlocked add on the same shared variable
Parallel.For(0, iter, i =>
{
        Interlocked.Add(ref sum, i);
});

Console.WriteLine("parallel sum : {0}, elapsed {1} ms", sum, sw.ElapsedMilliseconds);

sum = 0;

sw = Stopwatch.StartNew();

// plain single-threaded accumulation
for (long i = 0; i < iter; ++i)
{
        sum += i;
}

Console.WriteLine("single thread sum : {0}, elapsed {1} ms", sum, sw.ElapsedMilliseconds);

Thoughts?
