I'm trying out std.parallelism, and I wrote this code (based on the parallel foreach example):

    import std.stdio;
    import std.parallelism;
    import std.math;
    import std.c.time;

    void main ()
    {
        auto logs = new double[20_000_000];
        const num = 10;
        clock_t clk;
        double norm;
        double par;

        writeln("CPUs : ", totalCPUs);

        // Normal (serial) foreach, repeated num times
        clk = clock();
        foreach (t; 0..num) {
            foreach (i, ref elem; logs) {
                elem = log(i + 1.0);
            }
        }
        norm = clock() - clk;

        // Parallel foreach with a work unit size of 100, repeated num times
        clk = clock();
        foreach (t; 0..num) {
            foreach (i, ref elem; taskPool.parallel(logs, 100)) {
                elem = log(i + 1.0);
            }
        }
        par = clock() - clk;

        // Average over the num runs
        norm = norm / num;
        par = par / num;

        writeln("Normal : ", norm / CLOCKS_PER_SEC, " Parallel : ", par / CLOCKS_PER_SEC);
    }

I get this result:

    CPUs : 2
    Normal : 1.325 Parallel : 1.646

The result changes by around ±100 ms every time I run it (I think it depends on how busy the CPUs are at that moment). I tried changing workUnitSize from 1 to 10000000 without any appreciable change.

My computer is an AMD Athlon 64 X2 Dual Core Processor 6000+ running 64-bit Kubuntu 11.04 with 2 GiB of RAM. I compiled the program with dmd 2.053.

htop shows that while the test program is running the parallel foreach, both cores are at ~98% load, and with the normal foreach only one core reaches ~99% load.
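One thing that may be worth checking is what clock() actually measures. It reports the processor time consumed by the whole process rather than elapsed time, so with both cores busy it may be adding up the work of both threads. Below is a minimal sketch that times the same parallel loop both ways. It assumes a newer compiler than dmd 2.053, where the modules std.datetime.stopwatch and core.stdc.time exist; fillLogsParallel is a hypothetical helper name, and the array is smaller than in the test above to keep the run short.

```d
import core.stdc.time : clock, CLOCKS_PER_SEC;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : log;
import std.parallelism : taskPool;
import std.stdio : writefln;

// Hypothetical helper: same loop body as the parallel foreach in the test.
void fillLogsParallel(double[] logs)
{
    foreach (i, ref elem; taskPool.parallel(logs, 100))
    {
        elem = log(i + 1.0);
    }
}

void main()
{
    auto logs = new double[2_000_000];

    // Wall-clock timer: elapsed real time, regardless of how many threads run.
    auto sw = StopWatch(AutoStart.yes);
    // CPU-time baseline: clock() counts processor time used by the process.
    immutable cpuStart = clock();

    fillLogsParallel(logs);

    immutable cpuSeconds = cast(double)(clock() - cpuStart) / CLOCKS_PER_SEC;
    immutable wallSeconds = sw.peek.total!"usecs" / 1e6;

    writefln("CPU time : %.3f s, wall time : %.3f s", cpuSeconds, wallSeconds);
}
```

If the wall time comes out clearly lower than the CPU time for the parallel loop, the numbers in my test would be comparing total CPU work rather than how long each version actually takes.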