On Tuesday, 12 May 2015 at 14:59:38 UTC, Gerald Jansen wrote:
I am a data analyst trying to learn enough D to decide whether
to use D for a new project rather than Python + Fortran. I
have recoded a non-trivial Python program to do some simple
parallel data processing (using the map function in Python's
multiprocessing module and parallel foreach in D). I was very
happy that my D version ran considerably faster than the Python
version when running a single job, but I was soon dismayed to
find that the performance of my D version deteriorates rapidly
beyond a handful of jobs, whereas the time for the Python
version increases linearly with the number of jobs per CPU core.
The server has 4 quad-core Xeons and abundant memory compared
to my needs for this task even though there are several million
records in each dataset. The basic structure of the D program
is:
import std.parallelism; // and other modules

void main()
{
    // ...
    // read common data and store in arrays
    // ...
    foreach (job; parallel(jobs, 1)) {
        runJob(job, arr1, arr2.dup);
    }
}

void runJob(string job, in int[] arr1, int[] arr2)
{
    // read file of job-specific data and modify the arr2 copy
    // write job-specific output data file
}
The output of /usr/bin/time is as follows:
Lang  Jobs     User   System  Elapsed  %CPU
Py       1    45.17     1.44  0:46.65    99
D        1     8.44     1.17  0:09.24   104
Py       2    79.24     2.16  0:48.90   166
D        2    19.41    10.14  0:17.96   164
Py      30  1255.17    58.38  2:39.54   823  * Pool(12)
D       30   421.61  4565.97  6:33.73  1241
(Note that the Python program was somewhat optimized with numpy
vectorization and a bit of numba jit compilation.)
The system time varies widely between repetitions for D with
multiple jobs (e.g. from 3.8 to 21.5 seconds for 2 jobs).
Clearly my simple approach with parallel foreach has some
problem(s). Any suggestions?
Gerald Jansen
Have you tried adjusting the workUnitSize argument to parallel?
It should probably be 1 for such large individual tasks.
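For reference, a minimal sketch of where workUnitSize goes (the job list and work done per job here are placeholders, not from the original program):

```d
import std.parallelism;
import std.stdio;

void main()
{
    // Hypothetical job identifiers; in the real program each one
    // names a large job-specific data file.
    auto jobs = ["job01", "job02", "job03", "job04"];

    // The second argument to parallel() is workUnitSize: how many
    // consecutive elements each worker thread claims at a time.
    // With a handful of large jobs, 1 keeps all cores busy; larger
    // values can leave workers idle while one thread holds several
    // big jobs.
    foreach (job; parallel(jobs, 1))
    {
        writeln("processing ", job); // stand-in for runJob(job, ...)
    }
}
```

Note that the code posted above already passes 1 explicitly; omitting the argument lets std.parallelism pick a default work unit size instead, which is tuned for many small iterations rather than a few large ones.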