Hello,
I tried converting a C++/Go ray tracing benchmark [1] to D [2].

I tried using std.parallelism's amap to implement parallelism, but it does not scale the way I expected.

Running the program with different numbers of threads in the thread pool, I got these results (Core i7 Sandy Bridge, 4 cores + HT):

Threads                     1        2        3        4        5        6        7        8
Real time (s)           34.14   26.894   21.293   20.184   19.998   25.977    34.15   39.404
User time (s)           62.84   65.182   65.895   70.851   78.521  111.012  157.448  173.074
System time (s)          0.27    0.562    1.276    1.596    2.178    4.008    6.588    8.652
System calls           155808   224084   291634   403496   404161   360065   360065   258661
System call errors      39643    80245    99000   147487   155605   142922   142922   108454

I got these measurements using the latest DMD/druntime/Phobos, compiled with "-O -release -inline -noboundscheck".

I used time and strace -c to measure, e.g.:
time ./main -h=256 -w=256 -t=7 > /dev/null
strace -c ./main -h=256 -w=256 -t=7 > /dev/null
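
To collect the whole series in one go, the time command above can be wrapped in a loop over the thread count. This is only a sketch, not the exact procedure used for the table: it assumes bash, that the binary accepts the -h, -w and -t flags as shown above, and the function name sweep_threads is made up for illustration.

```shell
# Sketch: run the benchmark once per thread-pool size and print timings.
# Assumes bash; sweep_threads is a hypothetical helper name.
# Usage: sweep_threads [path-to-binary]   (defaults to ./main)
sweep_threads() {
    bin="${1:-./main}"
    for t in 1 2 3 4 5 6 7 8; do
        # Label each run on stdout; time reports on stderr.
        echo "threads=$t"
        time "$bin" -h=256 -w=256 -t="$t" > /dev/null
    done
}
```

Calling sweep_threads ./main prints one "threads=N" label per run, followed by the real/user/sys figures for that run. Replacing time with strace -c in the loop body should give the system call counts in the same way.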

I also noticed in the task manager that, no matter what I did, I could not get CPU utilization anywhere close to 99% (unlike the C++ program in [1]).

My interpretation of these results is that std.parallelism.amap does a significant amount of communication between threads, which hurts scaling.

[1] https://github.com/kid0m4n/rays
[2] https://github.com/Safety0ff/rays/tree/master/drays
