On 3/19/2011 2:14 PM, Michel Fortin wrote:
> On 2011-03-19 13:20:00 -0400, dsimcha <dsim...@yahoo.com> said:
>
>> On 3/19/2011 1:09 PM, Michel Fortin wrote:
>>> For instance:
>>>
>>> void main() {
>>>     int sum = 0;
>>>     foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
>>>         sum += value;
>>>     }
>>>     writeln(sum);
>>> }
>>>
>>> The "+=" would need to be an atomic operation to avoid low-level races.
>>>
>>> I think that ParallelForeach's opApply should only accept a shared
>>> delegate. I define a shared delegate as a delegate that does not
>>> reference any non-shared variables of its outer scope. The problem is
>>> that DMD currently doesn't know how to determine whether a delegate
>>> literal is shared or not, so a delegate literal is never shared, and if
>>> ParallelForeach's opApply asked for a shared delegate as it should, it
>>> would just not work. Fix DMD to create shared delegate literals where
>>> appropriate and everything can be guaranteed race-free.
>>
>> If you want pedal-to-metal parallelism without insane levels of
>> verbosity, you can't have these safety features.
>
> I'm not sure where my proposal asks for more verbosity or less
> performance. All I can see is a few fewer casts in std.parallelism and
> that it'd disallow the case in my example above, which is totally wrong.
> Either you're interpreting it wrong or there are things I haven't
> thought about (and I'd be happy to know about them).
>
> But looking at all the examples in the documentation, I cannot find
> one that would need to be changed... well, except the one I'll discuss
> below.


OK, I've had some time to think about this. The following example is safe, but wouldn't work under your proposal, if I understand it correctly:

import std.math, std.parallelism;

void main() {
    auto logs = new double[1_000_000];

    foreach (i, ref elem; taskPool.parallel(logs)) {
        elem = log(i + 1.0);
    }
}

The problem is that you're writing to the same array from multiple threads, which the compiler would interpret as writing to the same variable. However, you can guarantee that no two threads ever write to the same element, so it's safe.

Note: I'm aware of the false sharing issue when writing to adjacent memory addresses. However, when writing to an array this big it occurs to a negligible extent. If you were using a smaller array, then either the loop body would be so expensive that the cost of false sharing would be negligible, or the whole loop would be too cheap to be worth parallelizing.

I'm also aware that word tearing is a concern on some architectures, though not x86. IMHO std.parallelism and its documentation should not be pedantic about portability to architectures that a D compiler doesn't even exist for yet.

Also, your example can be trivially modified to be safe.

import std.parallelism, std.stdio;

void main() {
    int sum = 0;
    foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
        synchronized sum += value;
    }
    writeln(sum);
}
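
A lighter-weight variant (my sketch, not part of the original example) is to make the accumulator shared and use atomicOp from core.atomic instead of a synchronized block, which avoids acquiring a lock on every iteration:

```d
import core.atomic, std.parallelism, std.stdio;

void main() {
    shared int sum = 0;
    foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
        // Atomic read-modify-write on the shared accumulator.
        atomicOp!"+="(sum, value);
    }
    writeln(sum);  // 37
}
```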

In this case synchronizing on every iteration kills all parallelism, but in more realistic cases I use this pattern often. I find it very common to have an expensive loop body that can be performed in parallel, except for a tiny portion that must update a shared data structure. I'm aware that it might be possible, in theory, to write this more formally using reduce() or something similar. However:

1. If the portion of the loop that deals with shared data is very small (and therefore the serialization caused by the synchronized block is negligible), it's often more efficient to keep only one data structure in memory and update it concurrently than to use stronger isolation between threads, as reduce() does, and maintain one data structure per thread.

2. In my experience synchronizing on a small portion of the loop body works very well in practice. My general philosophy is that, in a library like this, dangerous but useful constructs must be supported and treated as innocent until proven guilty, not the other way round.
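
For the record, the reduce() formulation alluded to above would look roughly like this for the toy summing example (a sketch based on my reading of taskPool.reduce, which takes an explicit seed followed by the range):

```d
import std.parallelism, std.stdio;

void main() {
    // Each worker reduces its own chunk into a private accumulator,
    // so the "loop body" needs no synchronization; the per-thread
    // results are combined once at the end.
    auto sum = taskPool.reduce!"a + b"(0, [0,2,3,6,1,4,6,3,3,3,6]);
    writeln(sum);  // 37
}
```

For a cheap operation like "+" the per-thread isolation buys little, which is point 1 above; the trade-off only pays off when combining per-thread results is much cheaper than contending on a single shared structure.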
