On 3/19/2011 2:14 PM, Michel Fortin wrote:
> On 2011-03-19 13:20:00 -0400, dsimcha <dsim...@yahoo.com> said:
>
>> On 3/19/2011 1:09 PM, Michel Fortin wrote:
>>> For instance:
>>>
>>> void main() {
>>>     int sum = 0;
>>>     foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
>>>         sum += value;
>>>     }
>>>     writeln(sum);
>>> }
>>>
>>> The "+=" would need to be an atomic operation to avoid low-level races.
>>>
>>> I think that ParallelForeach's opApply should only accept a shared
>>> delegate. I define a shared delegate as a delegate that does not
>>> reference any non-shared variables of its outer scope. The problem is
>>> that DMD currently doesn't know how to determine whether a delegate
>>> literal is shared or not, so a delegate literal is never shared, and if
>>> ParallelForeach's opApply asked for a shared delegate as it should, it
>>> would just not work. Fix DMD to create shared delegate literals where
>>> appropriate and everything can be guaranteed race-free.
>>
>> If you want pedal-to-metal parallelism without insane levels of
>> verbosity, you can't have these safety features.
>
> I'm not sure where my proposal asks for more verbosity or less
> performance. All I can see is a few fewer casts in std.parallelism and
> that it'd disallow the case in my example above, which is totally wrong.
> Either you're interpreting it wrong or there are things I haven't
> thought about (and I'd be happy to know about them).
>
> But looking at all the examples in the documentation, I cannot find
> one that would need to be changed... well, except the one I'll discuss
> below.


OK, I've had some time to think about this. The following example is safe, but wouldn't work under your proposal, if I understand it correctly:

import std.math, std.parallelism;

void main() {
    auto logs = new double[1_000_000];

    foreach (i, ref elem; taskPool.parallel(logs)) {
        elem = log(i + 1.0);
    }
}

The problem is that you're writing to the same array from multiple threads, which the compiler would interpret as writing to the same variable. However, you can guarantee that no two threads ever write to the same element, so it's safe.

Note: I'm aware of the false sharing issue when writing to adjacent memory addresses. However, when writing to an array this big it occurs to a negligible extent. If you were using a smaller array, then either the loop body would be so expensive that the cost of false sharing would be negligible, or the whole loop would be too cheap to be worth parallelizing.

I'm also aware that word tearing is a concern on some architectures, though not x86. IMHO std.parallelism and its documentation should not be pedantic about portability to architectures that a D compiler doesn't even exist for yet.

Also, your example can be trivially modified to be safe.

import std.parallelism, std.stdio;

void main() {
    int sum = 0;
    foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
        synchronized sum += value;
    }
    writeln(sum);
}
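
A lighter-weight variant (my sketch, not part of the original example) is to make the accumulator shared and use atomicOp from core.atomic instead of a synchronized block, which avoids acquiring a lock on every iteration:

```d
import core.atomic, std.parallelism, std.stdio;

void main() {
    shared int sum = 0;
    foreach (int value; taskPool.parallel([0,2,3,6,1,4,6,3,3,3,6])) {
        // Atomic read-modify-write on the shared accumulator.
        atomicOp!"+="(sum, value);
    }
    writeln(sum);  // 37
}
```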

In this case synchronizing on every iteration kills all parallelism, but in more realistic cases I use this pattern often. I find it very common to have an expensive loop body that can be performed in parallel, except for a tiny portion that must update a shared data structure. I'm aware that it might be possible, in theory, to write this more formally using reduce() or something similar. However:

1. If the portion of the loop that deals with shared data is very small (and therefore the serialization caused by the synchronized block is negligible), it's often more efficient to keep only one data structure in memory and update it concurrently than to use stronger isolation between threads, as reduce() does, and maintain one data structure per thread.

2. In my experience synchronizing on a small portion of the loop body works very well in practice. My general philosophy is that, in a library like this, dangerous but useful constructs must be supported and treated as innocent until proven guilty, not the other way round.
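
For the record, the reduce() formulation alluded to above would look roughly like this for the toy summing example (a sketch based on my reading of taskPool.reduce, which takes an explicit seed followed by the range):

```d
import std.parallelism, std.stdio;

void main() {
    // Each worker reduces its own chunk into a private accumulator,
    // so the "loop body" needs no synchronization; the per-thread
    // results are combined once at the end.
    auto sum = taskPool.reduce!"a + b"(0, [0,2,3,6,1,4,6,3,3,3,6]);
    writeln(sum);  // 37
}
```

For a cheap operation like "+" the per-thread isolation buys little, which is point 1 above; the trade-off only pays off when combining per-thread results is much cheaper than contending on a single shared structure.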
