On 3/19/2011 12:03 PM, Andrei Alexandrescu wrote:
On 03/19/2011 02:32 AM, dsimcha wrote:
Ok, thanks again for clarifying **how** the docs could be improved. I've
implemented the suggestions and generally given the docs a good read-through
and cleanup. The new docs are at:

http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html

* Still no synopsis example that illustrates in a catchy way the most
attractive artifacts.

I don't see what I could put here that isn't totally redundant with the rest of the documentation. Anything I could think of would basically just involve concatenating all the examples. Furthermore, none of the other Phobos modules have this, so I don't know what one should look like. In general, I feel like std.parallelism is being held to a ridiculously high standard that none of the other Phobos modules currently meet.


* "After creation, Task objects are submitted to a TaskPool for
execution." I understand it's possible to use Task straight as a
promise/future, so s/are/may be/.

No. The only way Task is useful is by submitting it to a pool to be executed. (Though this may change, see below.)

Also it is my understanding that
TaskPool efficiently adapts the concrete number of CPUs to an arbitrary
number of tasks. If that's the case, it's worth mentioning.

Isn't this kind of obvious from the examples, etc.?
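
If it helps to spell it out: the default pool contains a small, fixed number of worker threads (roughly one per core), and however many work units you throw at it get multiplexed onto those workers. A minimal sketch, assuming the taskPool.parallel foreach described in the docs:

import std.parallelism, std.range;

void main() {
    // The pool has roughly one worker thread per CPU core, but the loop
    // below has 1000 iterations; the pool transparently multiplexes the
    // 1000 work units onto the fixed set of workers.  You never size
    // anything by hand.
    foreach (i; taskPool.parallel(iota(1000))) {
        // ... CPU-bound work on i ...
    }
}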

* "If a Task has been submitted to a TaskPool instance, is being stored
in a stack frame, and has not yet finished, the destructor for this
struct will automatically call yieldWait() so that the task can finish
and the stack frame can be destroyed safely." At this point in the doc
the reader doesn't understand that at all because TaskPool has not been
seen yet. The reader gets worried that she'll be essentially serializing
the entire process by mistake. Either move this explanation down or
provide an example.

This is getting ridiculous. There are circular dependencies between Task and TaskPool that are impossible to remove here, and I'm not even going to try. One or the other has to be introduced first, but neither can be explained without mentioning the other. This is why I explain the relationship briefly in the module-level summary, so that the user has at least some idea. I think this is about the best I can do.


* Is done() a property?

Yes.  DDoc sucks.


* The example that reads two files at the same time should NOT use
taskPool. It's just one task, why would the pool ever be needed? If you
also provided an example that reads n files in memory at the same time
using a pool, that would illustrate nicely why you need it. If a Task
can't be launched without being put in a pool, there should be a
possibility to do so. At my work we have a simple function called
callInNewThread that does what's needed to launch a function in a new
thread.

I guess I could add something like this to Task. Under the hood it would (for implementation simplicity, to reuse a bunch of code from TaskPool) fire up a new single-thread pool, submit the task, call TaskPool.finish(), and then return. Since you're already creating a new thread, the extra overhead of creating a new TaskPool for the thread would be negligible, and it would massively simplify the implementation. My only concern is that, when combined with scoped versus non-scoped tasks (which are definitely here to stay, see below), this small convenience function would add way more API complexity than it's worth. Besides, task pools don't need to be created explicitly anyhow; that's what the default instantiation is for. I don't see how callInNewThread would really solve much.
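
Concretely, the wrapper would be something like this (a sketch only; executeInNewThread is a hypothetical name, and I'm assuming TaskPool's constructor takes a worker count):

import std.parallelism;

// Hypothetical convenience function: run one task in its own thread by
// reusing TaskPool's machinery instead of hand-rolling thread management.
void executeInNewThread(T)(ref T theTask) {
    auto pool = new TaskPool(1);  // fire up a single-thread pool
    pool.put(theTask);            // submit the one task
    pool.finish();                // worker terminates once the task is done
}

The caller would then wait on the Task itself (yieldWait(), etc.), not on the throwaway pool.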

Secondly, I think you're reading **WAY** too much into what was meant to be a simple example to illustrate usage mechanics. This is another case where I can't think of a small, cute example where you'd really need the pool. There are plenty of larger examples, but the smallest, most self-contained one I can think of is a parallel sort. I decided to use file reading because it was good enough to illustrate the mechanics of usage, even if it didn't illustrate a particularly good use case.


* The note below that example gets me thinking: it is an artificial
limitation to force users of Task to worry about scope and such. One
should be able to create a Future object (Task I think in your
terminology), pass it around like a normal value, and ultimately force
it. This is the case for all other languages that implement futures. I
suspect the "scope" parameter associated with the delegate a couple of
definitions below plays a role here, but I think we need to work toward
providing the smoothest interface possible (possibly prompting
improvements in the language along the way).

This is what TaskPool.task is for. Maybe this should be moved to the top of the definition of TaskPool and emphasized, and the scoped/stack allocated versions should be moved below TaskPool and de-emphasized?
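
For concreteness, the escape-friendly style looks roughly like this (a sketch, assuming TaskPool.task() heap allocates and auto-submits as the docs describe):

import std.file, std.parallelism;

// The heap-allocated flavor can safely outlive the creating stack frame.
auto startRead(string name) {
    return taskPool.task!read(name);  // heap-allocated, auto-submitted
}

void main() {
    auto t = startRead("foo.txt");  // escapes startRead's frame just fine
    // ... unrelated work ...
    auto contents = t.yieldWait();  // force the future when needed
}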

At any rate, though, anyone who's playing with parallelism should be an advanced enough programmer that concepts like scope and destructors are second nature.


* I'm not sure how to interpret the docs for

ReturnType!(F) run(F, Args...)(F fpOrDelegate, ref Args args);

So it's documented but I'm not supposed to care. Why not just remove?
Surely there must be higher-level examples that clarify that I can use
delegates etc.

Yes, and there are such examples. It's just that I want to explicitly state what type is returned in this case rather than returning auto. If I omitted the docs for run(), you'd ask what run() is.


* "If you want to escape the Task object from the function in which it
was created or prefer to heap allocate and automatically submit to the
pool, see TaskPool.task()." I'm uncomfortable that I need to remember a
subtle distinction of the same primitive name ("task") depending on
membership in a TaskPool or not, which is a tenuous relationship to
remember. How about "scopedTask()" vs. "task()"? Also, it's worth asking
ourselves what's the relative overhead of closure allocation vs. task
switching etc. Maybe we get to simplify things a lot at a small cost in
efficiency.

We absolutely **need** scope delegates for calling stuff where a closure can't be allocated due to objects on the stack frame having scoped destruction. This is not going anywhere. Also, I **much** prefer to have everything that does conceptually the same thing have the same name. I think this is clearer, not less clear.
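
Here's the kind of case I mean (a sketch, assuming the scope-delegate overload of task() mentioned above):

import std.parallelism;

struct Buffer {
    double[] data;
    ~this() { /* scoped cleanup of data */ }
}

void crunch(double[] data) { /* ... */ }

void main() {
    Buffer buf;
    buf.data = new double[1_000_000];

    // A heap-allocated closure over buf would be unsafe because buf has
    // scoped destruction.  A scope delegate needs no closure: the Task
    // lives in this stack frame, and its destructor calls yieldWait() so
    // buf can't be destroyed while the task still references it.
    auto t = task({ crunch(buf.data); });
    taskPool.put(t);
    t.yieldWait();
}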


* As I mentioned, in the definition:

Task!(run,TypeTuple!(F,Args)) task(F, Args...)(F fun, Args args);

I can't tell what "run" is.

run() is just the adapter function described right above this code. I cannot for the life of me understand how this could possibly be unclear.
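
For the record, my mental model of it (not necessarily the verbatim source) is a one-liner:

import std.traits;  // for ReturnType

// run() just forwards the callable and its arguments, preserving the
// return type; it exists so Task has a uniform function to instantiate.
ReturnType!(F) run(F, Args...)(F fpOrDelegate, ref Args args) {
    return fpOrDelegate(args);
}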

* "A goto from inside the parallel foreach loop to a label outside the
loop will result in undefined behavior." Would this be a bug in dmd?

No, it's because a goto of this form has no reasonable, useful semantics. I should probably mention in the docs that the same applies to labeled break and continue.

I have no idea what semantics these should have, and even if I did, given the long odds that even one person would actually need them, I think they'd be more trouble than they're worth to implement. For example, once you break out of a parallel foreach loop to some arbitrary address (and different threads can goto different labels, etc.), well, it's no longer a parallel foreach loop. It's just a bunch of completely unstructured threading doing god-knows-what.

Therefore, I slapped undefined behavior on it as a big sign that says, "Just don't do it." This also has the advantage that, if anyone ever thinks of any good, clearly useful semantics, these will be implementable without breaking code later.
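
To be explicit about the pattern being outlawed (a sketch, using the parallel foreach syntax from the docs):

import std.parallelism, std.range;

void main() {
    foreach (i; taskPool.parallel(iota(100))) {
        if (i == 42)
            goto done;  // jumps out of the loop body: undefined behavior
        // Labeled break/continue targeting an enclosing loop are in the
        // same boat: the bodies run concurrently on several threads, so
        // "break out of the loop" has no single coherent meaning.
    }
done:
    return;
}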

* Again: speed of e.g. parallel min/max vs. serial, pi computation etc.
on a usual machine?

I **STRONGLY** believe this does not belong in API documentation because it's too machine-specific, compiler-specific, stack-alignment-specific, etc., and almost any benchmark worth doing takes up more space than an example should. Furthermore, anyone who wants to know this can easily time it themselves. I have absolutely no intention of including this. While in general I appreciate and have tried to accommodate your suggestions, this is one I'll be standing firm on.
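
For anyone who does want numbers, the harness is only a few lines anyway (a sketch; I'm assuming std.datetime's StopWatch API here):

import std.datetime, std.math, std.parallelism, std.stdio;

void main() {
    auto nums = new double[10_000_000];

    auto sw = StopWatch(AutoStart.yes);
    foreach (i, ref x; nums)
        x = sqrt(cast(double) i);     // serial baseline
    writeln("serial:   ", sw.peek().msecs, " ms");

    sw.reset();
    foreach (i, ref x; taskPool.parallel(nums))
        x = sqrt(cast(double) i);     // same work, parallel foreach
    writeln("parallel: ", sw.peek().msecs, " ms");
}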


* The join example is fine, but the repetitive code suggests that loops
should be better:

import std.file;
import std.parallelism;

void main() {
    auto pool = new TaskPool();
    foreach (filename; ["foo.txt", "bar.txt", "baz.txt"]) {
        pool.put(task!read(filename));
    }

    // Call join() to guarantee that all tasks are done running, that the
    // worker threads have terminated, and that the results of all of the
    // tasks can be accessed without any synchronization primitives.
    pool.join();

    ubyte[][] data; // void[], meh
    // Use spinWait() since the results are guaranteed to have been computed
    // and spinWait() is the cheapest of the wait functions.
    foreach (task; pool.howDoIEnumerateTasks()) {
        data ~= task.spinWait();
    }
}

You can't enumerate the tasks like this, but there's a good reason for this limitation. The design requires the pool not hold onto any references to the tasks after they're out of the queue, so that they can be safely destroyed as soon as the caller wants to destroy them. If you need to enumerate over them like this, then:

1. You're probably doing something wrong and parallel foreach would be a better choice.

2. If you insist on doing this, you need to explicitly store them in an array or something (see the sketch below).
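
For the record, option 2 looks something like this (a sketch; it uses the heap-allocating TaskPool.task() so the Task objects can be stored in an array):

import std.file, std.parallelism;

void main() {
    auto names = ["foo.txt", "bar.txt", "baz.txt"];

    // The pool deliberately keeps no references to the tasks, so we
    // keep our own.
    typeof(taskPool.task!read(names[0]))[] tasks;
    foreach (name; names)
        tasks ~= taskPool.task!read(name);  // heap-allocated, auto-submitted

    void[][] data;
    foreach (t; tasks)
        data ~= t.yieldWait();  // wait on each task individually
}

No join(), and no pool-wide enumeration; each Task is waited on individually.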

As a more general comment, I know the join() example sucks. join() is actually a seldom-used primitive, not a frequently used one as you suggest. Most of the time you wait for individual tasks (explicitly or implicitly through the higher level functions), not the whole pool, to finish executing. I feel like it's obligatory that join() and finish() exist, but I never use them myself. They were much more useful in pre-release versions of this lib, when parallel foreach didn't exist and it made sense to have explicit arrays of Tasks. The difficulty I've had in coming up with a decent example for them makes me halfway tempted to just rip those $#)*% functions out entirely.
