On 3/19/2011 12:03 PM, Andrei Alexandrescu wrote:
On 03/19/2011 02:32 AM, dsimcha wrote:
Ok, thanks again for clarifying **how** the docs could be improved. I've
implemented the suggestions and generally given the docs a good read-through
and cleanup. The new docs are at:

http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html

* Still no synopsis example that illustrates in a catchy way the most
attractive artifacts.

I don't see what I could put here that isn't totally redundant with the rest of the documentation. Anything I could think of would basically just involve concatenating all the examples. Furthermore, none of the other Phobos modules have this, so I don't know what one should look like. In general, I feel like std.parallelism is being held to a ridiculously high standard that none of the other Phobos modules currently meet.


* "After creation, Task objects are submitted to a TaskPool for
execution." I understand it's possible to use Task straight as a
promise/future, so s/are/may be/.

No. The only way Task is useful is by submitting it to a pool to be executed. (Though this may change, see below.)

Also it is my understanding that
TaskPool efficiently adapts the concrete number of CPUs to an arbitrary
number of tasks. If that's the case, it's worth mentioning.

Isn't this kind of obvious from the examples, etc.?
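
If it helps to spell it out: the default pool contains a small, fixed number of worker threads (roughly one per core), and however many work units you throw at it get multiplexed onto those workers. A minimal sketch, assuming the taskPool.parallel foreach described in the docs:

import std.parallelism, std.range;

void main() {
    // The pool has roughly one worker thread per CPU core, but the loop
    // below has 1000 iterations; the pool transparently multiplexes the
    // 1000 work units onto the fixed set of workers.  You never size
    // anything by hand.
    foreach (i; taskPool.parallel(iota(1000))) {
        // ... CPU-bound work on i ...
    }
}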

* "If a Task has been submitted to a TaskPool instance, is being stored
in a stack frame, and has not yet finished, the destructor for this
struct will automatically call yieldWait() so that the task can finish
and the stack frame can be destroyed safely." At this point in the doc
the reader doesn't understand that at all because TaskPool has not been
seen yet. The reader gets worried that she'll be essentially serializing
the entire process by mistake. Either move this explanation down or
provide an example.

This is getting ridiculous. There are circular dependencies between Task and TaskPool that are impossible to remove here, and I'm not even going to try. One or the other has to be introduced first, but neither can be explained without mentioning the other. This is why I explain the relationship briefly in the module-level summary, so that the user has at least some idea. I think this is about the best I can do.


* Is done() a property?

Yes.  DDoc sucks.


* The example that reads two files at the same time should NOT use
taskPool. It's just one task, why would the pool ever be needed? If you
also provided an example that reads n files in memory at the same time
using a pool, that would illustrate nicely why you need it. If a Task
can't be launched without being put in a pool, there should be a
possibility to do so. At my work we have a simple function called
callInNewThread that does what's needed to launch a function in a new
thread.

I guess I could add something like this to Task. Under the hood it would (for implementation simplicity, to reuse a bunch of code from TaskPool) fire up a new single-thread pool, submit the task, call TaskPool.finish(), and then return. Since you're already creating a new thread, the extra overhead of creating a new TaskPool for the thread would be negligible, and it would massively simplify the implementation. My only concern is that, when combined with scoped versus non-scoped tasks (which are definitely here to stay, see below), this small convenience function would add way more API complexity than it's worth. Besides, task pools don't need to be created explicitly anyhow; that's what the default instantiation is for. I don't see how callInNewThread would really solve much.
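
Concretely, the wrapper would be something like this (a sketch only; executeInNewThread is a hypothetical name, and I'm assuming TaskPool's constructor takes a worker count):

import std.parallelism;

// Hypothetical convenience function: run one task in its own thread by
// reusing TaskPool's machinery instead of hand-rolling thread management.
void executeInNewThread(T)(ref T theTask) {
    auto pool = new TaskPool(1);  // fire up a single-thread pool
    pool.put(theTask);            // submit the one task
    pool.finish();                // worker terminates once the task is done
}

The caller would then wait on the Task itself (yieldWait(), etc.), not on the throwaway pool.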

Secondly, I think you're reading **WAY** too much into what was meant to be a simple example to illustrate usage mechanics. This is another case where I can't think of a small, cute example where you'd really need the pool. There are plenty of larger examples, but the smallest, most self-contained one I can think of is a parallel sort. I decided to use file reading because it was good enough to illustrate the mechanics of usage, even if it didn't illustrate a particularly good use case.


* The note below that example gets me thinking: it is an artificial
limitation to force users of Task to worry about scope and such. One
should be able to create a Future object (Task I think in your
terminology), pass it around like a normal value, and ultimately force
it. This is the case for all other languages that implement futures. I
suspect the "scope" parameter associated with the delegate a couple of
definitions below plays a role here, but I think we need to work toward
providing the smoothest interface possible (possibly prompting
improvements in the language along the way).

This is what TaskPool.task is for. Maybe this should be moved to the top of the definition of TaskPool and emphasized, and the scoped/stack allocated versions should be moved below TaskPool and de-emphasized?
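
For concreteness, the escape-friendly style looks roughly like this (a sketch, assuming TaskPool.task() heap allocates and auto-submits as the docs describe):

import std.file, std.parallelism;

// The heap-allocated flavor can safely outlive the creating stack frame.
auto startRead(string name) {
    return taskPool.task!read(name);  // heap-allocated, auto-submitted
}

void main() {
    auto t = startRead("foo.txt");  // escapes startRead's frame just fine
    // ... unrelated work ...
    auto contents = t.yieldWait();  // force the future when needed
}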

At any rate, though, anyone who's playing with parallelism should be an advanced enough programmer that concepts like scope and destructors are second nature.


* I'm not sure how to interpret the docs for

ReturnType!(F) run(F, Args...)(F fpOrDelegate, ref Args args);

So it's documented but I'm not supposed to care. Why not just remove?
Surely there must be higher-level examples that clarify that I can use
delegates etc.

Yes, and there are such examples. It's just that I want to explicitly state what type is returned in this case rather than returning auto. If I omitted the docs for run(), you'd ask what run() is.


* "If you want to escape the Task object from the function in which it
was created or prefer to heap allocate and automatically submit to the
pool, see TaskPool.task()." I'm uncomfortable that I need to remember a
subtle distinction of the same primitive name ("task") depending on
membership in a TaskPool or not, which is a tenuous relationship to
remember. How about "scopedTask()" vs. "task()"? Also, it's worth asking
ourselves what's the relative overhead of closure allocation vs. task
switching etc. Maybe we get to simplify things a lot at a small cost in
efficiency.

We absolutely **need** scope delegates for calling stuff where a closure can't be allocated due to objects on the stack frame having scoped destruction. This is not going anywhere. Also, I **much** prefer to have everything that does conceptually the same thing have the same name. I think this is clearer, not less clear.
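
Here's the kind of case I mean (a sketch, assuming the scope-delegate overload of task() mentioned above):

import std.parallelism;

struct Buffer {
    double[] data;
    ~this() { /* scoped cleanup of data */ }
}

void crunch(double[] data) { /* ... */ }

void main() {
    Buffer buf;
    buf.data = new double[1_000_000];

    // A heap-allocated closure over buf would be unsafe because buf has
    // scoped destruction.  A scope delegate needs no closure: the Task
    // lives in this stack frame, and its destructor calls yieldWait() so
    // buf can't be destroyed while the task still references it.
    auto t = task({ crunch(buf.data); });
    taskPool.put(t);
    t.yieldWait();
}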


* As I mentioned, in the definition:

Task!(run,TypeTuple!(F,Args)) task(F, Args...)(F fun, Args args);

I can't tell what "run" is.

run() is just the adapter function described right above this code. I cannot for the life of me understand how this could possibly be unclear.
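
For the record, my mental model of it (not necessarily the verbatim source) is a one-liner:

import std.traits;  // for ReturnType

// run() just forwards the callable and its arguments, preserving the
// return type; it exists so Task has a uniform function to instantiate.
ReturnType!(F) run(F, Args...)(F fpOrDelegate, ref Args args) {
    return fpOrDelegate(args);
}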

* "A goto from inside the parallel foreach loop to a label outside the
loop will result in undefined behavior." Would this be a bug in dmd?

No, it's because a goto of this form has no reasonable, useful semantics. I should probably mention in the docs that the same applies to labeled break and continue.

I have no idea what semantics these should have, and even if I did, given the long odds that even one person would actually need them, I think they'd be more trouble than they're worth to implement. For example, once you break out of a parallel foreach loop to some arbitrary address (and different threads can goto different labels, etc.), well, it's no longer a parallel foreach loop. It's just a bunch of completely unstructured threading doing god-knows-what.

Therefore, I slapped undefined behavior on it as a big sign that says, "Just don't do it." This also has the advantage that, if anyone ever thinks of any good, clearly useful semantics, these will be implementable without breaking code later.
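
To be explicit about the pattern being outlawed (a sketch, using the parallel foreach syntax from the docs):

import std.parallelism, std.range;

void main() {
    foreach (i; taskPool.parallel(iota(100))) {
        if (i == 42)
            goto done;  // jumps out of the loop body: undefined behavior
        // Labeled break/continue targeting an enclosing loop are in the
        // same boat: the bodies run concurrently on several threads, so
        // "break out of the loop" has no single coherent meaning.
    }
done:
    return;
}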

* Again: speed of e.g. parallel min/max vs. serial, pi computation etc.
on a usual machine?

I **STRONGLY** believe this does not belong in API documentation because it's too machine-specific, compiler-specific, stack-alignment-specific, etc., and almost any benchmark worth doing takes up more space than an example should. Furthermore, anyone who wants to know this can easily time it themselves. I have absolutely no intention of including this. While in general I appreciate and have tried to accommodate your suggestions, this is one I'll be standing firm on.
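
For anyone who does want numbers, the harness is only a few lines anyway (a sketch; I'm assuming std.datetime's StopWatch API here):

import std.datetime, std.math, std.parallelism, std.stdio;

void main() {
    auto nums = new double[10_000_000];

    auto sw = StopWatch(AutoStart.yes);
    foreach (i, ref x; nums)
        x = sqrt(cast(double) i);     // serial baseline
    writeln("serial:   ", sw.peek().msecs, " ms");

    sw.reset();
    foreach (i, ref x; taskPool.parallel(nums))
        x = sqrt(cast(double) i);     // same work, parallel foreach
    writeln("parallel: ", sw.peek().msecs, " ms");
}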


* The join example is fine, but the repetitive code suggests that loops
should be better:

import std.file;
import std.parallelism;

void main() {
    auto pool = new TaskPool();
    foreach (filename; ["foo.txt", "bar.txt", "baz.txt"]) {
        pool.put(task!read(filename));
    }

    // Call join() to guarantee that all tasks are done running, that the
    // worker threads have terminated, and that the results of all of the
    // tasks can be accessed without any synchronization primitives.
    pool.join();

    ubyte[][] data; // void[], meh
    // Use spinWait() since the results are guaranteed to have been computed
    // and spinWait() is the cheapest of the wait functions.
    foreach (task; pool.howDoIEnumerateTasks()) {
        data ~= task.spinWait();
    }
}

You can't enumerate the tasks like this, but there's a good reason for this limitation. The design requires the pool not hold onto any references to the tasks after they're out of the queue, so that they can be safely destroyed as soon as the caller wants to destroy them. If you need to enumerate over them like this, then:

1. You're probably doing something wrong and parallel foreach would be a better choice.

2. If you insist on doing this, you need to explicitly store them in an array or something (see the sketch below).
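
For the record, option 2 looks something like this (a sketch; it uses the heap-allocating TaskPool.task() so the Task objects can be stored in an array):

import std.file, std.parallelism;

void main() {
    auto names = ["foo.txt", "bar.txt", "baz.txt"];

    // The pool deliberately keeps no references to the tasks, so we
    // keep our own.
    typeof(taskPool.task!read(names[0]))[] tasks;
    foreach (name; names)
        tasks ~= taskPool.task!read(name);  // heap-allocated, auto-submitted

    void[][] data;
    foreach (t; tasks)
        data ~= t.yieldWait();  // wait on each task individually
}

No join(), and no pool-wide enumeration; each Task is waited on individually.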

As a more general comment, I know the join() example sucks. join() is actually a seldom-used primitive, not a frequently used one as you suggest. Most of the time you wait for individual tasks (explicitly or implicitly through the higher level functions), not the whole pool, to finish executing. I feel like it's obligatory that join() and finish() exist, but I never use them myself. They were much more useful in pre-release versions of this lib, when parallel foreach didn't exist and it made sense to have explicit arrays of Tasks. The difficulty I've had in coming up with a decent example for them makes me halfway tempted to just rip those $#)*% functions out entirely.
