Re: review of std.parallelism

dsimcha Sat, 19 Mar 2011 00:36:22 -0700

Ok, thanks again for clarifying **how** the docs could be improved.I've implemented the suggestions and generally given the docs a goodreading over and clean up. The new docs are at:


http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html


On 3/18/2011 11:29 PM, Andrei Alexandrescu wrote:

0. Overview and vote

I think the library delivers the high-level goods (parallel foreach,
map, reduce) but is a bit fuzzy in the lower level details.

Documentation hurts understanding of the capabilities of the library and
essentially is of inadequate quality. Entity documentation and examples
do little more than give the impression of dispensing with an unpleasant
chore.

My vote in favor of acceptance is contingent upon a radical improvement
in the quality of documentation and examples. Most if not all artifacts
should be motivated by simple, compelling examples. The introductory
section must contain a brief and attractive synopsis of the flagship
capabilities. All terms must be defined before being used and introduced
in a carefully-chosen order. The relationship between various entities
should be clarified.

I've seen the argument that simple and strong examples are difficult to
find. Though I agree such examples are not easy to come up with, I also
believe the author of library is in the best position to produce them.

================

1. Library proper:

* "In the case of non-random access ranges, parallel foreach is still
usable but buffers lazily to an array..." Wouldn't strided processing
help? If e.g. 4 threads the first works on 0, 4, 8, ... second works on
1, 5, 9, ... and so on.

* Example with squares would be better if double replaced uint, and if a
more complicated operation (sqrt, log...) were involved.

* I'm unclear on the tactics used by lazyMap. I'm thinking the obvious
method should be better: just use one circular buffer. The presence of
two dependent parameters makes this abstraction difficult to operate with.

* Same question about asyncBuf. What is wrong with a circular buffer
filled on one side by threads and on the consumed from the other by the
client? I can think of a couple of answers but it would be great if they
were part of the documentation.

* Why not make workerIndex a ulong and be done with it?

* Is stop() really trusted or just unsafe? If it's forcibly killing
threads then its unsafe.

* uint size() should be size_t for conformity.

* defaultPoolThreads - should it be a @property?

* I don't understand what Task is supposed to do. It is circularly
defined: "A struct that encapsulates the information about a task,
including its current status, what pool it was submitted to, and its
arguments." OK, but what's a task? Could a task be used outside a task
pool, and if so to what effect(s)?

* If LazyMap is only necessary as the result of lazyMap, could that
become a local type inside lazyMap?


2. Documentation:

* Documentation unacceptable. It lacks precision and uses terms before
defining them. For example: "This class encapsulates a task queue and a
set of worker threads" comes before the notions of "task queue" and
"worker thread" are defined. Clearly there is an intuition of what those
are supposed to mean, but in practice each library lays down some fairly
detailed definition of such.

* "Note: Initializing a pool with zero threads (as would happen in the
case of a single-core CPU) is well-tested and does work." The absence of
a bug should not be advertised in so many words. Simpler: "Note:
Single-CPU machines will operate transparently with zero-sized pools."

* "Allows for custom pool size." I have no idea what pool size means.

* "// Do something interesting." Worst example ever. You are the creator
of the library. If _you_ can't find a brief compelling example, who could?

* "Immediately after the range argument, an optional block size argument
may be provided. If none is provided, the default block size is used. An
optional buffer for returining the results may be provided as the last
argument. If one is not provided, one will be automatically allocated.
If one is provided, it must be the same length as the range." An example
should be inserted after each of these sentences.

* "// Do something expensive with line." Better you do something
expensive with line.

* "This is not the same thing as commutativity." I think this is obvious
enough to be left out. The example of matrices (associative but not
commutative) is nice though.

* "immutable n = 1000000000;" -> use underscores

* Would be great if the documentation examples included some rough
experimental results, e.g. "3.8 times faster on a 4-core machine than
the equivalent serial loop".

* No example for workerIndex and why it's useful.

* Is LocalWorker better than WorkerLocal? No, because it's not the
worker that's local, it's the storage - which is missing from the name!
WorkerLocalStorage? Also replace "create" with "make" or drop it
entirely. The example doesn't tell me how I can use bufs. I suspect
workerIndex has something to do with it but the example fails to divulge
that relationship.

* No example for join(), which is quite a central primitive.

* No example for put(), which would be interesting to see.

* No example for task().

* No example for run().

* The example for Task uses myFuture without defining it. In case Task
defines the promise/future idiom, the documentation misses a terrific
opportunity of clarifying that.

* What is 'run' in the definition of safe task()?

* Why is AsyncBuf's doc far away from the function returning it? Same
about WorkerLocal.


Andrei

Re: review of std.parallelism

Reply via email to