I think your use case is both beyond the scope of std.parallelism and better handled by std.concurrency. std.parallelism is mostly meant to handle the pure multicore parallelism use case. It's not that it *can't* handle other use cases, but that's not what it's tuned for.
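
To illustrate the difference in style, here's a minimal sketch of the message-passing model std.concurrency is built around (workerFn and its int payload are made up for illustration):

import std.concurrency, std.stdio;

// Hypothetical worker: does some work, then notifies the owner thread.
void workerFn(Tid owner) {
    immutable result = 42;  // stand-in for real work
    send(owner, result);
}

void main() {
    auto worker = spawn(&workerFn, thisTid);

    // The main thread stays responsive and handles results as messages.
    receive((int r) { writeln("worker finished with ", r); });
}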

As far as prioritization goes, it wouldn't be hard to implement prioritization of when a task starts (i.e. have a high- and a low-priority queue). However, the whole point of TaskPool is to avoid starting a new thread for each task: threads are recycled for efficiency, which rules out changing a task's priority in the OS scheduler. I also don't see how to generalize prioritization to map, reduce, parallel foreach, etc. without making the API much more complex.
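
That said, one workaround that needs no new API is keeping a small, dedicated pool for latency-sensitive tasks so they never wait behind bulk work. This is just a sketch; highPri, urgentWork, and bulkWork are made-up names:

import std.parallelism;

// Stand-ins for real work; names are made up for illustration.
void urgentWork() { /* latency-sensitive work */ }
void bulkWork() { /* background work */ }

void main() {
    auto highPri = new TaskPool(1);  // reserved for urgent tasks

    auto urgent = task!urgentWork();
    highPri.put(urgent);

    auto bulk = task!bulkWork();
    taskPool.put(bulk);  // everything else goes to the default pool

    urgent.yieldForce();   // wait for the urgent task
    highPri.finish(true);  // drain and shut down the extra pool
}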

In addition, std.parallelism guarantees that tasks are started in the order they're submitted, with one exception: if a task's results are needed immediately and it hasn't started yet, it's pulled out of the middle of the queue and executed right away. So one way to get the prioritization you need is simply to submit the tasks in order of priority, assuming you're submitting them all from the same place.
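
Here's a sketch of that submission-order workaround (Job and its fields are hypothetical names; the only property relied on is the FIFO startup order described above):

import std.algorithm, std.parallelism;

// Hypothetical job record pairing a priority with some work.
struct Job {
    int priority;
    void function() run;
}

void submitByPriority(Job[] jobs) {
    // Highest priority first; the pool starts tasks in submission order.
    sort!"a.priority > b.priority"(jobs);

    foreach(job; jobs) {
        auto t = task(job.run);
        taskPool.put(t);
    }
}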

One last thing: as far as I/O goes, asyncBuf may be useful. It lets you pipeline reading a file with higher-level processing. Example:

// Read the lines of a file into memory in parallel with processing
// them.
import std.algorithm, std.array, std.conv, std.parallelism, std.stdio;

void main() {
    // byLine() reuses its buffer, so idup each line before buffering it.
    auto lines = map!"a.idup"(File("foo.txt").byLine());

    // asyncBuf reads ahead in a worker thread while main() consumes.
    auto pipelined = taskPool.asyncBuf(lines);

    foreach(line; pipelined) {
        auto ls = line.split("\t");
        auto nums = to!(double[])(ls);
        // ... use nums ...
    }
}
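
If buffering too far ahead is a concern, asyncBuf also takes an optional buffer size as a second argument (the number of elements to read ahead of the consumer), e.g. taskPool.asyncBuf(lines, 10).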

On 3/18/2011 9:27 PM, Michel Fortin wrote:
> On 2011-03-18 17:12:07 -0400, Andrei Alexandrescu
> <seewebsiteforem...@erdani.org> said:
>
>> On 3/18/11 3:55 PM, dsimcha wrote:
>>> It's kinda interesting--I don't know at all where this lib stands.
>>> The deafening silence for the past week makes me think one of two
>>> things is true:
>>>
>>> 1. std.parallelism solves a problem that's too niche for 90% of D
>>> users, or
>>>
>>> 2. It's already been through so many rounds of discussion in various
>>> places (informally with friends, then on the Phobos list, then on
>>> this NG) that there really is nothing left to nitpick.
>>>
>>> I have no idea which of these is true.
>>
>> Probably a weighted average of the two. If I were to venture a guess
>> I'd ascribe more weight to 1. This is partly because I'm also
>> receiving relatively little feedback on the concurrency chapter in
>> TDPL. Also, the general pattern on many such discussion groups is that
>> the amount of traffic on a given topic is inversely correlated with
>> its complexity.
>
> One reason might also be that not many people are invested in D for such
> things right now. It's hard to review such code and make useful comments
> without actually testing it on a problem that would benefit from its use.
>
> If I were writing in D the application I'm currently writing, I'd
> certainly give it a try. But the thing I have that would benefit from
> something like this is in Objective-C (it's a Cocoa program I'm
> writing). I'll eventually get D to interact well with Apple's
> Objective-C APIs, but in the meantime all I'm writing in D is some
> simple web stuff which doesn't require multithreading at all.
>
> In my application, what I'm doing is starting hundreds of tasks from the
> main thread, and once those tasks are done they generally send back a
> message to the main thread through Cocoa's event dispatching mechanism.
> From a quick glance at the documentation, std.parallelism offers what
> I'd need if I were to implement a similar application in D. The only
> thing I don't see is a way to prioritize tasks: some of my tasks need
> more immediate execution than others in order to keep the application
> responsive.
>
> One interesting bit: what I'm doing in those tasks is mostly I/O on the
> hard drive combined with some parsing. I find a task queue is useful to
> manage all the work; in my case it's not really about maximizing the
> utilization of a multicore processor, but more about keeping work out of
> the main thread so the application stays responsive. Maximizing speed
> is still a secondary objective, but given that most of the work is
> I/O-bound, having multiple cores available doesn't help much.

