On 25.03.2011 05:14, dsimcha wrote:
On 3/24/2011 10:21 PM, Sönke Ludwig wrote:

Can you elaborate and/or provide an example of the "general" problem?
I'm not quite sure what you're getting at.

I have one very specific scenario that I can only sketch. Suppose
you have some kind of complex computation going on in the ThreadPool.
This computation is done by a large set of tasks where each task
depends on the result of one or more other tasks. One task is
responsible for coordinating the work: it spawns tasks and waits
for their completion, so that it can spawn new tasks whose inputs
have now become available.
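For illustration, a minimal sketch of such a coordinator on top of the existing std.parallelism API; compute() and the dependency shape are made up:

```d
import std.parallelism;

// Hypothetical leaf computation, standing in for an expensive sub-task.
int compute(int input) { return input * 2; }

void coordinator()
{
    // First wave of independent tasks on the shared pool.
    auto a = task!compute(1);
    auto b = task!compute(2);
    taskPool.put(a);
    taskPool.put(b);

    // Wait for the dependencies. yieldForce runs the task in this thread
    // if no worker has started it yet, so the coordinator cannot deadlock
    // on its own children.
    immutable sum = a.yieldForce + b.yieldForce;

    // A dependent task can only be spawned once its inputs exist.
    auto c = task!compute(sum);
    taskPool.put(c);
    c.yieldForce;
}
```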


As I've said before in related discussions, you are _probably_ better
off using one of the high level primitives instead of using tasks
directly in these cases. If not, I'd prefer to improve the higher level
primitives and/or create new ones if possible. (Feel free to suggest one
if you can think of it.) Tasks are, IMHO, too low level for anything
except basic future/promise parallelism and implementing higher level
primitives. Incidentally the situation you describe (a coordinator task
creating lots of worker tasks) is exactly how amap(), reduce() and
parallel foreach work under the hood. This complexity is completely
encapsulated, though.

I would certainly agree that this belongs in a higher-level structure. This structure would basically get a set of special tasks, where each of those tasks has a list of all the tasks it depends on. All tasks would then be executed in parallel on a thread pool, in an order that satisfies their dependencies, possibly with some form of cost function that controls which task should come first if multiple orders are possible.
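As a rough sketch of what such a structure could look like; DepTask and run() are hypothetical names, and a real version would submit ready nodes to the pool and apply the cost function instead of recursing sequentially:

```d
// Hypothetical node of the proposed dependency-aware structure.
struct DepTask
{
    void delegate() work;
    DepTask*[] deps;   // tasks whose results this one needs
    bool done;
}

// Naive sequential executor that respects dependencies. A real
// implementation would instead hand every node whose deps are finished
// to the thread pool, ordered by a cost function.
void run(DepTask* t)
{
    if (t.done) return;
    foreach (d; t.deps)
        run(d);
    t.work();
    t.done = true;
}
```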

One problem here, for example: in the system I have, I need to execute several tasks in the main thread by sending a message to it (the main thread executes window messages in a loop). Specifically, this is for tasks that use OpenGL or a similar API that is bound to a single thread, and the main thread is the most efficient one to use because it already has an OpenGL context.
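With std.concurrency this could look roughly like the following; glUpload() is made up, and a real main loop would interleave these jobs with window messages:

```d
import std.concurrency;

void glUpload() { /* must run on the thread that owns the GL context */ }

void worker(Tid mainTid)
{
    // Instead of executing the GL work here, hand it to the main thread.
    send(mainTid, &glUpload);
}

void main()
{
    spawn(&worker, thisTid);
    // Simplified stand-in for the window message loop.
    receive((void function() job) { job(); });
}
```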

The important thing is either to support such things or to make the design general enough that the user can add them from the outside. Otherwise, if you really need such things, the only option is a completely custom thread pool, and this means no parallel foreach, map, reduce, or whatever else might be added later.


First thing here is that you do not want to do the waitForce() kind of
waiting in the coordinator task, because this might keep the coordinator
busy with an expensive task while it could already be spawning new tasks,
since in the meantime some other tasks may have finished.

I assume you mean yieldForce().

Yes, sorry, got the names mixed up.



However, if you instead wait on a condition variable (which is signaled
after each finished task), and if multiple computations of this kind can
run in parallel, you can immediately run into the situation that the
thread pool is crowded with coordinator tasks, all waiting on condition
variables that will never be signaled because no worker tasks can be
executed.

I assume you're talking about a condition variable other than the one
yieldForce() uses. As mentioned previously, in the specific case of
yieldForce() this is a solved problem. In the general case I can see the
problem.


Yes, just the general problem with other condition variables.


This is only one example; basically this problem can arise in all
cases where one task depends on another task through some form of waiting
that does not execute the dependency the way yieldForce() does.

Hmm, ok, I definitely understand the problem now.

But what I wanted to say is: even if it may be difficult to implement
such thread caching now, putting the means to execute a Task in its own
thread into the ThreadPool now allows for such an optimization later
(it could even coexist with Task.executeInNewThread()).

I can't really comment because I still don't understand this very well.

I hope I could make it a little clearer what I mean. The problem is
just that the system I'm talking about is quite complex, and it's not
easy to find good, simple examples in it. The problems of course arise
only in the most complex paths of execution.

What I'm not sure about is whether executeInNewThread() is supposed to be
useful just because it is sometimes nice to have the fine-grained
parallelism of the OS scheduler as opposed to task granularity, or
whether the advantage is supposed to be the efficiency gained because no
thread pool is created. In the latter case, caching some threads for
reuse by an executeInOwnThread() method should lead to better
performance in almost any case where thread creation overhead is
relevant.

Ok, now I'm starting to understand this. Please correct me (once you've
gotten a good night's sleep and can think again) wherever this is wrong:

1. As is currently the case, executeInNewThread() is _guaranteed_ to
start the task immediately. There is never a queue involved.

2. Unlike the current implementation, executeInNewThread() may use a
cached thread. It will **NOT**, however, put the task on a queue or
otherwise delay its execution. If no cached thread is available, it will
create a new one and possibly destroy it when the task is done.

Exactly.
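A minimal sketch of one such cached worker, just to pin down the semantics; the class name and locking scheme are made up, and a real cache would hold several of these, evict idle ones, and handle shutdown:

```d
import core.sync.condition;
import core.sync.mutex;
import core.thread;

// One reusable thread: accepts a job, runs it, then waits for the next.
// Work is never queued behind another task; run() assumes the thread is
// currently idle, matching point 2 above.
class CachedThread
{
    private Mutex m;
    private Condition c;
    private void delegate() job;

    this()
    {
        m = new Mutex;
        c = new Condition(m);
        new Thread(&loop).start();
    }

    void run(void delegate() work)
    {
        synchronized (m)
        {
            job = work;
            c.notify();
        }
    }

    private void loop()
    {
        while (true)
        {
            void delegate() w;
            synchronized (m)
            {
                while (job is null)
                    c.wait();
                w = job;
                job = null;
            }
            w();
        }
    }
}
```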


Thanks for this suggestion. Now that I (think I) understand it, it makes
sense in principle. The devil may be in the details, though.

1. How many threads should be cached? I guess this could just be
configurable with some reasonable default.

A configurable minimum number of threads sounds reasonable. The default could probably be a fixed small number like 1 or 2.


2. Should the cache be lazily or eagerly created? I'd assume lazily.

Lazy sounds good.


3. Where would these threads be stored? I think they probably belong in
some kind of thread-safe global data structure, **NOT** in a TaskPool
instance.

That's a good question. ThreadPool would be nice, because it is the class you may already be dragging an instance of through your code. Global would certainly work, too.


4. How would we explain to people what the cache is good for and how to
use it? The fact that you're proposing it and even you find this
difficult to convey makes me skeptical that this feature is worth the
weight it adds to the API. Maybe you'll find it easier once you get some
sleep. (I understand the problem it solves at an abstract level but have
never encountered a concrete use case. It also took me a long time to
understand it.)

I would basically just say it's a faster way than creating a new thread each time you start a task. Use it whenever you need a task to run outside of the thread pool's worker threads; candidates are tasks that wait a lot, either because of IO or because of waiting primitives other than the ones present in ThreadPool (message queues, condition variables).
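So the explanation could be as short as the following example; the file name is a placeholder, and whether the thread is cached or fresh would be an implementation detail under the proposal:

```d
import std.file : readText;
import std.parallelism;

void main()
{
    // IO-bound work that would block a pool worker for a long time;
    // under the proposal this may reuse a cached thread instead of
    // always creating a fresh one.
    auto t = task!readText("big.log");
    t.executeInNewThread();

    // ... do unrelated work while the file is being read ...

    string contents = t.yieldForce;
}
```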

But please don't make the mistake of dismissing the problem because it is complex. Being complex and maybe rare does not mean it cannot be important. It's like a bug that deletes your data, but only in very rare and complex use cases of the application: you would not want to ignore it just for those reasons.

Also, I'm not sure whether using the primitives of std.concurrency is allowed within a Task; maybe not. But if it is, it would be really easy to construct a higher-level example without condition variables and the like.


5. It would break some relaxations of what @safe tasks can do when
started via executeInNewThread().

You mean because TLS is not reinitialized each time? I have to admit that I can't really gauge the impact of this.


6. This whole proposal might fit better in std.concurrency, by using a
thread cache for spawn().

But isn't the previous problem (5.) even more relevant in std.concurrency? Putting it near ThreadPool could be a good idea, because it still is some sort of thread pool in the abstract sense. It could also be something that std.concurrency uses for its spawn().

Anyway, I would be happy if a place were allocated for this, wherever it fits. If it is std.concurrency, that's fine, as long as std.concurrency and std.parallelism play well together. One problem with std.concurrency is that it does not really work well when you need to wait for primitives other than a message. Having something like WaitForMultipleObjects is critical in many cases, but that's another topic.

Having said all this, I just want to make sure you don't get this wrong. I certainly do not want to push complex changes for no reason, or any changes at all for no reason, for that matter. And in this case I do see how it is _functionally_ independent of the ThreadPool itself.

My problem here is just that there is an executeInNewThread() function that really should not be used for _many_ tasks, and in most other cases it would be cleaner to use spawn(). It may still have its place if you count in the @safe implications, but I would also like to see a function that supports the same threading guarantees while being suitable for many tasks, if only to avoid bad usage patterns of executeInNewThread().

However it definitely is in the gray area between std.concurrency and std.parallelism.
