Hi! I don't have much more input than to say that futures use a built-in thread pool that is limited to (current-processor-count) threads. That could maybe be modified using setaffinity?
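For example, something along these lines (untested sketch; assumes a platform where setaffinity is available, and that the affinity mask is set before the futures pool is created):

```scheme
;; Untested sketch: restrict the process to two cores *before* any
;; future is created.  (current-processor-count) respects the affinity
;; mask, so the futures thread pool should be sized accordingly.
(use-modules (ice-9 futures))

;; Bind the current process (pid 0) to cores 0 and 1 only.
(setaffinity 0 #*11)

;; On a machine with more cores this should now report 2.
(display (current-processor-count))
(newline)

;; Futures created after this point share the smaller pool.
(touch (future (* 6 7)))
```

Note this limits the whole process, not just one algorithm, so it may not fit your keyword-argument use case.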
Hope this helps. -- Linus Björnstam

On Wed, 8 Jan 2020, at 08:56, Zelphir Kaltstahl wrote:
> Hello Guile users!
>
> I thought about what I need for parallelizing an algorithm I am working
> on. Background: I am working on my decision tree implementation
> (https://notabug.org/ZelphirKaltstahl/guile-ml/src/wip-port-to-guile),
> which currently uses only a single core. Training the model splits the
> tree into subtrees, which is an opportunity to run things in parallel.
> Subtrees can subsequently split again, and that could again result in a
> parallel evaluation of the subtrees.
>
> Since it might not always be desirable to use all available cores
> (blocking the whole CPU), I would like to let the user of the algorithm
> limit the number of cores it uses, most likely via a keyword argument
> defaulting to the number of cores. Looking at futures, parallel forms
> and the fibers library, I have had the following thoughts:
>
> 1. Fibers support a ~#:parallelism~ keyword, so the number of fibers
> running in parallel can be set for ~run-fibers~, which creates a
> scheduler that might be reused later, possibly avoiding the need to
> keep track of how many threads are in use. When using fibers it would
> probably be important to always use the same scheduler, as a new
> scheduler would not know anything about other schedulers, and the
> parallelism limit might be exceeded. Schedulers controlled by the
> initial scheduler are probably OK. I will need to re-read the fibers
> manual on that.
>
> 2. The parallel forms offer ~n-par-map~ and ~n-par-for-each~, where one
> can specify the maximum number of threads running in parallel.
> However, when calling these procedures recursively (in my case,
> processing subtrees of a decision tree, which might split into subtrees
> again), one does not know whether the other recursive calls have
> finished and cannot know how many other threads are currently in use.
> Calling one of these procedures again might run more threads than
> specified in an upper-level call.
>
> 3. With futures, one cannot directly specify the maximum number of
> futures to run. To control how many threads evaluate code at the same
> time, one would have to build some construct around them that keeps
> track of how many futures are running or could be running, and make
> that work for recursive creation of further futures as well.
>
> 4. One could also do something ugly: create a globally defined active
> thread counter, which requires locking, to keep track of the number of
> threads or futures running in parallel.
>
> 5. I could estimate the maximum number of cores currently in use by
> passing the tree depth as an argument to recursive calls and
> calculating from it how many splits, and thus evaluations, might be
> running in parallel at that point. However, this might be inaccurate:
> some subtree might already be finished, and then I would not use the
> maximum user-specified number of parallel evaluations.
>
> So my questions are:
>
> - Is there a default / recommended way to limit parallelism for
> recursive calls to parallel forms?
>
> - Is there a better way than a global counter with locking to limit the
> number of futures created during recursive calls? I would very much
> dislike having to do something like global state + mutex.
>
> - What do you recommend in general to solve this?
>
> Regards,
> Zelphir
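A follow-up thought on the counter question: the global-counter idea can at least be packaged as a counting semaphore, so the mutex stays hidden in one place instead of leaking into the algorithm. An untested sketch (the names make-semaphore and with-token are made up here, not from any library):

```scheme
;; Untested sketch: a counting semaphore built from a mutex and a
;; condition variable, capping how many workers run at once across
;; arbitrarily deep recursive calls.
(use-modules (ice-9 threads))

(define (make-semaphore n)
  (let ((mutex   (make-mutex))
        (condvar (make-condition-variable))
        (count   n))
    (lambda (op)
      (case op
        ((acquire)
         (lock-mutex mutex)
         ;; Block until a token is free.
         (while (<= count 0)
           (wait-condition-variable condvar mutex))
         (set! count (- count 1))
         (unlock-mutex mutex))
        ((release)
         (lock-mutex mutex)
         (set! count (+ count 1))
         (signal-condition-variable condvar)
         (unlock-mutex mutex))))))

;; Run THUNK while holding a token; always give the token back.
(define (with-token sem thunk)
  (sem 'acquire)
  (dynamic-wind
    (lambda () #f)
    thunk
    (lambda () (sem 'release))))

;; Usage idea: (define sem (make-semaphore 4)) and wrap each
;; subtree evaluation in (with-token sem (lambda () ...)).
```

One caveat: if a parent holds a token while blocking on its children, and the children also need tokens, deep recursion can deadlock. A common workaround is to try the acquire non-blockingly and fall back to evaluating the subtree in the current thread when no token is free.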