Hi! I don't have much more input than to say that futures use a built-in thread pool that is limited to (current-processor-count) threads. That could maybe be modified using setaffinity?
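For example, something along these lines (untested sketch; assumes a platform where setaffinity is available, and that the affinity mask is set before the futures pool is created):

```scheme
;; Untested sketch: restrict the process to two cores *before* any
;; future is created.  (current-processor-count) respects the affinity
;; mask, so the futures thread pool should be sized accordingly.
(use-modules (ice-9 futures))

;; Bind the current process (pid 0) to cores 0 and 1 only.
(setaffinity 0 #*11)

;; On a machine with more cores this should now report 2.
(display (current-processor-count))
(newline)

;; Futures created after this point share the smaller pool.
(touch (future (* 6 7)))
```

Note this limits the whole process, not just one algorithm, so it may not fit your keyword-argument use case.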
Hope this helps. -- Linus Björnstam

On Wed, 8 Jan 2020, at 08:56, Zelphir Kaltstahl wrote:
> Hello Guile users!
>
> I thought about what I need for parallelizing an algorithm I am working
> on. Background: I am working on my decision tree implementation
> (https://notabug.org/ZelphirKaltstahl/guile-ml/src/wip-port-to-guile),
> which currently uses only a single core. Training the model splits the
> tree into subtrees, which is an opportunity to run things in parallel.
> Subtrees can subsequently split again, and that could again result in a
> parallel evaluation of the subtrees.
>
> Since it might not always be desirable to use all available cores
> (blocking the whole CPU), I would like to let the user of the algorithm
> limit the number of cores it uses, most likely via a keyword argument
> defaulting to the number of cores. Looking at futures, parallel forms
> and the fibers library, I have had the following thoughts:
>
> 1. Fibers support a ~#:parallelism~ keyword, so the number of fibers
> running in parallel can be set for ~run-fibers~, which creates a
> scheduler that might be reused later, possibly avoiding the need to
> keep track of how many threads are in use. When using fibers it would
> probably be important to always use the same scheduler, as a new
> scheduler would not know anything about other schedulers, and the
> parallelism limit might be exceeded. Schedulers controlled by the
> initial scheduler are probably OK. I will need to re-read the fibers
> manual on that.
>
> 2. The parallel forms offer ~n-par-map~ and ~n-par-for-each~, where one
> can specify the maximum number of threads running in parallel.
> However, when calling these procedures recursively (in my case,
> processing subtrees of a decision tree, which might split into subtrees
> again), one does not know whether the other recursive calls have
> finished and cannot know how many other threads are currently in use.
> Calling one of these procedures again might run more threads than
> specified in an upper-level call.
>
> 3. With futures, one cannot directly specify the maximum number of
> futures to run. To control how many threads evaluate code at the same
> time, one would have to build some construct around them that keeps
> track of how many futures are running or could be running, and make
> that work for recursive creation of further futures as well.
>
> 4. One could also do something ugly: create a globally defined active
> thread counter, which requires locking, to keep track of the number of
> threads or futures running in parallel.
>
> 5. I could estimate the maximum number of cores currently in use by
> passing the tree depth as an argument to recursive calls and
> calculating from it how many splits, and thus evaluations, might be
> running in parallel at that point. However, this might be inaccurate:
> some subtree might already be finished, and then I would not use the
> maximum user-specified number of parallel evaluations.
>
> So my questions are:
>
> - Is there a default / recommended way to limit parallelism for
> recursive calls to parallel forms?
>
> - Is there a better way than a global counter with locking to limit the
> number of futures created during recursive calls? I would very much
> dislike having to do something like global state + mutex.
>
> - What do you recommend in general to solve this?
>
> Regards,
> Zelphir
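A follow-up thought on the counter question: the global-counter idea can at least be packaged as a counting semaphore, so the mutex stays hidden in one place instead of leaking into the algorithm. An untested sketch (the names make-semaphore and with-token are made up here, not from any library):

```scheme
;; Untested sketch: a counting semaphore built from a mutex and a
;; condition variable, capping how many workers run at once across
;; arbitrarily deep recursive calls.
(use-modules (ice-9 threads))

(define (make-semaphore n)
  (let ((mutex   (make-mutex))
        (condvar (make-condition-variable))
        (count   n))
    (lambda (op)
      (case op
        ((acquire)
         (lock-mutex mutex)
         ;; Block until a token is free.
         (while (<= count 0)
           (wait-condition-variable condvar mutex))
         (set! count (- count 1))
         (unlock-mutex mutex))
        ((release)
         (lock-mutex mutex)
         (set! count (+ count 1))
         (signal-condition-variable condvar)
         (unlock-mutex mutex))))))

;; Run THUNK while holding a token; always give the token back.
(define (with-token sem thunk)
  (sem 'acquire)
  (dynamic-wind
    (lambda () #f)
    thunk
    (lambda () (sem 'release))))

;; Usage idea: (define sem (make-semaphore 4)) and wrap each
;; subtree evaluation in (with-token sem (lambda () ...)).
```

One caveat: if a parent holds a token while blocking on its children, and the children also need tokens, deep recursion can deadlock. A common workaround is to try the acquire non-blockingly and fall back to evaluating the subtree in the current thread when no token is free.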