Re: [R-pkg-devel] RFC: an interface to manage use of parallelism in packages

Vladimir Dergachev Fri, 03 Nov 2023 22:32:24 -0700



On Wed, 25 Oct 2023, Ivan Krylov wrote:

Summary: at the end of this message is a link to an R package
implementing an interface for managing the use of execution units in R
packages. As a package maintainer, would you agree to use something
like this? Does it look sufficiently reasonable to become a part of R?
Read on for why I made these particular interface choices.

My understanding of the problem stated by Simon Urbanek and Uwe Ligges
[1,2] is that we need a way to set and distribute the CPU core
allowance between multiple packages that could be using very different
methods to achieve parallel execution on the local machine, including
threads and child processes. We could have multiple well-meaning
packages, each of them calling each other using a different parallelism
technology: imagine parallel::makeCluster(getOption('mc.cores'))
combined with parallel::mclapply(mc.cores = getOption('mc.cores')) and
with an OpenMP program that also spawns getOption('mc.cores') threads.
A parallel BLAS or custom multi-threading using std::thread could add
more fuel to the fire.


Hi Ivan,

  Generally, I like the idea. A few comments:

* From a package developer's point of view, I would prefer to have a
clear idea of how many threads I could use. So having a core R function
like "getMaxThreads()" or similar would be useful. What that function
returns could be governed by a package.

In fact, it might be a good idea to allow several packages to
implement "thread governors" for different situations.

* It would make sense to think through whether we want (or not) to
allow package developers to call omp_set_num_threads() or whether this
is done by R.

This is hairier than you might think. Allowing it forces every package
to call omp_set_num_threads() before each OpenMP block, because there is
no way to know which package was called before.

Not allowing calls to omp_set_num_threads() might make it difficult to
use all the threads, and would force R to initialize OpenMP on startup.
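
To make the trade-off concrete, here is a minimal C sketch of one way a
package could avoid touching the global OpenMP setting: it requests a
thread count per parallel region with the num_threads clause instead of
calling omp_set_num_threads(). The function get_max_threads() and the
variable thread_allowance are hypothetical stand-ins for whatever a
"thread governor" would provide; they are not an existing R API.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical allowance; in practice a "thread governor" would set it. */
static int thread_allowance = 2;

/* Hypothetical stand-in for the proposed getMaxThreads() query. */
int get_max_threads(void)
{
    return thread_allowance > 0 ? thread_allowance : 1;
}

/* Sum of squares; the thread count applies to this region only and
 * leaves the process-wide OpenMP default untouched. */
double sum_squares(const double *x, long n)
{
    double total = 0.0;
    int nthreads = get_max_threads();
    long i;

    (void)nthreads; /* unused when compiled without OpenMP */

#ifdef _OPENMP
#pragma omp parallel for num_threads(nthreads) reduction(+:total)
#endif
    for (i = 0; i < n; i++)
        total += x[i] * x[i];

    return total;
}
```

Because the clause is scoped to one region, the result does not depend
on which package ran before. Without OpenMP support the same code
compiles to a plain serial loop.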

* Speaking of initialization of OpenMP, I have seen situations where
spawning some regular pthread threads and then initializing OpenMP
forces all pthread threads onto a single CPU.

I think this is because OpenMP sets thread affinity for all the process
threads, but only distributes its own.

* This also raises the question of how affinity is managed. If you have
called makeForkCluster() to create 10 R instances and then each uses 2
OpenMP threads, you do not want those occupying only 2 CPU execution
threads instead of 20.

* From the user perspective, it might be useful to be able to limit the
number of threads per package by using patterns or regular expressions.
Often, the reason for limiting the number of threads is to reduce memory
usage.
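
As a rough illustration of pattern-based limits, the following C sketch
matches a package name against a small rule table using POSIX regular
expressions. The rule table, the package-name patterns, and the function
threads_for_package() are all hypothetical; how a user would actually
supply such rules is left open here.

```c
#include <regex.h>
#include <stddef.h>

/* One user-supplied rule: packages matching `pattern` get at most
 * `max_threads` threads. Patterns and limits are illustrative only. */
struct thread_rule {
    const char *pattern;
    int max_threads;
};

static const struct thread_rule rules[] = {
    { "^heavy", 2 },  /* e.g. memory-hungry packages */
    { "Sim$",   4 },
};

/* Return the thread limit for `pkg`: the first matching rule wins,
 * capped by `default_threads`; without a match the default applies. */
int threads_for_package(const char *pkg, int default_threads)
{
    size_t i;

    for (i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        regex_t re;
        int matched;

        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue; /* skip a malformed pattern */

        matched = (regexec(&re, pkg, 0, NULL, 0) == 0);
        regfree(&re);

        if (matched)
            return rules[i].max_threads < default_threads
                       ? rules[i].max_threads
                       : default_threads;
    }
    return default_threads;
}
```

With these example rules, a package named "heavyModel" would be capped
at 2 threads, while a package matching no rule keeps the default.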

* Speaking of memory usage, glibc has parameters like MALLOC_ARENA_MAX
that have a great impact on the memory usage of multithreaded programs. I
usually set it to 1, but then I take extra care to make as few memory
allocation calls as possible within individual threads.
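
For reference, glibc also exposes the arena limit programmatically
through mallopt() with the M_ARENA_MAX parameter, which corresponds to
the MALLOC_ARENA_MAX environment variable. The sketch below is only an
illustration of that call, not a recommendation on where it should live;
the wrapper name limit_malloc_arenas() is hypothetical, and on non-glibc
systems it does nothing.

```c
#include <stdlib.h> /* on glibc systems this also defines __GLIBC__ */
#if defined(__GLIBC__)
#include <malloc.h>
#endif

/* Ask glibc to use at most `max_arenas` malloc arenas.
 * Returns 1 on success, 0 on failure or when glibc is not in use. */
int limit_malloc_arenas(int max_arenas)
{
#if defined(__GLIBC__)
    return mallopt(M_ARENA_MAX, max_arenas);
#else
    (void)max_arenas;
    return 0;
#endif
}
```

Such a call would normally be made early, before worker threads start
allocating memory.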


best

Vladimir Dergachev

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
