Hi Ivan,

Interesting package, and I'll provide more feedback later. For comparison,
I'd recommend looking at how GNU make does parallel processing. It uses the
concept of a job server and job slots. What I like about it is that it is
implemented at the OS level, because make needs to support interacting with
non-make processes. On Windows it uses a named semaphore, and on Unix-like
systems it uses named pipes or simple pipes to pass tokens around.
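To illustrate the token idea, here is a toy in-process simulation in R (make's real jobserver passes single-byte tokens through an OS pipe or semaphore so that unrelated processes can share slots; `make_jobserver` and its method names are invented for this sketch):

```r
## Toy simulation of the jobserver protocol: a fixed pool of tokens,
## take one before running a job, return it when the job finishes.
make_jobserver <- function(slots) {
  tokens <- slots
  list(
    acquire = function() {  # try to take a slot; FALSE means "wait"
      if (tokens > 0L) { tokens <<- tokens - 1L; TRUE } else FALSE
    },
    release = function() tokens <<- tokens + 1L,  # give the slot back
    free    = function() tokens
  )
}

js <- make_jobserver(2L)
js$acquire()  # TRUE: first slot granted
js$acquire()  # TRUE: second slot granted
js$acquire()  # FALSE: pool exhausted, the caller must wait
js$release()  # a job finished, its token goes back to the pool
```

The point of doing this at the OS level rather than in-process is exactly what you describe: any child process holding one end of the pipe can participate, whether or not it is make.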
https://www.gnu.org/software/make/manual/make.html#Job-Slots

Cheers,
Reed

On Wed, Oct 25, 2023 at 5:55 AM Ivan Krylov <krylov.r...@gmail.com> wrote:

> Summary: at the end of this message is a link to an R package
> implementing an interface for managing the use of execution units in R
> packages. As a package maintainer, would you agree to use something
> like this? Does it look sufficiently reasonable to become a part of R?
> Read on for why I made these particular interface choices.
>
> My understanding of the problem stated by Simon Urbanek and Uwe Ligges
> [1,2] is that we need a way to set and distribute the CPU core
> allowance between multiple packages that could be using very different
> methods to achieve parallel execution on the local machine, including
> threads and child processes. We could have multiple well-meaning
> packages calling each other, each using a different parallelism
> technology: imagine parallel::makeCluster(getOption('mc.cores'))
> combined with parallel::mclapply(mc.cores = getOption('mc.cores')) and
> with an OpenMP program that also spawns getOption('mc.cores') threads.
> A parallel BLAS or custom multi-threading using std::thread could add
> more fuel to the fire.
>
> Workarounds applied by package maintainers nowadays are both
> cumbersome (sometimes one has to talk to some package that lives
> downstream in the call stack and isn't even an explicit dependency,
> because it's the one responsible for the threads) and not really enough
> (most maintainers forget to restore the state after they are done, so a
> single example() may slow down the operations that follow).
>
> The problem is complicated by the fact that not every parallel
> operation can explicitly accept the CPU core limit as a parameter.
> For example, data.table's implicit parallelism is very convenient, and
> so are parallel BLASes (which don't have a standard interface to change
> the number of threads), so we shouldn't be prohibiting implicit
> parallelism.
>
> It's also not always obvious how to split the cores between the
> potentially parallel sections. While it's typically best to start with
> the outer loop (e.g. better to have 16 R processes solving relatively
> small linear algebra problems back to back than one R process spinning
> 15 of its 16 OpenBLAS threads in sched_yield()), it may be more
> efficient to give all 16 threads back to BLAS (and save on
> transferring the problems and solutions between processes) once the
> problems become large enough to give enough work to all of the cores.
>
> So as a user, I would like an interface that would both let me give all
> of the cores to the program if that's what I need (something like
> setCPUallowance(parallelly::availableCores())) _and_ let me be more
> detailed when necessary (something like setCPUallowance(overall = 7,
> packages = c(foobar = 1), BLAS = 2) to limit BLAS threads to 2,
> disallow parallelism in the foobar package because it wastes too much
> time, and limit R as a whole to 7 cores because I want to surf the 'net
> on the remaining one while the Monte Carlo simulation is going on). As
> a package developer, I'd rather not think about any of that and just
> use a function call like getCPUallowance() for the default number of
> cores in every situation.
>
> Can we implement such an interface? The main obstacle here is not being
> able to know when each parallel region begins and ends. Does the
> package call fork()? std::thread{}? Start a local mirai cluster? We
> have to trust (and verify during R CMD check) the package to create the
> given number of units of execution and tell us when they are done.
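As a toy sketch of the configuration side described above (hypothetical implementation; only the names setCPUallowance/getCPUallowance come from the proposal, the storage in an option is my assumption for illustration):

```r
## Hypothetical sketch: store the overall allowance plus per-package
## overrides in an option, and look them up on the developer side.
setCPUallowance <- function(overall, packages = integer(), BLAS = NA_integer_) {
  options(cpu.allowance = list(overall = overall,
                               packages = packages, BLAS = BLAS))
  invisible(NULL)
}

getCPUallowance <- function(package = NULL) {
  a <- getOption("cpu.allowance",
                 list(overall = 1L, packages = integer(), BLAS = NA_integer_))
  if (!is.null(package) && package %in% names(a$packages))
    a$packages[[package]]  # per-package limit set by the user
  else
    a$overall              # the default for everyone else
}

setCPUallowance(overall = 7, packages = c(foobar = 1), BLAS = 2)
getCPUallowance()          # 7: R as a whole
getCPUallowance("foobar")  # 1: parallelism effectively disallowed there
```

A real implementation would of course also have to propagate these limits to BLAS and to child processes, which the option mechanism alone cannot do.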
>
> The closest interface that I see being implementable is a system of
> tokens with reference semantics: getCPUallowance() returns a special
> object containing the number of tokens the caller is allowed to use and
> sets an environment variable with the remaining number of cores. Any R
> child processes pick up the number of cores from the environment
> variable. Any downstream calls to getCPUallowance(), aware of the
> tokens already handed out, return a reduced number of remaining CPU
> cores. Once the package is done executing a parallel section, it
> returns the CPU allowance back to R by calling something like
> close(token), which updates the internal allowance value (and the
> environment variable). (A finalizer can also be set on the tokens to
> ensure that CPU cores won't be lost.)
>
> Here's a package implementing this idea:
> <https://codeberg.org/aitap/R-CPUallowance>. Currently missing are
> terrible hacks to determine the BLAS type at runtime and resolve the
> necessary symbols to set the number of BLAS threads, depending on
> whether it's OpenBLAS, FlexiBLAS, MKL, or something else. Does it feel
> over-engineered? I hope that, even if not a good solution, this would
> let us move towards a unified solution that could just work™ on
> everything ranging from laptops to CRAN testing machines to HPCs.
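A stripped-down sketch of the token mechanics described above (my own toy version, not the code in the linked package; the counter, environment-variable name, and class name are assumptions):

```r
## Toy reference-semantics CPU tokens: getCPUallowance() hands out cores
## and decrements a shared counter; close() (or a finalizer, if the
## caller forgets) hands them back.
.allowance <- new.env()
.allowance$free <- 4L  # pretend the overall allowance is 4 cores

getCPUallowance <- function(want = .allowance$free) {
  granted <- min(want, .allowance$free)
  .allowance$free <- .allowance$free - granted
  Sys.setenv(R_CPU_ALLOWANCE = .allowance$free)  # child processes read this
  token <- new.env()        # environment => reference semantics
  token$granted <- granted
  class(token) <- "CPUtoken"
  # safety net: return the cores even if the caller never calls close()
  reg.finalizer(token, function(t) close(t))
  token
}

close.CPUtoken <- function(con, ...) {
  if (con$granted > 0L) {
    .allowance$free <- .allowance$free + con$granted
    con$granted <- 0L  # make close() idempotent for the finalizer
    Sys.setenv(R_CPU_ALLOWANCE = .allowance$free)
  }
  invisible(NULL)
}

tok <- getCPUallowance(3L)  # take 3 of the 4 cores
.allowance$free             # 1 core left for downstream callers
close(tok)                  # parallel section done, give them back
.allowance$free             # back to 4
```

Because the token is an environment, downstream code sees the reduced allowance immediately, and close() stays idempotent so the finalizer cannot double-return cores.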
>
> --
> Best regards,
> Ivan
>
> [1] https://stat.ethz.ch/pipermail/r-package-devel/2023q3/009484.html
>
> [2] https://stat.ethz.ch/pipermail/r-package-devel/2023q3/009513.html
>
> ______________________________________________
> R-package-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-package-devel