On 23.08.2023 15:58, Jeff Newmiller wrote:
To whom are you addressing this question? The OpenMP developers who define the 
missing-OMP_THREAD_LIMIT behaviour and/or supply default config files? The CRAN 
server administrators who set the variable in their site-wide configuration 
intentionally or unintentionally? Or the package authors expected to kludge in 
settings to override those defaults for CRAN testing while not overriding them 
in normal use?

Of course, the CRAN team controls the env vars on the CRAN servers, but not on a server a user might use. And a user is typically unaware that a package uses multithreading. R users are typically not developers with a lot of insight into computer science. Most R users I know would not even know how to set an env var.

So why do you expect your users to set an appropriate OMP_THREAD_LIMIT? Particularly when they aim at parallelization, they have to set it to 1. I advocate limiting the number of cores not only for CRAN but also (and in particular) by default! Something we cannot check easily.


An alternative would be to teach R to set OMP_THREAD_LIMIT=1 locally by default, and to provide a mechanism for users to change that.
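
A minimal sketch of what such a "small default, user can raise it" mechanism might look like at the package level. The option name `mypkg.threads` and the helper `default_threads()` are hypothetical, made up for illustration; nothing like this is built into R itself.

```r
# Hypothetical helper: resolve a thread count with a conservative default.
# Precedence: explicit R option > OMP_THREAD_LIMIT > 1.
default_threads <- function() {
  opt <- getOption("mypkg.threads")            # hypothetical option name
  if (!is.null(opt)) return(as.integer(opt))   # the user asked for more (or less)
  env <- Sys.getenv("OMP_THREAD_LIMIT", unset = "")
  if (grepl("^[0-9]+$", env)) return(as.integer(env))
  1L                                           # nothing set: stay single-threaded
}
```

A user who wants more threads would then only need `options(mypkg.threads = 8)`, with no environment variable involved.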

Best,
Uwe Ligges



I would vote for explicitly addressing this (rhetorical?) question to the CRAN 
server administrators...

On August 23, 2023 6:31:01 AM PDT, Uwe Ligges <lig...@statistik.tu-dortmund.de> 
wrote:
I (and many colleagues here) have been caught several times by the following 
example:

1. did something in parallel on a cluster, set up via parallel::makeCluster().
2. e.g. allocated 20 cores and got them on one single machine
3. ran some code in parallel via parLapply()

Bang! 400 threads;
So I have started 20 parallel processes, each of which is using the 
automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted by the 
cluster to 20 (rather than 1).
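
The arithmetic, and one possible mitigation, might be sketched as follows. Both helper names are hypothetical (they are not part of the parallel package), and the second one assumes the workers are started as child processes on the local machine, so that they inherit the environment of the calling session.

```r
library(parallel)

# Worst case when each of `n_workers` processes may start
# `threads_per_worker` threads of its own.
worst_case_threads <- function(n_workers, threads_per_worker) {
  n_workers * threads_per_worker
}

worst_case_threads(20, 20)  # the situation described above
worst_case_threads(20, 1)   # with a per-process limit of 1

# Hypothetical helper: start a local cluster whose worker processes inherit
# OMP_THREAD_LIMIT from the calling session, then restore the caller's value.
make_capped_cluster <- function(n_workers, omp_limit = 1L) {
  old <- Sys.getenv("OMP_THREAD_LIMIT", unset = NA)
  Sys.setenv(OMP_THREAD_LIMIT = as.character(omp_limit))
  on.exit({
    if (is.na(old)) Sys.unsetenv("OMP_THREAD_LIMIT")
    else Sys.setenv(OMP_THREAD_LIMIT = old)
  }, add = TRUE)
  makeCluster(n_workers)
}
```

Something like `cl <- make_capped_cluster(20)`, followed by `parLapply(cl, ...)` and `stopCluster(cl)`, should then keep each worker to a single OpenMP thread. Whether a given cluster scheduler overrides the variable again is outside what this sketch can control.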

Hence, I really believe a default should always be small, not only in examples 
and tests, but generally. And people who aim for more should be able to 
increase the defaults.

Do you believe software that auto-occupies a 96-core machine with 96 threads 
by default is sensible?

Best,
Uwe Ligges






On 21.08.2023 21:59, Berry Boessenkool wrote:

If you add that to each exported function, isn't that a lot of code to read + 
maintain?
Also, it seems like unnecessary computational overhead.
From a software design point of view, it might be nicer to set that in the 
examples + tests.

Regards,
Berry

________________________________
From: R-package-devel <r-package-devel-boun...@r-project.org> on behalf of Scott 
Ritchie <sritchi...@gmail.com>
Sent: Monday, August 21, 2023 19:23
To: Dirk Eddelbuettel <e...@debian.org>
Cc: r-package-devel@r-project.org <r-package-devel@r-project.org>
Subject: Re: [R-pkg-devel] Trouble with long-running tests on CRAN debian server

Thanks Dirk and Ivan,

I took a slightly different work-around of forcing the number of threads to
1 when running functions on the test dataset in the package, by adding the
following to each user-facing function:

```
    # Check if running on package test_data, and if so, force data.table to be
    # single threaded so that we can avoid a NOTE on CRAN submission
    if (isTRUE(all.equal(x, ukbnmr::test_data))) {
      registered_threads <- getDTthreads()
      setDTthreads(1)
      # re-register so no unintended side effects for users
      on.exit({ setDTthreads(registered_threads) })
    }
```
(i.e. here x is the input argument to the function)
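
For comparison, the same set-and-restore pattern could be wrapped once and called from examples or tests instead of being repeated in each function. This is only a sketch: `with_dt_threads()` is a hypothetical helper, not part of data.table or ukbnmr, and it relies on `data.table::setDTthreads()` returning the previous setting.

```r
# Hypothetical helper: evaluate `expr` with data.table limited to `n` threads,
# restoring the previous setting afterwards, even if `expr` fails.
with_dt_threads <- function(n, expr) {
  if (!requireNamespace("data.table", quietly = TRUE)) {
    return(expr)                       # nothing to throttle without data.table
  }
  old <- data.table::setDTthreads(n)   # returns the previous thread count
  on.exit(data.table::setDTthreads(old), add = TRUE)
  expr                                 # lazily evaluated here, under the limit
}
```

In a test file this might read `with_dt_threads(1, my_function(test_data))`, where `my_function` stands in for whichever exported function is being checked.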

It took some trial and error to get this to pass the CRAN tests; the number of
columns in the input data was also contributing to the problem.

Best,

Scott


On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <e...@debian.org> wrote:


On 21 August 2023 at 16:05, Ivan Krylov wrote:
| Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2
| set on the CRAN check machine. Either that, or place the responsibility
| on data.table for setting the right number of threads by default. But
| that's a policy question: should a CRAN package start no more than two
| threads/child processes even if it doesn't know it's running in an
| environment where the CPU time / elapsed time limit is two?

Methinks that given this language in the CRAN Repository Policy

    If running a package uses multiple threads/cores it must never use more
    than two simultaneously: the check farm is a shared resource and will
    typically be running many checks simultaneously.

it would indeed be nice if this variable, and/or equivalent ones, were set.

As I mentioned before, I had long added a similar throttle (not for
data.table) in a package I look after (for work, even). So a similar
throttler with optionality is below. I'll add this to my `dang` package
collecting various functions.

A usage example follows. It does nothing by default, ensuring 'full power'
but reflects the minimum of two possible options, or an explicit count:

      > dang::limitDataTableCores(verbose=TRUE)
      Limiting data.table to '12'.
      > Sys.setenv("OMP_THREAD_LIMIT"=3); dang::limitDataTableCores(verbose=TRUE)
      Limiting data.table to '3'.
      > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
      Limiting data.table to '2'.
      > dang::limitDataTableCores(1, verbose=TRUE)
      Limiting data.table to '1'.
      >

That makes it, in my eyes, preferable to any unconditional 'always pick 1
thread'.

Dirk


##' Set threads for data.table respecting possible local settings
##'
##' This function sets the number of threads \pkg{data.table} will use
##' while reflecting two possible machine-specific settings from the
##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
##' option \sQuote{Ncpus} (used e.g. for parallel builds).
##' @title Set data.table threads respecting default settings
##' @param ncores A numeric or character variable with the desired
##' count of threads to use
##' @param verbose A logical value with a default of \sQuote{FALSE} to
##' operate more verbosely
##' @return The return value of the \pkg{data.table} function
##' \code{setDTthreads} which is called as a side-effect.
##' @author Dirk Eddelbuettel
##' @export
limitDataTableCores <- function(ncores, verbose = FALSE) {
      if (missing(ncores)) {
          ## start with a simple fallback: 'Ncpus' (if set) or else 2
          ncores <- getOption("Ncpus", 2L)
          ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions), gets
          ## NA if envvar unset
          ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
          ## and then keep the smaller
          ncores <- min(na.omit(c(ncores, ompcores)))
      }
      stopifnot("Package 'data.table' must be installed." =
                    requireNamespace("data.table", quietly=TRUE))
      stopifnot("Argument 'ncores' must be numeric or character" =
                    is.numeric(ncores) || is.character(ncores))
      if (verbose) message("Limiting data.table to '", ncores, "'.")
      data.table::setDTthreads(ncores)
}
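
The selection rule used when `ncores` is missing can also be tried on its own, without data.table installed. The sketch below is an illustration only: `resolve_cores()` is a hypothetical name, and it takes the two settings as arguments so that they can be varied without touching the session.

```r
# Hypothetical stand-alone version of the rule: take 'Ncpus' (or 2 if unset)
# and OMP_THREAD_LIMIT (ignored if unset), then keep the smaller value.
resolve_cores <- function(ncpus_option = getOption("Ncpus", 2L),
                          omp_limit = Sys.getenv("OMP_THREAD_LIMIT")) {
  # NA if the variable is unset or not a number
  ompcores <- suppressWarnings(as.integer(omp_limit))
  min(stats::na.omit(c(ncpus_option, ompcores)))
}

resolve_cores(2L, "")     # neither setting present: falls back to 2
resolve_cores(12L, "3")   # OMP_THREAD_LIMIT is the smaller value
resolve_cores(2L, "3")    # Ncpus is the smaller value
```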

|
| --
| Best regards,
| Ivan
|
| ______________________________________________
| R-package-devel@r-project.org mailing list
| https://stat.ethz.ch/mailman/listinfo/r-package-devel

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org


          [[alternative HTML version deleted]]

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
