To whom are you addressing this question? The OpenMP developers who define the missing-OMP_THREAD_LIMIT behaviour and-or supply default config files? The CRAN server administrators who set the variable in their site-wide configuration intentionally or unintentionally? Or the package authors expected to kludge in settings to override those defaults for CRAN testing while not overriding them in normal use?
I would vote for explicitly addressing this (rhetorical?) question to the CRAN server administrators... On August 23, 2023 6:31:01 AM PDT, Uwe Ligges <lig...@statistik.tu-dortmund.de> wrote: >I (any many collegues here) have been caught several times by the following >example: > >1. did something in parallel on a cluster, set up via parallel::makeCluster(). >2. e.g. allocated 20 cores and got them on one single machine >3. ran some code in parallel via parLapply() > >Bang! 400 threads; >So I have started 20 parallel processes, each of which is using the >automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted by the >cluster to 20 (rather than 1). > >Hence, I really believe a default should always be small, not only in examples >and tests, but generally. And people who aim for more should be able to >increase the defaults. > >Do you believe a software that auto-occupies a 96 core machines with 96 >threads by default is sensible? > >Best, >Uwe Ligges > > > > > > >On 21.08.2023 21:59, Berry Boessenkool wrote: >> >> If you add that to each exported function, isn't that a lot of code to read >> + maintain? >> Also, it seems like unnecessary computational overhead. >> From a software design point of view, it might be nicer to set that in the >> examples + tests. >> >> Regards, >> Berry >> >> ________________________________ >> From: R-package-devel <r-package-devel-boun...@r-project.org> on behalf of >> Scott Ritchie <sritchi...@gmail.com> >> Sent: Monday, August 21, 2023 19:23 >> To: Dirk Eddelbuettel <e...@debian.org> >> Cc: r-package-devel@r-project.org <r-package-devel@r-project.org> >> Subject: Re: [R-pkg-devel] Trouble with long-running tests on CRAN debian >> server >> >> Thanks Dirk and Ivan, >> >> I took a slightly different work-around of forcing the number of threads to >> 1 when running functions of the test dataset in the package, by adding the >> following to each user facing function: >> >> ``` >> # Check if running on package test_data, and if so, force data.table to >> be >> # single threaded so that we can avoid a NOTE on CRAN submission >> if (isTRUE(all.equal(x, ukbnmr::test_data))) { >> registered_threads <- getDTthreads() >> setDTthreads(1) >> on.exit({ setDTthreads(registered_threads) }) # re-register so no >> unintended side effects for users >> } >> ``` >> (i.e. here x is the input argument to the function) >> >> It took some trial and error to get to pass the CRAN tests; the number of >> columns in the input data was also contributing to the problem. >> >> Best, >> >> Scott >> >> >> On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <e...@debian.org> wrote: >> >>> >>> On 21 August 2023 at 16:05, Ivan Krylov wrote: >>> | Dirk is probably right that it's a good idea to have OMP_THREAD_LIMIT=2 >>> | set on the CRAN check machine. Either that, or place the responsibility >>> | on data.table for setting the right number of threads by default. But >>> | that's a policy question: should a CRAN package start no more than two >>> | threads/child processes even if it doesn't know it's running in an >>> | environment where the CPU time / elapsed time limit is two? >>> >>> Methinks that given this language in the CRAN Repository Policy >>> >>> If running a package uses multiple threads/cores it must never use more >>> than two simultaneously: the check farm is a shared resource and will >>> typically be running many checks simultaneously. >>> >>> it would indeed be nice if this variable, and/or equivalent ones, were set. >>> >>> As I mentioned before, I had long added a similar throttle (not for >>> data.table) in a package I look after (for work, even). So a similar >>> throttler with optionality is below. I'll add this to my `dang` package >>> collecting various functions. >>> >>> A usage example follows. It does nothing by default, ensuring 'full power' >>> but reflects the minimum of two possible options, or an explicit count: >>> >>> > dang::limitDataTableCores(verbose=TRUE) >>> Limiting data.table to '12'. >>> > Sys.setenv("OMP_THREAD_LIMIT"=3); >>> dang::limitDataTableCores(verbose=TRUE) >>> Limiting data.table to '3'. >>> > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE) >>> Limiting data.table to '2'. >>> > dang::limitDataTableCores(1, verbose=TRUE) >>> Limiting data.table to '1'. >>> > >>> >>> That makes it, in my eyes, preferable to any unconditional 'always pick 1 >>> thread'. >>> >>> Dirk >>> >>> >>> ##' Set threads for data.table respecting possible local settings >>> ##' >>> ##' This function set the number of threads \pkg{data.table} will use >>> ##' while reflecting two possible machine-specific settings from the >>> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R >>> ##' option \sQuote{Ncpus} (uses e.g. for parallel builds). >>> ##' @title Set data.table threads respecting default settingss >>> ##' @param ncores A numeric or character variable with the desired >>> ##' count of threads to use >>> ##' @param verbose A logical value with a default of \sQuote{FALSE} to >>> ##' operate more verbosely >>> ##' @return The return value of the \pkg{data.table} function >>> ##' \code{setDTthreads} which is called as a side-effect. >>> ##' @author Dirk Eddelbuettel >>> ##' @export >>> limitDataTableCores <- function(ncores, verbose = FALSE) { >>> if (missing(ncores)) { >>> ## start with a simple fallback: 'Ncpus' (if set) or else 2 >>> ncores <- getOption("Ncpus", 2L) >>> ## also consider OMP_THREAD_LIMIT (cf Writing R Extensions), gets >>> NA if envvar unset >>> ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT")) >>> ## and then keep the smaller >>> ncores <- min(na.omit(c(ncores, ompcores))) >>> } >>> stopifnot("Package 'data.table' must be installed." = >>> requireNamespace("data.table", quietly=TRUE)) >>> stopifnot("Argument 'ncores' must be numeric or character" = >>> is.numeric(ncores) || is.character(ncores)) >>> if (verbose) message("Limiting data.table to '", ncores, "'.") >>> data.table::setDTthreads(ncores) >>> } >>> >>> | >>> | -- >>> | Best regards, >>> | Ivan >>> | >>> | ______________________________________________ >>> | R-package-devel@r-project.org mailing list >>> | https://stat.ethz.ch/mailman/listinfo/r-package-devel >>> >>> -- >>> dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org >>> >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-package-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-package-devel >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-package-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-package-devel > >______________________________________________ >R-package-devel@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-package-devel -- Sent from my phone. Please excuse my brevity. ______________________________________________ R-package-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-package-devel