On 23.08.2023 16:00, Scott Ritchie wrote:
Hi Uwe,

I agree and have also been burnt myself by programs occupying the maximum number of cores available.

My understanding is that in the absence of explicit parallelisation, use of data.table in a package should not lead to this type of behaviour?

Yes, that would be my hope, too.

Best,
Uwe Ligges



Best,

Scott

On Wed, 23 Aug 2023 at 14:30, Uwe Ligges <lig...@statistik.tu-dortmund.de <mailto:lig...@statistik.tu-dortmund.de>> wrote:

    I (any many collegues here) have been caught several times by the
    following example:

    1. did something in parallel on a cluster, set up via
    parallel::makeCluster().
    2. e.g. allocated 20 cores and got them on one single machine
    3. ran some code in parallel via parLapply()

    Bang! 400 threads;
    So I have started 20 parallel processes, each of which is using the
    automatically set max. 20 threads as OMP_THREAD_LIMIT was also adjusted
    by the cluster to 20 (rather than 1).

    Hence, I really believe a default should always be small, not only in
    examples and tests, but generally. And people who aim for more
    should be
    able to increase the defaults.

    Do you believe a software that auto-occupies a 96 core machines with 96
    threads by default is sensible?

    Best,
    Uwe Ligges






    On 21.08.2023 21:59, Berry Boessenkool wrote:
     >
     > If you add that to each exported function, isn't that a lot of
    code to read + maintain?
     > Also, it seems like unnecessary computational overhead.
     >  From a software design point of view, it might be nicer to set
    that in the examples + tests.
     >
     > Regards,
     > Berry
     >
     > ________________________________
     > From: R-package-devel <r-package-devel-boun...@r-project.org
    <mailto:r-package-devel-boun...@r-project.org>> on behalf of Scott
    Ritchie <sritchi...@gmail.com <mailto:sritchi...@gmail.com>>
     > Sent: Monday, August 21, 2023 19:23
     > To: Dirk Eddelbuettel <e...@debian.org <mailto:e...@debian.org>>
     > Cc: r-package-devel@r-project.org
    <mailto:r-package-devel@r-project.org>
    <r-package-devel@r-project.org <mailto:r-package-devel@r-project.org>>
     > Subject: Re: [R-pkg-devel] Trouble with long-running tests on
    CRAN debian server
     >
     > Thanks Dirk and Ivan,
     >
     > I took a slightly different work-around of forcing the number of
    threads to
     > 1 when running functions of the test dataset in the package, by
    adding the
     > following to each user facing function:
     >
     > ```
     >    # Check if running on package test_data, and if so, force
    data.table to
     > be
     >    # single threaded so that we can avoid a NOTE on CRAN submission
     >    if (isTRUE(all.equal(x, ukbnmr::test_data))) {
     >      registered_threads <- getDTthreads()
     >      setDTthreads(1)
     >      on.exit({ setDTthreads(registered_threads) }) # re-register
    so no
     > unintended side effects for users
     >    }
     > ```
     > (i.e. here x is the input argument to the function)
     >
     > It took some trial and error to get to pass the CRAN tests; the
    number of
     > columns in the input data was also contributing to the problem.
     >
     > Best,
     >
     > Scott
     >
     >
     > On Mon, 21 Aug 2023 at 14:38, Dirk Eddelbuettel <e...@debian.org
    <mailto:e...@debian.org>> wrote:
     >
     >>
     >> On 21 August 2023 at 16:05, Ivan Krylov wrote:
     >> | Dirk is probably right that it's a good idea to have
    OMP_THREAD_LIMIT=2
     >> | set on the CRAN check machine. Either that, or place the
    responsibility
     >> | on data.table for setting the right number of threads by
    default. But
     >> | that's a policy question: should a CRAN package start no more
    than two
     >> | threads/child processes even if it doesn't know it's running in an
     >> | environment where the CPU time / elapsed time limit is two?
     >>
     >> Methinks that given this language in the CRAN Repository Policy
     >>
     >>    If running a package uses multiple threads/cores it must
    never use more
     >>    than two simultaneously: the check farm is a shared resource
    and will
     >>    typically be running many checks simultaneously.
     >>
     >> it would indeed be nice if this variable, and/or equivalent
    ones, were set.
     >>
     >> As I mentioned before, I had long added a similar throttle (not for
     >> data.table) in a package I look after (for work, even). So a similar
     >> throttler with optionality is below. I'll add this to my `dang`
    package
     >> collecting various functions.
     >>
     >> A usage example follows. It does nothing by default, ensuring
    'full power'
     >> but reflects the minimum of two possible options, or an explicit
    count:
     >>
     >>      > dang::limitDataTableCores(verbose=TRUE)
     >>      Limiting data.table to '12'.
     >>      > Sys.setenv("OMP_THREAD_LIMIT"=3);
     >> dang::limitDataTableCores(verbose=TRUE)
     >>      Limiting data.table to '3'.
     >>      > options(Ncpus=2); dang::limitDataTableCores(verbose=TRUE)
     >>      Limiting data.table to '2'.
     >>      > dang::limitDataTableCores(1, verbose=TRUE)
     >>      Limiting data.table to '1'.
     >>      >
     >>
     >> That makes it, in my eyes, preferable to any unconditional
    'always pick 1
     >> thread'.
     >>
     >> Dirk
     >>
     >>
     >> ##' Set threads for data.table respecting possible local settings
     >> ##'
     >> ##' This function set the number of threads \pkg{data.table}
    will use
     >> ##' while reflecting two possible machine-specific settings from the
     >> ##' environment variable \sQuote{OMP_THREAD_LIMIT} as well as the R
     >> ##' option \sQuote{Ncpus} (uses e.g. for parallel builds).
     >> ##' @title Set data.table threads respecting default settingss
     >> ##' @param ncores A numeric or character variable with the desired
     >> ##' count of threads to use
     >> ##' @param verbose A logical value with a default of
    \sQuote{FALSE} to
     >> ##' operate more verbosely
     >> ##' @return The return value of the \pkg{data.table} function
     >> ##' \code{setDTthreads} which is called as a side-effect.
     >> ##' @author Dirk Eddelbuettel
     >> ##' @export
     >> limitDataTableCores <- function(ncores, verbose = FALSE) {
     >>      if (missing(ncores)) {
     >>          ## start with a simple fallback: 'Ncpus' (if set) or else 2
     >>          ncores <- getOption("Ncpus", 2L)
     >>          ## also consider OMP_THREAD_LIMIT (cf Writing R
    Extensions), gets
     >> NA if envvar unset
     >>          ompcores <- as.integer(Sys.getenv("OMP_THREAD_LIMIT"))
     >>          ## and then keep the smaller
     >>          ncores <- min(na.omit(c(ncores, ompcores)))
     >>      }
     >>      stopifnot("Package 'data.table' must be installed." =
     >> requireNamespace("data.table", quietly=TRUE))
     >>      stopifnot("Argument 'ncores' must be numeric or character" =
     >> is.numeric(ncores) || is.character(ncores))
     >>      if (verbose) message("Limiting data.table to '", ncores, "'.")
     >>      data.table::setDTthreads(ncores)
     >> }
     >>
     >> |
     >> | --
     >> | Best regards,
     >> | Ivan
     >> |
     >> | ______________________________________________
     >> | R-package-devel@r-project.org
    <mailto:R-package-devel@r-project.org> mailing list
     >> | https://stat.ethz.ch/mailman/listinfo/r-package-devel
    <https://stat.ethz.ch/mailman/listinfo/r-package-devel>
     >>
     >> --
     >> dirk.eddelbuettel.com <http://dirk.eddelbuettel.com> |
    @eddelbuettel | e...@debian.org <mailto:e...@debian.org>
     >>
     >
     >          [[alternative HTML version deleted]]
     >
     > ______________________________________________
     > R-package-devel@r-project.org
    <mailto:R-package-devel@r-project.org> mailing list
     > https://stat.ethz.ch/mailman/listinfo/r-package-devel
    <https://stat.ethz.ch/mailman/listinfo/r-package-devel>
     >
     >       [[alternative HTML version deleted]]
     >
     > ______________________________________________
     > R-package-devel@r-project.org
    <mailto:R-package-devel@r-project.org> mailing list
     > https://stat.ethz.ch/mailman/listinfo/r-package-devel
    <https://stat.ethz.ch/mailman/listinfo/r-package-devel>


______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel

Reply via email to