On Wed, 7 Aug 2024 07:47:38 -0400, Dipterix Wang <dipterix.w...@gmail.com> writes:
> I wonder if R initiates a system environment or options to instruct
> the packages on the number of cores to use?

A lot of thought and experience with various HPC systems went into availableCores(), a function from the zero-dependency 'parallelly' package by Henrik Bengtsson:
https://search.r-project.org/CRAN/refmans/parallelly/html/availableCores.html

If you cannot accept a pre-created cluster object, a 'future' plan, 'BiocParallel' parameters, or the number of OpenMP threads from the user, this is a safer default than parallel::detectCores().

Building such a limiter into R itself poses a number of problems. Here is a summary from a previous discussion on R-pkg-devel [1], with wise contributions from Dirk Eddelbuettel, Reed A. Cartwright, Vladimir Dergachev, and Andrew Robbins:

- R is responsible for the BLAS it is linked to, and therefore must actively manage the BLAS threads when the user sets a thread limit. This requires writing BLAS-specific code to talk to the libraries, as is done in FlexiBLAS and the RhpcBLASctl package. Some BLASes (such as ATLAS) only have a compile-time thread limit. R should somehow give all the threads to the BLAS by default, but take them away when some other form of parallelism is requested.

- Should R manage the OpenMP thread limit by itself? If not, that is a lot of extra work for every developer of an OpenMP-using package. If yes, R becomes responsible for initialising OpenMP.

- Managing the BLAS and OpenMP thread limits together is already a hard problem, because a given BLAS may or may not follow the OpenMP thread limits.

- What if two packages both consult the thread limit and create N^2 processes as a result of one calling the other? Dividing a single computer between BLAS threads, OpenMP threads, child processes and their threads needs a very reliable global inter-process semaphore.
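The N^2 oversubscription mentioned above can be sketched in a few lines of base R's 'parallel'. This is a toy demonstration under stated assumptions: `inner_op()` and `outer_op()` stand in for two hypothetical packages' entry points, and `n` stands in for a detected core count (kept small here so the example is cheap to run):

```r
library(parallel)

n <- 2L  # stand-in for a detected core count, e.g. from detectCores()

# inner_op() consults the "core count" and creates n workers of its own.
inner_op <- function() {
  cl <- makeCluster(n)
  on.exit(stopCluster(cl))
  length(cl)  # how many worker processes this call started
}

# outer_op() does the same, and each of its n workers calls inner_op(),
# so one top-level call ends up creating n + n*n processes in total.
outer_op <- function() {
  cl <- makeCluster(n)
  on.exit(stopCluster(cl))
  clusterExport(cl, c("inner_op", "n"))
  sum(unlist(parLapply(cl, seq_len(n), function(i) inner_op())))
}

outer_op()  # the inner calls alone created n * n = 4 workers
```

Neither layer is doing anything wrong in isolation; the oversubscription only appears when they compose, which is exactly why a per-package limit is not enough.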
R would have to grow a jobserver like the one in GNU Make: a separate process, because the main R thread will be blocked waiting for the computation result, especially if we want to automatically recover job slots from crashed processes. That is probably not impossible, but it involves a lot of OS-specific code.

- What happens to the thread limit when starting remote R processes? It is best to avoid having to set it manually. And if multiple people unknowingly start R on a shared server, how do the R instances avoid competing for the CPU (or for ownership of the semaphore)?

- It will take a lot of political power to actually make this scheme work. The limiter can only be cooperative (unless you override the clone() syscall and make it fail, after which I would expect everything to crash), so it takes only one piece of software unknowingly ignoring the limit to break everything.

-- 
Best regards,
Ivan

[1] https://stat.ethz.ch/pipermail/r-package-devel/2023q4/009956.html

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel