> There's a downside not mentioned in the manual that caught and baffled me
> for a while. I was using all 64 cores of an AWS instance via
> parallel::mclapply() and doing matrix multiplications in the parallelized
> function. If the matrices were big enough, the linked BLAS or LAPACK would
> try to use all 64 cores for each multiplication, which meant 64^2 processes
> or threads in some combination, and that was the end of all useful work. I
> worked around the problem by rewriting the matrix multiply as
> "colSums(x * t(y))". It also worked to build R from source, which I guess
> uses the built-in BLAS and LAPACK.
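The oversubscription described above can usually be avoided by pinning the BLAS to one thread per worker before forking. A minimal sketch, assuming the `RhpcBLASctl` package is installed and the linked BLAS honors its thread controls:

```r
## Sketch: cap BLAS/OpenMP threading before forking workers, so that
## N mclapply() workers don't each spawn N BLAS threads (N^2 total).
library(parallel)
library(RhpcBLASctl)

blas_set_num_threads(1)  # forked workers inherit this setting
omp_set_num_threads(1)   # also cap OpenMP, used by some BLAS builds

res <- mclapply(seq_len(detectCores()), function(i) {
  x <- matrix(rnorm(1e4), 100, 100)
  x %*% x  # now single-threaded within each worker
}, mc.cores = detectCores())
```

Whether the calls take effect depends on which BLAS R is linked against; `sessionInfo()` reports the BLAS/LAPACK paths in recent R versions.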
I believe one can control the number of BLAS threads via the `RhpcBLASctl` package: https://cran.r-project.org/package=RhpcBLASctl

I've definitely used it in the other direction, when `betareg` was *not* multiprocessing: https://stackoverflow.com/a/66540693/570918

> Does R build its own BLAS and LAPACK if it's also linking external ones?

No, it will not. On Conda Forge, there was even some trickery on certain platforms (osx-arm64) where external BLAS/LAPACK were used, but symlinks were put in place of the R-delivered libraries (Rblas.dylib, Rlapack.dylib) so that previously built packages with rpath links would survive the swap.

BTW, one can easily select the Conda Forge BLAS/LAPACK implementation. It doesn't provide the R-vendored ones, but the reference standard is Netlib, e.g. `conda install 'blas=*=netlib'`. That's also the slowest by all metrics and on all platforms.

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel