> "There’s a downside not mentioned in the manual that caught and baffled me 
> for a while. I was using all 64 cores of an AWS instance via 
> parallel::mclapply() and doing matrix multiplications in the parallelized 
> function. If the matrices were big enough the linked BLAS or LAPACK would try 
> to use all 64 cores for each multiplication, which meant 64^2 processes or 
> threads in some combination and that was the end of all useful work. I worked 
> around the problem by rewriting the matrix multiply as “colSums(x * t(y))”. 
> It also worked to build R from source, which I guess uses the built-in BLAS 
> and LAPACK."

I believe one can control the number of BLAS threads via the `RhpcBLASctl` 
package: https://cran.r-project.org/package=RhpcBLASctl I’ve definitely used it 
in the other direction, to enable BLAS threading when `betareg` was *not* 
multiprocessing: https://stackoverflow.com/a/66540693/570918
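A minimal sketch of the first direction, capping BLAS threads before forking workers (`blas_set_num_threads()` is from `RhpcBLASctl`; the matrix sizes and worker count here are just illustrative):

```r
library(parallel)

# RhpcBLASctl is a CRAN package; guard the call in case it is not installed.
if (requireNamespace("RhpcBLASctl", quietly = TRUE)) {
  # Pin the linked BLAS to a single thread so each forked worker does not
  # spawn its own thread pool on top of mclapply()'s processes.
  RhpcBLASctl::blas_set_num_threads(1)
}

xs <- replicate(8, matrix(rnorm(100 * 100), 100, 100), simplify = FALSE)

# Two forked workers, each doing single-threaded matrix multiplies.
res <- mclapply(xs, function(x) x %*% t(x), mc.cores = 2)
```

With the cap in place you get one BLAS thread per worker instead of cores-squared.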

> "Does R build its own BLAS and LAPACK if it's also linking external ones?"

No, it will not. On Conda Forge, there was even some trickery on certain 
platforms (osx-arm64): an external BLAS/LAPACK was used, and symlinks were put 
in place of the libraries R normally ships (Rblas.dylib, Rlapack.dylib) so that 
previously built packages linking via rpath would still work after the swap.

BTW, one can easily select the Conda Forge BLAS/LAPACK implementation. Conda 
Forge doesn't ship the R-vendored ones, but the reference implementation 
(Netlib) is available, e.g., `conda install 'blas=*=netlib'`. That's also the 
slowest by all metrics and on all platforms.
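For completeness, the same selector pattern covers the other implementations (build strings as used on conda-forge; availability varies by platform, and these lines are a sketch, not a tested install recipe):

```shell
# Pick a BLAS/LAPACK implementation via the blas metapackage's build string.
conda install 'blas=*=openblas'   # OpenBLAS (the usual default)
conda install 'blas=*=mkl'        # Intel MKL, x86_64 platforms
conda install 'blas=*=netlib'     # reference Netlib (slowest)
```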


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel