On 23 February 2024 at 09:35, Robin Liu wrote:
| Hi all,
| 
| Here is an R script that uses Armadillo to decompose a large matrix and print
| the first 10 eigenvalues.
| 
| library(RcppArmadillo)
| library(Rcpp)
| 
| src <-
| r"(#include <RcppArmadillo.h>
| 
| // [[Rcpp::depends(RcppArmadillo)]]
| 
| // [[Rcpp::export]]
| arma::vec getEigenValues(arma::mat M) {
|   return arma::eig_sym(M);
| })"
| 
| size <- 10000
| m <- matrix(rnorm(size^2), size, size)
| m <- m * t(m)
| 
| # This line compiles the above code with the -fopenmp flag.
| sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
| result <- getEigenValues(m)
| print(result[1:10])
| 
| When I run this code on server A, I see that arma can implicitly leverage all
| available cores by running top -H. However, on server B it can only use one
| core despite multiple being available: there is just one process entry in top
| -H. Both processes successfully exit and return an answer. The process on
| server B is of course much slower.

How this is applied is documented in the package. The policy is NOT to
blindly enforce one use case (say all cores, or half, or some magically
chosen value of N) but to follow the local admin settings and respect the
standard environment variables.

So I suspect that your machine 'B' differs from machine 'A' in this regard.

Note that this is _run-time_ and not _compile-time_ behavior, as it is for
multicore-enabled LAPACK and BLAS libraries, the OpenMP library, and basically
most software of this type.

You can override it; see
  RcppArmadillo::armadillo_set_number_of_omp_threads
  RcppArmadillo::armadillo_get_number_of_omp_threads
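
To make the run-time point concrete, here is a minimal sketch in plain C++ of
the standard OpenMP calls involved. It is an illustration only and not the
RcppArmadillo source; my assumption is that the two helpers named above are
thin wrappers around calls of this kind. The function names max_omp_threads,
set_omp_threads and observed_omp_threads are made up for this example. The
preprocessor guards let it build with or without -fopenmp.

```cpp
// Sketch only: plain C++ with OpenMP, not RcppArmadillo's actual code.
// All three function names are hypothetical, chosen for this example.
#ifdef _OPENMP
#include <omp.h>
#endif

// Upper bound on the threads a new parallel region may use. The value is
// read at run time; OMP_NUM_THREADS in the environment sets its initial value.
inline int max_omp_threads() {
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;  // built without -fopenmp: serial only
#endif
}

// Request n threads for later parallel regions; a no-op without OpenMP.
inline void set_omp_threads(int n) {
#ifdef _OPENMP
    if (n > 0) omp_set_num_threads(n);
#else
    (void) n;  // silence the unused-parameter warning
#endif
}

// Count the threads that actually run inside a parallel region.
inline int observed_omp_threads() {
    int count = 1;
#ifdef _OPENMP
    #pragma omp parallel
    {
        // Exactly one thread writes the team size, so there is no race.
        #pragma omp single
        count = omp_get_num_threads();
    }
#endif
    return count;
}
```

If max_omp_threads() already reports 1 on machine 'B' before anything is set,
the limit most likely comes from the environment (for example OMP_NUM_THREADS
set by the admin or the scheduler) rather than from how the code was compiled.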

Can you try and see if these help you?

Dirk

| Here is the compilation on server A:
| /usr/local/lib/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so'
| 'file197c21cbec564.cpp'
| g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I../inst/include
| -fopenmp  -I"/usr/local/lib/R/site-library/Rcpp/include" -I"/usr/local/lib/R/
| site-library/RcppArmadillo/include" -I"/tmp/RtmpwhGRi3/
| sourceCpp-x86_64-pc-linux-gnu-1.0.9" -I/usr/local/include   -fpic  -g -O2
| -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time
| -D_FORTIFY_SOURCE=2 -g  -c file197c21cbec564.cpp -o file197c21cbec564.o
| g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o
| sourceCpp_2.so file197c21cbec564.o -fopenmp -llapack -lblas -lgfortran -lm
| -lquadmath -L/usr/local/lib/R/lib -lR
| 
| and here it is for server B:
| /sw/R/R-4.2.3/lib64/R/bin/R CMD SHLIB --preclean -o 'sourceCpp_2.so'
| 'file158165b9c4ae1.cpp'
| g++ -std=gnu++11 -I"/sw/R/R-4.2.3/lib64/R/include" -DNDEBUG -I../inst/include
| -fopenmp  -I"/home/my_username/.R/library/Rcpp/include" -I"/home/ my_username
| /.R/library/RcppArmadillo/include" -I"/tmp/RtmpvfPt4l/
| sourceCpp-x86_64-pc-linux-gnu-1.0.10" -I/usr/local/include   -fpic  -g -O2  -c
| file158165b9c4ae1.cpp -o file158165b9c4ae1.o
| g++ -std=gnu++11 -shared -L/sw/R/R-4.2.3/lib64/R/lib -L/usr/local/lib64 -o
| sourceCpp_2.so file158165b9c4ae1.o -fopenmp -llapack -lblas -lgfortran -lm
| -lquadmath -L/sw/R/R-4.2.3/lib64/R/lib -lR
| 
| I thought that the -fopenmp flag should let arma implicitly parallelize matrix
| computations. Any hints as to why this may not work on server B?
| 
| The actual code I'm running is an R package that includes RcppArmadillo and
| RcppEnsmallen. Server B is the login node to an hpc cluster, but the code does
| not use all cores on the compute nodes either.
| 
| Best,
| Robin
| _______________________________________________
| Rcpp-devel mailing list
| Rcpp-devel@lists.r-forge.r-project.org
| https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org