[Rcpp-devel] segfault on exit CRAN+Intel only

2024-03-02 Thread Murray Efford
Hi

A couple of days ago I posted on R-package-devel about a mysterious
segfault from R CMD check of my package secrdesign (see
https://CRAN.R-project.org/package=secrdesign and
https://github.com/MurrayEfford/secrdesign). The issue arises only on
CRAN and only with the Intel(R) oneAPI DPC++/C++ Compiler:

 *** caught segfault ***
address (nil), cause 'unknown'

As noted by Ivan Krylov and Uwe Ligges, the fault happens at the end
of the R session (as it quits). The package passes when checked on
Intel(R) oneAPI DPC++/C++ Compiler 2023.2.0 (2023.2.0.20230721) with
rhub2.

Now, CRAN via Uwe Ligges has accepted a new version of secrdesign
despite the continuing error. My reasons for raising it here are that
(i) it is likely to raise its head next time I update,
(ii) my experience may not be unique,
(iii) my use of Rcpp, RcppArmadillo and BH in this package is very
limited (https://github.com/MurrayEfford/secrdesign/tree/main/src),
and it may therefore provide clues to an Rcpp pro, and
(iv) I have just noticed a similar 'Additional issue' for
https://CRAN.R-project.org/package=ipsecr that also uses Rcpp,
RcppArmadillo and BH.
Any advice would be welcome. I have no experience with docker, so
answers in words of one or few syllables, please.
Murray
___
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel


Re: [Rcpp-devel] RcppArmadillo with -fopenmp: Not using all available cores

2024-03-02 Thread Dirk Eddelbuettel

Hi Robin,

On 2 March 2024 at 16:34, Robin Liu wrote:
| sessionInfo() was the right clue. Indeed the version of R on machine B was not
| linked to OpenBLAS. Switching to a version with OpenBLAS allows the test code
| to use all cores.
| 
| A clear way to check which library is linked is to run the following:
| 
| > extSoftVersion()["BLAS"]

Ah yes -- I keep forgetting about that one. Good reminder!
 
| Thanks for your help!

Always a pleasure. Glad you are all set.

Dirk

 
| On Sat, Feb 24, 2024 at 9:17 AM Dirk Eddelbuettel  wrote:
| 
| 
| On 24 February 2024 at 11:44, Robin Liu wrote:
| | Thank you Dirk for the response.
| |
| | I called RcppArmadillo::armadillo_get_number_of_omp_threads() on both machines
| | and correctly see that machine A and B have 20 and 40 cores, respectively. I
| | also see that calling the setter changes this value.
| |
| | However, calling the setter does not seem to change the number of cores used on
| | either machine A or B. I have updated my code example as below: the execution
| | uses 20 cores on machine A and 1 core on machine B as before, despite my
| | setting the number of omp threads to 5. Do you have any further hints?
| 
| I fear you need to debug that on the machine 'B' in question. It's all open
| source.  I do not think either Conrad or myself put code in to constrain you
| to one core on 'B' (while not doing so on 'A', as you see).
|
| You can grep around both the RcppArmadillo wrapper code and the included
| Armadillo code; I suggest making a local copy and peppering in some print
| statements.
|
| Also keep in mind that (Rcpp)Armadillo hands off computation to the actual
| LAPACK / BLAS implementation on that machine. Lots of things can go wrong
| there: maybe R was compiled with its own embedded BLAS/LAPACK sources
| (preventing a call out to OpenBLAS even when the machine has it). Or maybe R
| was compiled correctly but a single-threaded set of libraries is on the
| machine.
|
| You have not supplied any of that information. Many bug report suggestions
| hint that showing `sessionInfo()` helps -- and it does show the BLAS/LAPACK
| libraries. You are not forced to show us this, but by not showing us you
| prevent us from being more focussed on suggestions.  So maybe start at your
| end by glancing at sessionInfo() on A and B?
| 
| Dirk
| 
| 
| | library(RcppArmadillo)
| | library(Rcpp)
| |
| | RcppArmadillo::armadillo_set_number_of_omp_threads(5)
| | print(sprintf("There are %d threads",
| |       RcppArmadillo::armadillo_get_number_of_omp_threads()))
| |
| | src <-
| | r"(#include <RcppArmadillo.h>
| |
| | // [[Rcpp::depends(RcppArmadillo)]]
| |
| | // [[Rcpp::export]]
| | arma::vec getEigenValues(arma::mat M) {
| |   return arma::eig_sym(M);
| | })"
| |
| | size <- 1
| | m <- matrix(rnorm(size^2), size, size)
| | m <- m * t(m)
| |
| | # This line compiles the above code with the -fopenmp flag.
| | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
| | result <- getEigenValues(m)
| | print(result[1:10])
| |
| | On Fri, Feb 23, 2024 at 12:53 PM Dirk Eddelbuettel wrote:
| |
| |
| |     On 23 February 2024 at 09:35, Robin Liu wrote:
| |     | Hi all,
| |     |
| |     | Here is an R script that uses Armadillo to decompose a large matrix and print
| |     | the first 10 eigenvalues.
| |     |
| |     | library(RcppArmadillo)
| |     | library(Rcpp)
| |     |
| |     | src <-
| |     | r"(#include <RcppArmadillo.h>
| |     |
| |     | // [[Rcpp::depends(RcppArmadillo)]]
| |     |
| |     | // [[Rcpp::export]]
| |     | arma::vec getEigenValues(arma::mat M) {
| |     |   return arma::eig_sym(M);
| |     | })"
| |     |
| |     | size <- 1
| |     | m <- matrix(rnorm(size^2), size, size)
| |     | m <- m * t(m)
| |     |
| |     | # This line compiles the above code with the -fopenmp flag.
| |     | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
| |     | result <- getEigenValues(m)
| |     | print(result[1:10])
| |     |
| |     | When I run this code on server A, I see that arma can implicitly leverage all
| |     | available cores by running top -H. However, on server B it can only use one
| |     | core despite multiple being available: there is just one process entry in top
| |     | -H. Both processes successfully exit and return an answer. The process on
| |     | server B is of course much slower.
| |
| |     It is documented in the package how this is applied and the policy is to NOT
| |     blindly enforce one use case (say all cores, or half, or a

Re: [Rcpp-devel] Segfault in wrapping code in Rcpp

2024-03-02 Thread Dirk Eddelbuettel


Hi Nikhil,

Don't post images. I read in a text-based reader. The mailing list software
also scrubs html (I think).

I would simplify. Start with the simplest Rcpp Modules setup, then add to it.
Keep checking. Eventually, on your way towards what you are doing now, you may
spot the error.
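
For illustration, a "simplest Rcpp Modules setup" might look something like
the sketch below. The code under discussion is not shown in this thread, so
the class Counter and the module name counter_module are hypothetical. The
sketch assumes R >= 4.0 (for the raw string), a working compiler toolchain,
and that sourceCpp() makes the module's classes available in the calling
environment.

```r
library(Rcpp)

# A deliberately tiny C++ class exposed through an Rcpp Module.
# Counter and counter_module are made-up names for this sketch.
sourceCpp(code = r"(
#include <Rcpp.h>

class Counter {
public:
  Counter() : n_(0) {}
  void add(int k) { n_ += k; }
  int value() const { return n_; }
private:
  int n_;
};

RCPP_MODULE(counter_module) {
  Rcpp::class_<Counter>("Counter")
    .constructor()
    .method("add", &Counter::add)
    .method("value", &Counter::value);
}
)")

cnt <- new(Counter)   # construct a C++ object from R
cnt$add(3L)           # call a C++ method
print(cnt$value())    # read the state back
```

If this much works, adding one constructor, field or method at a time and
re-running after each step should narrow down where a segfault first appears.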

Hope this helps,  Dirk

-- 
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org


Re: [Rcpp-devel] RcppArmadillo with -fopenmp: Not using all available cores

2024-03-02 Thread Robin Liu
Hi Dirk,

sessionInfo() was the right clue. Indeed the version of R on machine B was
not linked to OpenBLAS. Switching to a version with OpenBLAS allows the
test code to use all cores.

A clear way to check which library is linked is to run the following:

> extSoftVersion()["BLAS"]

Thanks for your help!
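
For anyone landing here later, a slightly fuller version of that check might
look like the sketch below. It only uses functions from recent versions of
base R, except for guess_blas_flavour(), which is a hypothetical helper
written for this example. The path matching is a rough heuristic: on some
systems libRblas is itself replaced by, or linked to, another implementation.

```r
# Paths and versions that base R reports for the linear algebra libraries.
blas_path   <- extSoftVersion()[["BLAS"]]  # may be "" if R cannot determine it
lapack_path <- La_library()
lapack_ver  <- La_version()

# Hypothetical helper, not part of R: guess the BLAS flavour from its path.
guess_blas_flavour <- function(path) {
  if (is.na(path)) return("unknown")
  p <- tolower(path)
  if (grepl("openblas", p, fixed = TRUE)) {
    "OpenBLAS"
  } else if (grepl("mkl", p, fixed = TRUE)) {
    "Intel MKL"
  } else if (grepl("rblas", p, fixed = TRUE)) {
    "R's bundled BLAS (libRblas)"
  } else {
    "unknown"
  }
}

cat("BLAS:  ", blas_path, "->", guess_blas_flavour(blas_path), "\n")
cat("LAPACK:", lapack_path, "version", lapack_ver, "\n")
```

Running this on both machines and comparing the output gives the same
information as the BLAS/LAPACK lines printed by sessionInfo().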

On Sat, Feb 24, 2024 at 9:17 AM Dirk Eddelbuettel  wrote:

>
> On 24 February 2024 at 11:44, Robin Liu wrote:
> | Thank you Dirk for the response.
> |
> | I called RcppArmadillo::armadillo_get_number_of_omp_threads() on both machines
> | and correctly see that machine A and B have 20 and 40 cores, respectively. I
> | also see that calling the setter changes this value.
> |
> | However, calling the setter does not seem to change the number of cores used on
> | either machine A or B. I have updated my code example as below: the execution
> | uses 20 cores on machine A and 1 core on machine B as before, despite my
> | setting the number of omp threads to 5. Do you have any further hints?
>
> I fear you need to debug that on the machine 'B' in question. It's all open
> source.  I do not think either Conrad or myself put code in to constrain you
> to one core on 'B' (while not doing so on 'A', as you see).
>
> You can grep around both the RcppArmadillo wrapper code and the included
> Armadillo code; I suggest making a local copy and peppering in some print
> statements.
>
> Also keep in mind that (Rcpp)Armadillo hands off computation to the actual
> LAPACK / BLAS implementation on that machine. Lots of things can go wrong
> there: maybe R was compiled with its own embedded BLAS/LAPACK sources
> (preventing a call out to OpenBLAS even when the machine has it). Or maybe R
> was compiled correctly but a single-threaded set of libraries is on the
> machine.
>
> You have not supplied any of that information. Many bug report suggestions
> hint that showing `sessionInfo()` helps -- and it does show the BLAS/LAPACK
> libraries. You are not forced to show us this, but by not showing us you
> prevent us from being more focussed on suggestions.  So maybe start at your
> end by glancing at sessionInfo() on A and B?
>
> Dirk
>
>
> | library(RcppArmadillo)
> | library(Rcpp)
> |
> | RcppArmadillo::armadillo_set_number_of_omp_threads(5)
> | print(sprintf("There are %d threads",
> |   RcppArmadillo::armadillo_get_number_of_omp_threads()))
> |
> | src <-
> | r"(#include <RcppArmadillo.h>
> |
> | // [[Rcpp::depends(RcppArmadillo)]]
> |
> | // [[Rcpp::export]]
> | arma::vec getEigenValues(arma::mat M) {
> |   return arma::eig_sym(M);
> | })"
> |
> | size <- 1
> | m <- matrix(rnorm(size^2), size, size)
> | m <- m * t(m)
> |
> | # This line compiles the above code with the -fopenmp flag.
> | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
> | result <- getEigenValues(m)
> | print(result[1:10])
> |
> | On Fri, Feb 23, 2024 at 12:53 PM Dirk Eddelbuettel wrote:
> |
> |
> | On 23 February 2024 at 09:35, Robin Liu wrote:
> | | Hi all,
> | |
> | | Here is an R script that uses Armadillo to decompose a large matrix and print
> | | the first 10 eigenvalues.
> | |
> | | library(RcppArmadillo)
> | | library(Rcpp)
> | |
> | | src <-
> | | r"(#include <RcppArmadillo.h>
> | |
> | | // [[Rcpp::depends(RcppArmadillo)]]
> | |
> | | // [[Rcpp::export]]
> | | arma::vec getEigenValues(arma::mat M) {
> | |   return arma::eig_sym(M);
> | | })"
> | |
> | | size <- 1
> | | m <- matrix(rnorm(size^2), size, size)
> | | m <- m * t(m)
> | |
> | | # This line compiles the above code with the -fopenmp flag.
> | | sourceCpp(code = src, verbose = TRUE, rebuild = TRUE)
> | | result <- getEigenValues(m)
> | | print(result[1:10])
> | |
> | | When I run this code on server A, I see that arma can implicitly leverage all
> | | available cores by running top -H. However, on server B it can only use one
> | | core despite multiple being available: there is just one process entry in top
> | | -H. Both processes successfully exit and return an answer. The process on
> | | server B is of course much slower.
> |
> | It is documented in the package how this is applied, and the policy is to NOT
> | blindly enforce one use case (say all cores, or half, or a magically chosen
> | value of N for whatever value of N) but to follow the local admin setting and
> | respect standard environment variables.
> |
> | So I suspect that your machine 'B' differs from machine 'A' in this regard.
> |
> | Note that this is a _run-time_ and not _compile-time_ behavior, as it is for
> | multicore-enabled LAPACK and BLAS libraries, the OpenMP library and basically
> | most software of this type.
> |
> | You can override it, see
> |   RcppArmadillo::armadillo_set_number_of_omp_threads
> |   RcppArmadillo::armadillo_get_number_of_omp_threads
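
As a rough sketch of the run-time checks described above, the snippet below
first lists a few environment variables that OpenMP runtimes and threaded BLAS
builds commonly consult, then exercises the two RcppArmadillo functions named
in the thread. The choice of variables and the value 2 are assumptions made
for illustration, not something the package prescribes.

```r
# Environment variables that OpenMP runtimes and threaded BLAS builds commonly
# consult at start-up. An empty string means the variable is not set.
thread_env <- Sys.getenv(c("OMP_NUM_THREADS", "OMP_THREAD_LIMIT",
                           "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"))
print(thread_env)

# Only touch the OpenMP setting if RcppArmadillo is installed.
if (requireNamespace("RcppArmadillo", quietly = TRUE)) {
  before <- RcppArmadillo::armadillo_get_number_of_omp_threads()
  RcppArmadillo::armadillo_set_number_of_omp_threads(2)
  cat("OpenMP threads: was", before, "now",
      RcppArmadillo::armadillo_get_number_of_omp_threads(), "\n")
  RcppArmadillo::armadillo_set_number_of_omp_threads(before)  # restore
}
```

As this thread shows, the OpenMP setting is separate from how many threads the
BLAS/LAPACK library uses, so for a call such as eig_sym() the number of busy
cores may not follow it.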