Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo
Thanks for the advice Douglas. Just using B = solve(X,Y) works and it outputs a warning that an approximate solution is attempted when the matrix is nearly singular. To suppress the warning I've just included #define ARMA_WARN_LEVEL 1 at the top of my cpp code. Also, no longer getting some odd outliers that I was previously! For documentation purposes, would it still be accurate to say "parameters are fit using ordinary least squares"? Thanks, Alex On Sat, Jan 29, 2022 at 12:05 PM Douglas Bates wrote: > If you are concerned about singularity it would be better to use a QR or > singular value decomposition of X because the condition number of X'X is > the square of the condition number of X. Just the act of forming X'X from X > makes the conditioning much worse. > > I believe that arma::solve, as used by > https://github.com/RcppCore/RcppArmadillo/blob/master/src/fastLm.cpp, > uses a QR decomposition. > > In general it is much more stable to work with a decomposition of X than > to form X'X and work with it. > > Also, even if the formula for a calculation involves the inverse of a > matrix, it is rarely a good idea to evaluate the full inverse. Most of the > time, evaluating the inverse is much more work than is required to solve a > single linear system of equations. It is like saying that evaluating a > scalar quotient like a/b requires that you first evaluate the inverse of b, > 1/b, then form the product a * (1/b). In fact, it is worse because > evaluating the inverse of an n by n matrix A takes roughly the same amount > of work as solving n systems of equations of the form Ax = b. > > On Fri, Jan 28, 2022 at 6:56 PM wrote: > >> Thanks! I'll use if(rcf <= std::numeric_limits::epsilon()) >> instead of rcf==0. >> >> On Fri, Jan 28, 2022 at 7:32 PM Dirk Eddelbuettel wrote: >> >>> >>> On 28 January 2022 at 13:25, Alex Ilich wrote: >>> | Thank you Neal and Dirk. Based on these comments I think using >>> arma::rcond >>> >>> Ah, yes, of course. Should have remembered arma::rcond. >>> >>> | to get the reciprocal of the condition factor, and checking if that is >>> zero >>> | rather than the determinant may be the route to go. At least for this >>> | example that works. >>> | >>> | library(Rcpp) >>> | >>> | cppFunction(depends="RcppArmadillo", "NumericVector C_OLS(arma::mat X, >>> | arma::mat Y){ >>> | arma::mat Xt = trans(X); >>> | arma::mat XtX = Xt * X; >>> | double rcf = rcond(XtX); >>> | Rcout << rcf; //print reciprocal of condition factor >>> | if(rcf==0){ >>> >>> Re-read Neal's first email. You probably do not want zero (apart from >>> all >>> the R FAQ 7.31 reasons to not compare a double with equality :wink: ) >>> >>> Dirk >>> >>> | NumericVector B2(X.n_cols, NA_REAL); >>> | return B2; >>> | } else{ >>> | arma::mat XtX_inv= inv(XtX); >>> | NumericVector B = Rcpp::as(wrap(XtX_inv * (Xt >>> * >>> | Y))); >>> | return B; >>> | }}") >>> | >>> | X<- matrix(data=c(-250, -250, -250, -250, -250, -250, 250, 200, 150, >>> | 100, 50, 0, 1, 1, 1, 1, 1, 1), ncol=3, nrow=6) >>> | >>> | >>> | X2<- X >>> | >>> | X2[1,2]<- X[1,2] + 5.684342e-14 #Add a slight numeric precision issue >>> | >>> | Y<- matrix(1:6, ncol=1) >>> | >>> | C_OLS(X,Y) >>> | # 0[1] NA NA NA >>> | >>> | C_OLS(X2,Y) >>> | # 0[1] NA NA NA >>> | >>> | >>> | >>> | On Fri, Jan 28, 2022 at 11:53 AM Alex Ilich >>> wrote: >>> | >>> | > Hi, I have some code I've written to calculate regression parameters >>> using >>> | > ordinarily least squares using RcppArmadillo. This is meant to be >>> performed >>> | > iteratively over many small subsets (sliding window over spatial >>> raster >>> | > data) so I'd like to automatically detect when the regression can't >>> be >>> | > performed and just return a vector of NA's. Currently I check to see >>> if the >>> | > determinant is 0 in order to see if the XtX matrix can be inverted >>> and it >>> | > seems to work in the vast majority of cases but I did run into a >>> case where >>> | > a small amount of precision error was introduced and for some reason >>> this >>> | > causes the determinant to be non-zero in C++ however, the matrix >>> still is >>> | > considered singular (uninvertible) leading to an error. I was >>> wondering if >>> | > anyone had any suggestions? Maybe there's a minimum value non-zero >>> value >>> | > where it's considered essentially singular that I could use instead >>> of 0, >>> | > or maybe there's a way to return a vector of NAs if an error is >>> detected >>> | > rather than stopping execution and printing an error message. I've >>> also >>> | > seen that pinv can be used, but from some tests I ran it's >>> substantially >>> | > slower, so I'd like to avoid using that. I've included some example >>> code >>> | > below. Any suggestions would be greatly appreciated. >>> | > >>> | > Thanks, >>> | > Alex >>> | > >>> | > library(Rcpp) >>> | > cppFunction(depends="RcppArmadillo", "NumericVector C_OLS(arma::mat >>> X, >>> | > arma:
Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo
Yes, quoting the paper on which ‘solve’ is based on ( http://arma.sourceforge.net/armadillo_solver_2020.pdf): “The SVD-based solver uses the xGELSD set of functions, which find a minimum-norm solution to a linear least squares problem.” On Mon, Jan 31, 2022 at 12:12 AM Alex Ilich wrote: > Thanks for the advice Douglas. Just using B = solve(X,Y) works and it > outputs a warning that an approximate solution is attempted when the matrix > is nearly singular. To suppress the warning I've just included #define > ARMA_WARN_LEVEL 1 at the top of my cpp code. Also, no longer getting some > odd outliers that I was previously! For documentation purposes, would it > still be accurate to say "parameters are fit using ordinary least squares"? > > Thanks, > Alex > > On Sat, Jan 29, 2022 at 12:05 PM Douglas Bates wrote: > >> If you are concerned about singularity it would be better to use a QR or >> singular value decomposition of X because the condition number of X'X is >> the square of the condition number of X. Just the act of forming X'X from X >> makes the conditioning much worse. >> >> I believe that arma::solve, as used by >> https://github.com/RcppCore/RcppArmadillo/blob/master/src/fastLm.cpp, >> uses a QR decomposition. >> >> In general it is much more stable to work with a decomposition of X than >> to form X'X and work with it. >> >> Also, even if the formula for a calculation involves the inverse of a >> matrix, it is rarely a good idea to evaluate the full inverse. Most of the >> time, evaluating the inverse is much more work than is required to solve a >> single linear system of equations. It is like saying that evaluating a >> scalar quotient like a/b requires that you first evaluate the inverse of b, >> 1/b, then form the product a * (1/b). In fact, it is worse because >> evaluating the inverse of an n by n matrix A takes roughly the same amount >> of work as solving n systems of equations of the form Ax = b. >> >> On Fri, Jan 28, 2022 at 6:56 PM wrote: >> >>> Thanks! I'll use if(rcf <= std::numeric_limits::epsilon()) >>> instead of rcf==0. >>> >>> On Fri, Jan 28, 2022 at 7:32 PM Dirk Eddelbuettel >>> wrote: >>> On 28 January 2022 at 13:25, Alex Ilich wrote: | Thank you Neal and Dirk. Based on these comments I think using arma::rcond Ah, yes, of course. Should have remembered arma::rcond. | to get the reciprocal of the condition factor, and checking if that is zero | rather than the determinant may be the route to go. At least for this | example that works. | | library(Rcpp) | | cppFunction(depends="RcppArmadillo", "NumericVector C_OLS(arma::mat X, | arma::mat Y){ | arma::mat Xt = trans(X); | arma::mat XtX = Xt * X; | double rcf = rcond(XtX); | Rcout << rcf; //print reciprocal of condition factor | if(rcf==0){ Re-read Neal's first email. You probably do not want zero (apart from all the R FAQ 7.31 reasons to not compare a double with equality :wink: ) Dirk | NumericVector B2(X.n_cols, NA_REAL); | return B2; | } else{ | arma::mat XtX_inv= inv(XtX); | NumericVector B = Rcpp::as(wrap(XtX_inv * (Xt * | Y))); | return B; | }}") | | X<- matrix(data=c(-250, -250, -250, -250, -250, -250, 250, 200, 150, | 100, 50, 0, 1, 1, 1, 1, 1, 1), ncol=3, nrow=6) | | | X2<- X | | X2[1,2]<- X[1,2] + 5.684342e-14 #Add a slight numeric precision issue | | Y<- matrix(1:6, ncol=1) | | C_OLS(X,Y) | # 0[1] NA NA NA | | C_OLS(X2,Y) | # 0[1] NA NA NA | | | | On Fri, Jan 28, 2022 at 11:53 AM Alex Ilich wrote: | | > Hi, I have some code I've written to calculate regression parameters using | > ordinarily least squares using RcppArmadillo. This is meant to be performed | > iteratively over many small subsets (sliding window over spatial raster | > data) so I'd like to automatically detect when the regression can't be | > performed and just return a vector of NA's. Currently I check to see if the | > determinant is 0 in order to see if the XtX matrix can be inverted and it | > seems to work in the vast majority of cases but I did run into a case where | > a small amount of precision error was introduced and for some reason this | > causes the determinant to be non-zero in C++ however, the matrix still is | > considered singular (uninvertible) leading to an error. I was wondering if | > anyone had any suggestions? Maybe there's a minimum value non-zero value | > where it's considered essentially singular that I could use instead of 0, | > or maybe there's a way to return a vector of NAs if an error is detected | > rather than stopping execution and print
Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo
On 31 January 2022 at 09:13, Zé Vinícius wrote: | Yes, quoting the paper on which ‘solve’ is based on ( | http://arma.sourceforge.net/armadillo_solver_2020.pdf): | | “The SVD-based solver uses the xGELSD set of functions, which find a | minimum-norm solution to a linear least squares problem.” Yes. A longer story though on how this is not what may matter. I should start by saying that I never found really decent documentation describing this 'split' in the world view as this problem is, in essence, seen differently by numerical analysis specialists (as for example the LAPACK authors of the *GELSD functions) and the statistical users which differs in what they focus on. Which is why R uses a modified version of LINPACK (as I recall going back to by Ross Ihaka) when computing lm(). And not the LAPACK routines. Doug was always adamant about this when I wrote the different simple fastLm() approaches (in RcppGSL, RcppArmadillo, ...) which do _not_ properly account for rank-deficiency (as R's lm() would) -- Doug also wrote the nicest fastLm example in RcppEigen. There is a little bit more in the help pages for the various lmFast() versions as well as an explicit example (also due to Doug). Now, I should add that whenever I tried to construct an example on a more real-world-alike regression problem, I could not come anywhere close to actually seeing the rank deficiency. But it is unmistakenly there is the appropriately created case. So buyer beware. Dirk -- https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel
Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo
Very interesting, Dirk. Thanks! On Mon, Jan 31, 2022 at 12:18 PM Dirk Eddelbuettel wrote: > > On 31 January 2022 at 09:13, Zé Vinícius wrote: > | Yes, quoting the paper on which ‘solve’ is based on ( > | http://arma.sourceforge.net/armadillo_solver_2020.pdf): > | > | “The SVD-based solver uses the xGELSD set of functions, which find a > | minimum-norm solution to a linear least squares problem.” > > Yes. A longer story though on how this is not what may matter. I should > start > by saying that I never found really decent documentation describing this > 'split' in the world view as this problem is, in essence, seen differently > by > numerical analysis specialists (as for example the LAPACK authors of the > *GELSD functions) and the statistical users which differs in what they > focus > on. Which is why R uses a modified version of LINPACK (as I recall going > back > to by Ross Ihaka) when computing lm(). And not the LAPACK routines. > > Doug was always adamant about this when I wrote the different simple > fastLm() > approaches (in RcppGSL, RcppArmadillo, ...) which do _not_ properly account > for rank-deficiency (as R's lm() would) -- Doug also wrote the nicest > fastLm > example in RcppEigen. > > There is a little bit more in the help pages for the various lmFast() > versions as well as an explicit example (also due to Doug). > > Now, I should add that whenever I tried to construct an example on a more > real-world-alike regression problem, I could not come anywhere close to > actually seeing the rank deficiency. But it is unmistakenly there is the > appropriately created case. So buyer beware. > > Dirk > > -- > https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org > -- Zé Vinícius https://mirca.github.io ___ Rcpp-devel mailing list Rcpp-devel@lists.r-forge.r-project.org https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel