Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo

2022-01-30 Thread Alex Ilich
 Thanks for the advice Douglas. Just using B = solve(X,Y) works and it
outputs a warning that an approximate solution is attempted when the matrix
is nearly singular. To suppress the warning I've just included #define
ARMA_WARN_LEVEL 1 at the top of my cpp code. Also, no longer getting some
odd outliers that I was previously! For documentation purposes, would it
still be accurate to say "parameters are fit using ordinary least squares"?

Thanks,
Alex

On Sat, Jan 29, 2022 at 12:05 PM Douglas Bates  wrote:

> If you are concerned about singularity it would be better to use a QR or
> singular value decomposition of X because the condition number of X'X is
> the square of the condition number of X. Just the act of forming X'X from X
> makes the conditioning much worse.
>
> I believe that arma::solve, as used by
> https://github.com/RcppCore/RcppArmadillo/blob/master/src/fastLm.cpp,
> uses a QR decomposition.
>
> In general it is much more stable to work with a decomposition of X than
> to form X'X and work with it.
>
>  Also, even if the formula for a calculation involves the inverse of a
> matrix, it is rarely a good idea to evaluate the full inverse.  Most of the
> time, evaluating the inverse is much more work than is required to solve a
> single linear system of equations.  It is like saying that evaluating a
> scalar quotient like a/b requires that you first evaluate the inverse of b,
> 1/b, then form the product a * (1/b).  In fact, it is worse because
> evaluating the inverse of an n by n matrix A takes roughly the same amount
> of work as solving n systems of equations of the form Ax = b.
>
> On Fri, Jan 28, 2022 at 6:56 PM  wrote:
>
>> Thanks! I'll use if(rcf <= std::numeric_limits::epsilon())
>> instead of rcf==0.
>>
>> On Fri, Jan 28, 2022 at 7:32 PM Dirk Eddelbuettel  wrote:
>>
>>>
>>> On 28 January 2022 at 13:25, Alex Ilich wrote:
>>> | Thank you Neal and Dirk. Based on these comments I think using
>>> arma::rcond
>>>
>>> Ah, yes, of course. Should have remembered arma::rcond.
>>>
>>> | to get the reciprocal of the condition factor, and checking if that is
>>> zero
>>> | rather than the determinant may be the route to go. At least for this
>>> | example that works.
>>> |
>>> | library(Rcpp)
>>> |
>>> | cppFunction(depends="RcppArmadillo", "NumericVector C_OLS(arma::mat X,
>>> | arma::mat Y){
>>> |   arma::mat Xt = trans(X);
>>> |   arma::mat XtX = Xt * X;
>>> |   double rcf = rcond(XtX);
>>> |   Rcout << rcf; //print reciprocal of condition factor
>>> |   if(rcf==0){
>>>
>>> Re-read Neal's first email.  You probably do not want zero (apart from
>>> all
>>> the R FAQ 7.31 reasons to not compare a double with equality :wink: )
>>>
>>> Dirk
>>>
>>> | NumericVector B2(X.n_cols, NA_REAL);
>>> | return B2;
>>> |   } else{
>>> | arma::mat XtX_inv= inv(XtX);
>>> | NumericVector B = Rcpp::as(wrap(XtX_inv * (Xt
>>> *
>>> | Y)));
>>> | return B;
>>> |   }}")
>>> |
>>> | X<- matrix(data=c(-250, -250, -250, -250, -250, -250, 250, 200, 150,
>>> |   100, 50, 0, 1, 1, 1, 1, 1, 1), ncol=3, nrow=6)
>>> |
>>> |
>>> | X2<- X
>>> |
>>> | X2[1,2]<- X[1,2] + 5.684342e-14 #Add a slight numeric precision issue
>>> |
>>> | Y<- matrix(1:6, ncol=1)
>>> |
>>> | C_OLS(X,Y)
>>> | # 0[1] NA NA NA
>>> |
>>> | C_OLS(X2,Y)
>>> | # 0[1] NA NA NA
>>> |
>>> |
>>> |
>>> | On Fri, Jan 28, 2022 at 11:53 AM Alex Ilich 
>>> wrote:
>>> |
>>> | > Hi, I have some code I've written to calculate regression parameters
>>> using
>>> | > ordinarily least squares using RcppArmadillo. This is meant to be
>>> performed
>>> | > iteratively over many small subsets (sliding window over spatial
>>> raster
>>> | > data) so I'd like to automatically detect when the regression can't
>>> be
>>> | > performed and just return a vector of NA's. Currently I check to see
>>> if the
>>> | > determinant is 0 in order to see if the XtX matrix can be inverted
>>> and it
>>> | > seems to work in the vast majority of cases but I did run into a
>>> case where
>>> | > a small amount of precision error was introduced and for some reason
>>> this
>>> | > causes the determinant to be non-zero in C++ however, the matrix
>>> still is
>>> | > considered singular (uninvertible) leading to an error. I was
>>> wondering if
>>> | > anyone had any suggestions? Maybe there's a minimum value non-zero
>>> value
>>> | > where it's considered essentially singular that I could use instead
>>> of 0,
>>> | > or maybe there's a way to return a vector of NAs if an error is
>>> detected
>>> | > rather than stopping execution and printing an error message. I've
>>> also
>>> | > seen that pinv can be used, but from some tests I ran it's
>>> substantially
>>> | > slower, so I'd like to avoid using that. I've included some example
>>> code
>>> | > below. Any suggestions would be greatly appreciated.
>>> | >
>>> | > Thanks,
>>> | > Alex
>>> | >
>>> | > library(Rcpp)
>>> | > cppFunction(depends="RcppArmadillo", "NumericVector C_OLS(arma::mat
>>> X,
>>> | > arma:

Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo

2022-01-30 Thread Zé Vinícius
Yes, quoting the paper on which ‘solve’ is based on (
http://arma.sourceforge.net/armadillo_solver_2020.pdf):

“The SVD-based solver uses the xGELSD set of functions, which find a
minimum-norm solution to a linear least squares problem.”

On Mon, Jan 31, 2022 at 12:12 AM Alex Ilich  wrote:

> Thanks for the advice Douglas. Just using B = solve(X,Y) works and it
> outputs a warning that an approximate solution is attempted when the matrix
> is nearly singular. To suppress the warning I've just included #define
> ARMA_WARN_LEVEL 1 at the top of my cpp code. Also, no longer getting some
> odd outliers that I was previously! For documentation purposes, would it
> still be accurate to say "parameters are fit using ordinary least squares"?
>
> Thanks,
> Alex
>
> On Sat, Jan 29, 2022 at 12:05 PM Douglas Bates  wrote:
>
>> If you are concerned about singularity it would be better to use a QR or
>> singular value decomposition of X because the condition number of X'X is
>> the square of the condition number of X. Just the act of forming X'X from X
>> makes the conditioning much worse.
>>
>> I believe that arma::solve, as used by
>> https://github.com/RcppCore/RcppArmadillo/blob/master/src/fastLm.cpp,
>> uses a QR decomposition.
>>
>> In general it is much more stable to work with a decomposition of X than
>> to form X'X and work with it.
>>
>>  Also, even if the formula for a calculation involves the inverse of a
>> matrix, it is rarely a good idea to evaluate the full inverse.  Most of the
>> time, evaluating the inverse is much more work than is required to solve a
>> single linear system of equations.  It is like saying that evaluating a
>> scalar quotient like a/b requires that you first evaluate the inverse of b,
>> 1/b, then form the product a * (1/b).  In fact, it is worse because
>> evaluating the inverse of an n by n matrix A takes roughly the same amount
>> of work as solving n systems of equations of the form Ax = b.
>>
>> On Fri, Jan 28, 2022 at 6:56 PM  wrote:
>>
>>> Thanks! I'll use if(rcf <= std::numeric_limits::epsilon())
>>> instead of rcf==0.
>>>
>>> On Fri, Jan 28, 2022 at 7:32 PM Dirk Eddelbuettel 
>>> wrote:
>>>

 On 28 January 2022 at 13:25, Alex Ilich wrote:
 | Thank you Neal and Dirk. Based on these comments I think using
 arma::rcond

 Ah, yes, of course. Should have remembered arma::rcond.

 | to get the reciprocal of the condition factor, and checking if that
 is zero
 | rather than the determinant may be the route to go. At least for this
 | example that works.
 |
 | library(Rcpp)
 |
 | cppFunction(depends="RcppArmadillo", "NumericVector C_OLS(arma::mat X,
 | arma::mat Y){
 |   arma::mat Xt = trans(X);
 |   arma::mat XtX = Xt * X;
 |   double rcf = rcond(XtX);
 |   Rcout << rcf; //print reciprocal of condition factor
 |   if(rcf==0){

 Re-read Neal's first email.  You probably do not want zero (apart from
 all
 the R FAQ 7.31 reasons to not compare a double with equality :wink: )

 Dirk

 | NumericVector B2(X.n_cols, NA_REAL);
 | return B2;
 |   } else{
 | arma::mat XtX_inv= inv(XtX);
 | NumericVector B = Rcpp::as(wrap(XtX_inv *
 (Xt *
 | Y)));
 | return B;
 |   }}")
 |
 | X<- matrix(data=c(-250, -250, -250, -250, -250, -250, 250, 200, 150,
 |   100, 50, 0, 1, 1, 1, 1, 1, 1), ncol=3, nrow=6)
 |
 |
 | X2<- X
 |
 | X2[1,2]<- X[1,2] + 5.684342e-14 #Add a slight numeric precision issue
 |
 | Y<- matrix(1:6, ncol=1)
 |
 | C_OLS(X,Y)
 | # 0[1] NA NA NA
 |
 | C_OLS(X2,Y)
 | # 0[1] NA NA NA
 |
 |
 |
 | On Fri, Jan 28, 2022 at 11:53 AM Alex Ilich 
 wrote:
 |
 | > Hi, I have some code I've written to calculate regression
 parameters using
 | > ordinarily least squares using RcppArmadillo. This is meant to be
 performed
 | > iteratively over many small subsets (sliding window over spatial
 raster
 | > data) so I'd like to automatically detect when the regression can't
 be
 | > performed and just return a vector of NA's. Currently I check to
 see if the
 | > determinant is 0 in order to see if the XtX matrix can be inverted
 and it
 | > seems to work in the vast majority of cases but I did run into a
 case where
 | > a small amount of precision error was introduced and for some
 reason this
 | > causes the determinant to be non-zero in C++ however, the matrix
 still is
 | > considered singular (uninvertible) leading to an error. I was
 wondering if
 | > anyone had any suggestions? Maybe there's a minimum value non-zero
 value
 | > where it's considered essentially singular that I could use instead
 of 0,
 | > or maybe there's a way to return a vector of NAs if an error is
 detected
 | > rather than stopping execution and print

Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo

2022-01-30 Thread Dirk Eddelbuettel

On 31 January 2022 at 09:13, Zé Vinícius wrote:
| Yes, quoting the paper on which ‘solve’ is based on (
| http://arma.sourceforge.net/armadillo_solver_2020.pdf):
| 
| “The SVD-based solver uses the xGELSD set of functions, which find a
| minimum-norm solution to a linear least squares problem.”

Yes. A longer story though on how this is not what may matter. I should start
by saying that I never found really decent documentation describing this
'split' in the world view as this problem is, in essence, seen differently by
numerical analysis specialists (as for example the LAPACK authors of the
*GELSD functions) and the statistical users which differs in what they focus
on. Which is why R uses a modified version of LINPACK (as I recall going back
to by Ross Ihaka) when computing lm(). And not the LAPACK routines.

Doug was always adamant about this when I wrote the different simple fastLm()
approaches (in RcppGSL, RcppArmadillo, ...) which do _not_ properly account
for rank-deficiency (as R's lm() would) -- Doug also wrote the nicest fastLm
example in RcppEigen.

There is a little bit more in the help pages for the various lmFast()
versions as well as an explicit example (also due to Doug).

Now, I should add that whenever I tried to construct an example on a more
real-world-alike regression problem, I could not come anywhere close to
actually seeing the rank deficiency. But it is unmistakenly there is the
appropriately created case. So buyer beware.

Dirk

-- 
https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
___
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

Re: [Rcpp-devel] Detecting Singular Matrices in RcppArmadillo

2022-01-30 Thread Zé Vinícius
Very interesting, Dirk. Thanks!

On Mon, Jan 31, 2022 at 12:18 PM Dirk Eddelbuettel  wrote:

>
> On 31 January 2022 at 09:13, Zé Vinícius wrote:
> | Yes, quoting the paper on which ‘solve’ is based on (
> | http://arma.sourceforge.net/armadillo_solver_2020.pdf):
> |
> | “The SVD-based solver uses the xGELSD set of functions, which find a
> | minimum-norm solution to a linear least squares problem.”
>
> Yes. A longer story though on how this is not what may matter. I should
> start
> by saying that I never found really decent documentation describing this
> 'split' in the world view as this problem is, in essence, seen differently
> by
> numerical analysis specialists (as for example the LAPACK authors of the
> *GELSD functions) and the statistical users which differs in what they
> focus
> on. Which is why R uses a modified version of LINPACK (as I recall going
> back
> to by Ross Ihaka) when computing lm(). And not the LAPACK routines.
>
> Doug was always adamant about this when I wrote the different simple
> fastLm()
> approaches (in RcppGSL, RcppArmadillo, ...) which do _not_ properly account
> for rank-deficiency (as R's lm() would) -- Doug also wrote the nicest
> fastLm
> example in RcppEigen.
>
> There is a little bit more in the help pages for the various lmFast()
> versions as well as an explicit example (also due to Doug).
>
> Now, I should add that whenever I tried to construct an example on a more
> real-world-alike regression problem, I could not come anywhere close to
> actually seeing the rank deficiency. But it is unmistakenly there is the
> appropriately created case. So buyer beware.
>
> Dirk
>
> --
> https://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
>
-- 
Zé Vinícius
https://mirca.github.io
___
Rcpp-devel mailing list
Rcpp-devel@lists.r-forge.r-project.org
https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel