Re: [R] very fast OLS regression?

2009-03-28 Thread Douglas Bates
On Wed, Mar 25, 2009 at 5:15 PM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:
 On Wed, 2009-03-25 at 16:28 -0400, ivo welch wrote:
 Dear R experts:

 I just tried some simple test that told me that hand computing the OLS
 coefficients is about 3-10 times as fast as using the built-in lm()
 function.   (code included below.)  Most of the time, I do not care,
 because I like the convenience, and I presume some of the time goes
 into saving a lot of stuff that I may or may not need.  But when I do
 want to learn the properties of an estimator whose input contains a
 regression, I do care about speed.

 What is the recommended fastest way to get regression coefficients in
 R?  (Is Gentleman's weighted-least-squares algorithm implemented in a
 low-level C form somewhere?  that one was always lightning fast for
 me.)

 No one has yet mentioned Doug Bates' article in R News on this topic,
 which compares timings of various methods for least squares
 computations.

 Douglas Bates. Least squares calculations in R. R News, 4(1):17-20, June
 2004.

 A Cholesky decomposition solution proved fastest in base R code, with an
 even faster version developed using sparse matrices and the Matrix
 package.

 You can find Doug's article here:

 http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

An updated version is available as the Comparisons vignette in the
Matrix package.
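For readers who want the flavor of the Cholesky approach without digging out the article, a minimal sketch (not the article's exact code, and all names here are invented) looks like this:

```r
## Cholesky-based least squares, in the spirit of the R News article.
set.seed(1)
X <- matrix(rnorm(100 * 3), 100, 3)
y <- rnorm(100)
R <- chol(crossprod(X))           # upper-triangular R with R'R = X'X
## solve R'z = X'y, then R b = z, i.e. (X'X) b = X'y
b <- backsolve(R, backsolve(R, crossprod(X, y), transpose = TRUE))
## agrees with the QR-based lm.fit
stopifnot(all.equal(c(b), unname(lm.fit(X, y)$coefficients)))
```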



 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 --
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
  Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
  ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
  Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
  Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
  UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
 %~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] very fast OLS regression?

2009-03-28 Thread ivowel
thanks everybody. I also just read Dirk E's high-performance computing
tutorial. now I wonder: would it be faster to compile a C version of
Gentleman's WLS algorithm and link it into R? before I spend a few days
programming it and getting everything to work together, is the end result
likely to be a good speedup, or more likely a waste of my time? any hunches?

regards, /iaw



Re: [R] very fast OLS regression?

2009-03-26 Thread Thomas Lumley

On Wed, 25 Mar 2009, ivo welch wrote:


thanks, dimitris.  I also added Bill Dunlap's solve(qr(x),y)
function as ols5.   here is what I get in terms of speed on a Mac Pro:

ols1 6.779 3.591 10.37 0 0
ols2 0.515 0.21 0.725 0 0
ols3 0.576 0.403 0.971 0 0
ols4 1.143 1.251 2.395 0 0
ols5 0.683 0.565 1.248 0 0

so the naive matrix operations are fastest.  I would have thought that
alternatives to the naive stuff I learned in my linear algebra course
would be quicker.   still, ols3 and ols5 are competitive.  the
built-in lm() is really problematic.  is ols3 (or perhaps even ols5)
preferable in terms of accuracy?  I think I can deal with 20% speed
slow-down (but not with a factor 10 speed slow-down).


ols5 is more accurate in terms of round-off error, which is why the internal 
computations in lm and lsfit are done that way, but it is relatively unusual 
for this to matter given double precision.

If you want to repeat the regression with the same x and many y you can do much 
better by taking the QR decomposition just once and applying to a matrix of all 
the ys.
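Concretely, that might look like the following sketch (the object names here are invented for illustration):

```r
## One QR factorization of x, reused for every response vector.
set.seed(0)
N <- 100; MC <- 500
x <- matrix(rnorm(N * 2), N, 2)    # one fixed design matrix
Y <- matrix(rnorm(N * MC), N, MC)  # MC response vectors, one per column
qr.x <- qr(x)                      # factor once ...
B <- qr.coef(qr.x, Y)              # ... solve for all MC coefficient vectors
## same answer as solving column-by-column
stopifnot(all.equal(unname(B[, 1]), unname(qr.coef(qr.x, Y[, 1]))))
```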

It's also possible that using chol() and backsolve() rather than solve() would 
speed up ols2 and ols3, at least for large enough matrices.
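For instance, a chol()/backsolve() variant of ols3 might look like this (ols3c is a made-up name; whether it actually wins will depend on the problem size):

```r
ols3c <- function(y, x) {
  R <- chol(crossprod(x))   # R'R = X'X
  ## solve R'z = X'y, then R b = z
  backsolve(R, backsolve(R, crossprod(x, y), transpose = TRUE))
}

## quick sanity check against the solve() version
set.seed(0)
x <- matrix(rnorm(100 * 2), 100, 2)
y <- rnorm(100)
stopifnot(all.equal(c(ols3c(y, x)), c(solve(crossprod(x), crossprod(x, y)))))
```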

 -thomas







Thomas Lumley   Assoc. Professor, Biostatistics
tlum...@u.washington.eduUniversity of Washington, Seattle



Re: [R] very fast OLS regression?

2009-03-26 Thread Thomas Lumley

On Wed, 25 Mar 2009, Ravi Varadhan wrote:

Yes, Bert.  Any least-squares solution that forms X'X and then inverts it is 
not to be recommended.  If X is nearly rank-deficient, then X'X will be more 
strongly so.  The QR decomposition approach in my byhand.qr() function is 
reliable and fast.


Forming the matrix of crossproducts and using the Cholesky decomposition is faster, 
so it does depend on the intended use.

In a simulation, the OP's situation, you may well know that X is not nearly 
rank deficient, in which case the speed advantage may be worthwhile.  After 
all, even if the condition number of X is 10^5 you will still have five or six 
accurate digits in the result.

If you are writing code that will be used in ways you have no control over, 
then of course it makes sense to use the more stable QR decomposition.

 -thomas




Re: [R] very fast OLS regression?

2009-03-26 Thread Bernardo Rangel Tura
On Wed, 2009-03-25 at 22:11 +0100, Dimitris Rizopoulos wrote:
Hi Dimitris and Ivo

I think this is not a fair comparison; look at this:

x[8,100] <- NA

system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))
   user  system elapsed 
  8.765   0.000   8.762 
 

 system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))
Error in solve.default(t(x) %*% x) : 
  system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.002 
 

 system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))
Error in solve.default(XtX, Xty) : 
  system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.001 
 

 system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))
Error in lm.fit(x, y) : NA/NaN/Inf in foreign function call (arg 1)
Timing stopped at: 0 0 0.001 

So routines ols2, ols3 and ols4 work only on complete matrices: if there is
even one NA, these functions fail, while ols1 still runs.
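One hedged workaround: drop incomplete rows before calling the fast solver, mimicking lm()'s default na.action of na.omit (ols3.na is a name I made up for this sketch):

```r
ols3.na <- function(y, x) {
  x <- as.matrix(x)
  ok <- complete.cases(x, y)   # rows with no NA in x or y
  x <- x[ok, , drop = FALSE]
  solve(crossprod(x), crossprod(x, y[ok]))
}

## with an NA inserted as above, this now runs instead of failing
set.seed(0)
x <- rnorm(100); y <- rnorm(100)
x[8] <- NA
b <- ols3.na(y, x)
stopifnot(is.finite(b))
```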

-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil



[R] very fast OLS regression?

2009-03-25 Thread ivo welch
Dear R experts:

I just tried some simple test that told me that hand computing the OLS
coefficients is about 3-10 times as fast as using the built-in lm()
function.   (code included below.)  Most of the time, I do not care,
because I like the convenience, and I presume some of the time goes
into saving a lot of stuff that I may or may not need.  But when I do
want to learn the properties of an estimator whose input contains a
regression, I do care about speed.

What is the recommended fastest way to get regression coefficients in
R?  (Is Gentleman's weighted-least-squares algorithm implemented in a
low-level C form somewhere?  that one was always lightning fast for
me.)

regards,

/ivo



bybuiltin = function( y, x )   coef(lm( y ~ x -1 ));

byhand = function( y, x ) {
  xy <- t(x)%*%y;
  xxi <- solve(t(x)%*%x)
  b <- as.vector(xxi%*%xy)
  ## I will need these later, too:
  ## res <- y-as.vector(x%*%b)
  ## soa[i] <- b[2]
  ## sigmas[i] <- sd(res)
  b;
}


MC=500;
N=1;


set.seed(0);
x= matrix( rnorm(N*MC), nrow=N, ncol=MC );
y= matrix( rnorm(N*MC), nrow=N, ncol=MC );

ptm = proc.time()
for (mc in 1:MC) byhand(y[,mc],x[,mc]);
cat("By hand took ", proc.time()-ptm, "\n");

ptm = proc.time()
for (mc in 1:MC) bybuiltin(y[,mc],x[,mc]);
cat("By built-in took ", proc.time()-ptm, "\n");



Re: [R] very fast OLS regression?

2009-03-25 Thread Dimitris Rizopoulos

check the following options:

ols1 <- function (y, x) {
    coef(lm(y ~ x - 1))
}

ols2 <- function (y, x) {
    xy <- t(x)%*%y
    xxi <- solve(t(x)%*%x)
    b <- as.vector(xxi%*%xy)
    b
}

ols3 <- function (y, x) {
    XtX <- crossprod(x)
    Xty <- crossprod(x, y)
    solve(XtX, Xty)
}

ols4 <- function (y, x) {
    lm.fit(x, y)$coefficients
}

# check timings
MC <- 500
N <- 1

set.seed(0)
x <- matrix(rnorm(N*MC), N, MC)
y <- matrix(rnorm(N*MC), N, MC)

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))

invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))


I hope it helps.

Best,
Dimitris





--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center

Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014



Re: [R] very fast OLS regression?

2009-03-25 Thread ivo welch
thanks, dimitris.  I also added Bill Dunlap's solve(qr(x),y)
function as ols5.   here is what I get in terms of speed on a Mac Pro:

ols1 6.779 3.591 10.37 0 0
ols2 0.515 0.21 0.725 0 0
ols3 0.576 0.403 0.971 0 0
ols4 1.143 1.251 2.395 0 0
ols5 0.683 0.565 1.248 0 0

so the naive matrix operations are fastest.  I would have thought that
alternatives to the naive stuff I learned in my linear algebra course
would be quicker.   still, ols3 and ols5 are competitive.  the
built-in lm() is really problematic.  is ols3 (or perhaps even ols5)
preferable in terms of accuracy?  I think I can deal with 20% speed
slow-down (but not with a factor 10 speed slow-down).

regards,

/iaw




Re: [R] very fast OLS regression?

2009-03-25 Thread Ravi Varadhan
Ivo,

I would be careful about dealing with situations where x may be numerically 
ill-conditioned.  Your code will do quite badly in such cases.  Here is an 
extreme illustration:

set.seed(123)
x <- matrix(rnorm(20), 10, 2)
x <- cbind(x, x[,1])   # x is clearly rank-deficient
y <- c(x %*% c(1,2,3))

 byhand(y, x)
Error in solve.default(t(x) %*% x) : 
  Lapack routine dgesv: system is exactly singular


A better approach would be to explicitly use QR decomposition in your 
byhand() function:

byhand.qr = function( y, x, tol=1.e-07 ) {
  qr.x <- qr(x, tol=tol, LAPACK=TRUE)
  b <- qr.coef(qr.x, y)
  ## I will need these later, too:
  ## res <- qr.resid(qr.x, y)
  ## soa[i] <- b[2]
  ## sigmas[i] <- sd(res)
  b;
}

 ans <- byhand.qr(y, x)
 ans
[1] 1.795725 2.00 2.204275


You can verify that this is a valid solution:

 y - c(x %*% ans)
 [1]  9.436896e-16  2.498002e-16 -8.881784e-16  0.00e+00 -4.440892e-16  
1.776357e-15  4.440892e-16  0.00e+00  4.440892e-16 -4.440892e-16


However, this is not the unique solution, since there are infinitely many 
solutions.  We can get the solution with the minimum length by using the 
Moore-Penrose inverse:

 require(MASS)
 ans.min <- c(ginv(x) %*% y)
 ans.min
[1] 2 2 2


You can verify that ans.min has a smaller L2-norm than ans, and it is unique.

Hope this helps,
Ravi.



Ravi Varadhan, Ph.D.
Assistant Professor,
Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University

Ph. (410) 502-2619
email: rvarad...@jhmi.edu





Re: [R] very fast OLS regression?

2009-03-25 Thread Bert Gunter
lm is slow because it has to set up the design matrix (X) each time. See
?model.matrix and ?model.matrix.lm  for how to do this once separately from
lm (and then use lm.fit).
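A sketch of that idea, assuming a fixed design across iterations (the variable names here are invented):

```r
## Build the design matrix once, then call lm.fit directly in the loop.
set.seed(0)
dat <- data.frame(x = rnorm(100))
X <- model.matrix(~ x - 1, dat)   # formula machinery runs only once
for (i in 1:5) {
  y <- rnorm(100)
  b <- lm.fit(X, y)$coefficients  # no per-iteration model-frame overhead
}
## matches the full lm() fit for the last y
stopifnot(all.equal(unname(b), unname(coef(lm(y ~ x - 1, dat)))))
```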

I am far from an expert on numerical algebra, but I'm pretty sure your
comments below are incorrect in the sense that naive methods are **not**
universally better than efficient numerical linear algebra methods using,
say, QR decompositions. It will depend very strongly on the size and specific
nature of the problem. Big-O complexity statements (O(n) or O(n^2), etc.) are,
after all, asymptotic. There is also typically a tradeoff between
computational accuracy (especially for ill-conditioned matrices) and speed,
which your remarks seem to neglect.

-- Bert


Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374



Re: [R] very fast OLS regression?

2009-03-25 Thread Ravi Varadhan
Yes, Bert.  Any least-squares solution that forms X'X and then inverts it is 
not to be recommended.  If X is nearly rank-deficient, then X'X will be more 
strongly so.  The QR decomposition approach in my byhand.qr() function is 
reliable and fast.

Ravi.





Re: [R] very fast OLS regression?

2009-03-25 Thread Gavin Simpson
On Wed, 2009-03-25 at 16:28 -0400, ivo welch wrote:
 Dear R experts:
 
 I just tried some simple test that told me that hand computing the OLS
 coefficients is about 3-10 times as fast as using the built-in lm()
 function.   (code included below.)  Most of the time, I do not care,
 because I like the convenience, and I presume some of the time goes
 into saving a lot of stuff that I may or may not need.  But when I do
 want to learn the properties of an estimator whose input contains a
 regression, I do care about speed.
 
 What is the recommended fastest way to get regression coefficients in
 R?  (Is Gentleman's weighted-least-squares algorithm implemented in a
 low-level C form somewhere?  that one was always lightning fast for
 me.)

No one has yet mentioned Doug Bates' article in R News on this topic,
which compares timings of various methods for least squares
computations.

Douglas Bates. Least squares calculations in R. R News, 4(1):17-20, June
2004.

A Cholesky decomposition solution proved fastest in base R code, with an
even faster version developed using sparse matrices and the Matrix
package.

You can find Doug's article here:

http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

HTH

G

 
