Re: [R] very fast OLS regression?
On Wed, Mar 25, 2009 at 5:15 PM, Gavin Simpson gavin.simp...@ucl.ac.uk wrote:

On Wed, 2009-03-25 at 16:28 -0400, ivo welch wrote:
> Dear R experts:
>
> I just ran a simple test which told me that computing the OLS
> coefficients by hand is about 3-10 times as fast as using the
> built-in lm() function. (Code included below.) Most of the time I do
> not care, because I like the convenience, and I presume some of the
> time goes into saving a lot of material that I may or may not need.
> But when I want to study the properties of an estimator whose input
> contains a regression, I do care about speed. What is the recommended
> fastest way to get regression coefficients in R? (Is Gentleman's
> weighted least-squares algorithm implemented in a low-level C form
> somewhere? That one was always lightning fast for me.)

No one has yet mentioned Doug Bates' article in R News on this topic,
which compares timings of various methods for least-squares
computations:

    Douglas Bates. Least squares calculations in R. R News,
    4(1):17-20, June 2004.

A Cholesky decomposition solution proved fastest in base R code, with
an even faster version developed using sparse matrices and the Matrix
package. You can find Doug's article here:

http://cran.r-project.org/doc/Rnews/Rnews_2004-1.pdf

An updated version is available as the Comparisons vignette in the
Matrix package.
> regards, /ivo
>
> bybuiltin <- function(y, x) coef(lm(y ~ x - 1))
>
> byhand <- function(y, x) {
>   xy  <- t(x) %*% y
>   xxi <- solve(t(x) %*% x)
>   b   <- as.vector(xxi %*% xy)
>   ## I will need these later, too:
>   ## res <- y - as.vector(x %*% b)
>   ## soa[i] <- b[2]
>   ## sigmas[i] <- sd(res)
>   b
> }
>
> MC <- 500; N <- 1
> set.seed(0)
> x <- matrix(rnorm(N*MC), nrow = N, ncol = MC)
> y <- matrix(rnorm(N*MC), nrow = N, ncol = MC)
>
> ptm <- proc.time()
> for (mc in 1:MC) byhand(y[, mc], x[, mc])
> cat("By hand took ", proc.time() - ptm, "\n")
>
> ptm <- proc.time()
> for (mc in 1:MC) bybuiltin(y[, mc], x[, mc])
> cat("By built-in took ", proc.time() - ptm, "\n")

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
Dr. Gavin Simpson             [t] +44 (0)20 7679 0522
ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
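The Cholesky-based route that Bates' article found fastest in base R can be sketched as follows (a minimal illustration of the idea, not the article's code; it assumes X has full column rank):

```r
## Cholesky-based least squares: solve (X'X) b = X'y via chol().
set.seed(1)
n <- 1000; p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- 1:p
y <- drop(X %*% beta) + rnorm(n)

XtX <- crossprod(X)        # t(X) %*% X, computed efficiently
Xty <- crossprod(X, y)     # t(X) %*% y
R   <- chol(XtX)           # upper-triangular R with t(R) %*% R == XtX
b   <- backsolve(R, forwardsolve(t(R), Xty))

## agrees with lm() up to round-off
max(abs(b - coef(lm(y ~ X - 1))))
```

This is fast because it factors a small p-by-p matrix, at the cost of squaring the condition number of X (see the accuracy discussion later in the thread).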
Re: [R] very fast OLS regression?
thanks everybody. I also just read Dirk E's high-performance computing
tutorial. Now I wonder: would it be faster to compile a C version of
Gentleman's WLS algorithm into R? Before I spend a few days programming
this and getting it all to work together, would the end result likely
be a good speedup, or more likely a waste of my time? Any hunches?

regards, /iaw
Re: [R] very fast OLS regression?
On Wed, 25 Mar 2009, ivo welch wrote:

> thanks, dimitris. I also added Bill Dunlap's solve(qr(x), y) function
> as ols5. here is what I get in terms of speed on a Mac Pro:
>
>        user  system elapsed user.child sys.child
> ols1  6.779   3.591  10.370          0         0
> ols2  0.515   0.210   0.725          0         0
> ols3  0.576   0.403   0.971          0         0
> ols4  1.143   1.251   2.395          0         0
> ols5  0.683   0.565   1.248          0         0
>
> so the naive matrix operations are fastest. I would have thought that
> alternatives to the naive stuff I learned in my linear algebra course
> would be quicker. still, ols3 and ols5 are competitive; the built-in
> lm() is really problematic. is ols3 (or perhaps even ols5) preferable
> in terms of accuracy? I can live with a 20% slow-down (but not with a
> factor-of-10 slow-down).

ols5 is more accurate in terms of round-off error, which is why the
internal computations are done that way in lm and lsfit, but it is
relatively unusual for this to matter given double precision.

If you want to repeat the regression with the same x and many y, you
can do much better by taking the QR decomposition just once and
applying it to a matrix of all the ys.

It's also possible that using chol() and backsolve() rather than
solve() would speed up ols2 and ols3, at least for large enough
matrices.

     -thomas

> On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
> d.rizopou...@erasmusmc.nl wrote:
>> check the following options:
>> [ols1-ols4 definitions snipped]

Thomas Lumley            Assoc. Professor, Biostatistics
tlum...@u.washington.edu University of Washington, Seattle
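Thomas's suggestion of factoring once and reusing the QR decomposition for many responses might look like this (a sketch; it applies only when every replicate shares the same design matrix X, unlike the simulation in this thread, where each replicate draws a fresh x):

```r
## One QR factorisation, many right-hand sides: qr.coef() accepts a
## matrix of responses, so all MC regressions share a single qr(X).
set.seed(0)
n <- 100; p <- 3; MC <- 500
X <- matrix(rnorm(n * p), n, p)
Y <- matrix(rnorm(n * MC), n, MC)   # one column per simulated y

qrX <- qr(X)                        # factor once
B   <- qr.coef(qrX, Y)              # p x MC coefficients in one call

## same answer as calling lm.fit() column by column
b1 <- lm.fit(X, Y[, 1])$coefficients
max(abs(B[, 1] - b1))
```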
Re: [R] very fast OLS regression?
On Wed, 25 Mar 2009, Ravi Varadhan wrote:

> Yes, Bert. Any least-squares solution that forms X'X and then inverts
> it is not to be recommended. If X is nearly rank-deficient, then X'X
> will be more strongly so. The QR decomposition approach in my
> byhand.qr() function is reliable and fast.

Forming the matrix of crossproducts and using the Cholesky
decomposition is faster, so it does depend on the intended use. In a
simulation, the OP's situation, you may well know that X is not nearly
rank-deficient, in which case the speed advantage may be worthwhile.
After all, even if the condition number of X is 10^5 you will still
have five or six accurate digits in the result.

If you are writing code that will be used in ways you have no control
over, then of course it makes sense to use the more stable QR
decomposition.

     -thomas
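The trade-off Thomas describes can be checked directly: the condition number of X'X is roughly the square of the condition number of X, so the crossproduct route loses about twice as many digits. A small illustration (my own, not from the thread):

```r
## A nearly collinear design: third column is almost the second.
set.seed(42)
X <- cbind(1, 1:100, (1:100) + rnorm(100, sd = 1e-3))

kappa(X)              # condition number of X
kappa(crossprod(X))   # roughly its square -- far worse conditioned
```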
Re: [R] very fast OLS regression?
On Wed, 2009-03-25 at 22:11 +0100, Dimitris Rizopoulos wrote:

> check the following options:
>
> ols1 <- function (y, x) {
>     coef(lm(y ~ x - 1))
> }
> ols2 <- function (y, x) {
>     xy <- t(x) %*% y
>     xxi <- solve(t(x) %*% x)
>     b <- as.vector(xxi %*% xy)
>     b
> }
> ols3 <- function (y, x) {
>     XtX <- crossprod(x)
>     Xty <- crossprod(x, y)
>     solve(XtX, Xty)
> }
> ols4 <- function (y, x) {
>     lm.fit(x, y)$coefficients
> }
>
> # check timings
> MC <- 500
> N <- 1
> set.seed(0)
> x <- matrix(rnorm(N*MC), N, MC)
> y <- matrix(rnorm(N*MC), N, MC)
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))
> invisible({gc(); gc(); gc()})
> system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))

Hi Dimitris and Ivo,

I think this is not a fair comparison. Look at this:

x[8, 100] <- NA

system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))
   user  system elapsed
  8.765   0.000   8.762

system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))
Error in solve.default(t(x) %*% x) :
  system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.002

system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))
Error in solve.default(XtX, Xty) :
  system is computationally singular: reciprocal condition number = 0
Timing stopped at: 0 0 0.001

system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))
Error in lm.fit(x, y) : NA/NaN/Inf in foreign function call (arg 1)
Timing stopped at: 0 0 0.001

So ols2, ols3, and ols4 only work on a complete matrix: with a single
NA these functions fail, while lm() handles the missing value.

--
Bernardo Rangel Tura, M.D, MPH, Ph.D
National Institute of Cardiology
Brazil
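If speed matters and NAs may be present, the fast crossprod route can be kept by dropping incomplete cases first, mimicking lm()'s default na.action of na.omit (a sketch; ols3_na is my name, not from the thread):

```r
## ols3 with the NA handling that lm() does implicitly.
ols3_na <- function(y, x) {
  x <- as.matrix(x)
  ok <- complete.cases(x, y)      # rows with no NA in either x or y
  x <- x[ok, , drop = FALSE]
  y <- y[ok]
  solve(crossprod(x), crossprod(x, y))
}

set.seed(2)
x <- matrix(rnorm(100), 50, 2)
y <- rnorm(50)
x[8, 1] <- NA                     # one missing value
ols3_na(y, x)                     # runs; plain ols3 would stop with an error
```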
[R] very fast OLS regression?
Dear R experts:

I just ran a simple test which told me that computing the OLS
coefficients by hand is about 3-10 times as fast as using the built-in
lm() function. (Code included below.) Most of the time I do not care,
because I like the convenience, and I presume some of the time goes
into saving a lot of material that I may or may not need. But when I
want to study the properties of an estimator whose input contains a
regression, I do care about speed.

What is the recommended fastest way to get regression coefficients in
R? (Is Gentleman's weighted least-squares algorithm implemented in a
low-level C form somewhere? That one was always lightning fast for me.)

regards, /ivo

bybuiltin <- function(y, x) coef(lm(y ~ x - 1))

byhand <- function(y, x) {
  xy  <- t(x) %*% y
  xxi <- solve(t(x) %*% x)
  b   <- as.vector(xxi %*% xy)
  ## I will need these later, too:
  ## res <- y - as.vector(x %*% b)
  ## soa[i] <- b[2]
  ## sigmas[i] <- sd(res)
  b
}

MC <- 500; N <- 1
set.seed(0)
x <- matrix(rnorm(N*MC), nrow = N, ncol = MC)
y <- matrix(rnorm(N*MC), nrow = N, ncol = MC)

ptm <- proc.time()
for (mc in 1:MC) byhand(y[, mc], x[, mc])
cat("By hand took ", proc.time() - ptm, "\n")

ptm <- proc.time()
for (mc in 1:MC) bybuiltin(y[, mc], x[, mc])
cat("By built-in took ", proc.time() - ptm, "\n")
Re: [R] very fast OLS regression?
check the following options:

ols1 <- function (y, x) {
    coef(lm(y ~ x - 1))
}
ols2 <- function (y, x) {
    xy <- t(x) %*% y
    xxi <- solve(t(x) %*% x)
    b <- as.vector(xxi %*% xy)
    b
}
ols3 <- function (y, x) {
    XtX <- crossprod(x)
    Xty <- crossprod(x, y)
    solve(XtX, Xty)
}
ols4 <- function (y, x) {
    lm.fit(x, y)$coefficients
}

# check timings
MC <- 500
N <- 1
set.seed(0)
x <- matrix(rnorm(N*MC), N, MC)
y <- matrix(rnorm(N*MC), N, MC)
invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols1(y[, mc], x[, mc]))
invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols2(y[, mc], x[, mc]))
invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols3(y[, mc], x[, mc]))
invisible({gc(); gc(); gc()})
system.time(for (mc in 1:MC) ols4(y[, mc], x[, mc, drop = FALSE]))

I hope it helps.

Best,
Dimitris

ivo welch wrote:
> [original message snipped]

--
Dimitris Rizopoulos
Assistant Professor
Department of Biostatistics
Erasmus University Medical Center
Address: PO Box 2040, 3000 CA Rotterdam, the Netherlands
Tel: +31/(0)10/7043478
Fax: +31/(0)10/7043014
Re: [R] very fast OLS regression?
thanks, dimitris. I also added Bill Dunlap's solve(qr(x), y) function
as ols5. here is what I get in terms of speed on a Mac Pro:

       user  system elapsed user.child sys.child
ols1  6.779   3.591  10.370          0         0
ols2  0.515   0.210   0.725          0         0
ols3  0.576   0.403   0.971          0         0
ols4  1.143   1.251   2.395          0         0
ols5  0.683   0.565   1.248          0         0

so the naive matrix operations are fastest. I would have thought that
alternatives to the naive stuff I learned in my linear algebra course
would be quicker. still, ols3 and ols5 are competitive; the built-in
lm() is really problematic. is ols3 (or perhaps even ols5) preferable
in terms of accuracy? I can live with a 20% slow-down (but not with a
factor-of-10 slow-down).

regards, /iaw

On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
d.rizopou...@erasmusmc.nl wrote:
> check the following options:
> [ols1-ols4 definitions snipped]
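The ols5 variant ivo mentions, Bill Dunlap's one-liner, would presumably be written like this (my reconstruction; solve() has a method for "qr" objects that returns the least-squares coefficients, so no t(x) %*% x is ever formed):

```r
## Bill Dunlap's QR-based one-liner, as described in this thread.
ols5 <- function(y, x) solve(qr(x), y)

## quick check against lm.fit on random data
set.seed(1)
x <- matrix(rnorm(200), 100, 2)
y <- rnorm(100)
max(abs(ols5(y, x) - lm.fit(x, y)$coefficients))
```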
Re: [R] very fast OLS regression?
Ivo,

I would be careful about situations where x may be numerically
ill-conditioned; your code will do quite badly in such cases. Here is
an extreme illustration:

set.seed(123)
x <- matrix(rnorm(20), 10, 2)
x <- cbind(x, x[, 1])   # x is clearly rank-deficient
y <- c(x %*% c(1, 2, 3))

byhand(y, x)
Error in solve.default(t(x) %*% x) :
  Lapack routine dgesv: system is exactly singular

A better approach would be to use the QR decomposition explicitly in
your byhand() function:

byhand.qr <- function(y, x, tol = 1e-07) {
  qr.x <- qr(x, tol = tol, LAPACK = TRUE)
  b <- qr.coef(qr.x, y)
  ## I will need these later, too:
  ## res <- qr.resid(qr.x, y)
  ## soa[i] <- b[2]
  ## sigmas[i] <- sd(res)
  b
}

ans <- byhand.qr(y, x)
ans
[1] 1.795725 2.000000 2.204275

You can verify that this is a valid solution by checking that the
residuals are zero up to round-off:

y - c(x %*% ans)
 [1]  9.436896e-16  2.498002e-16 -8.881784e-16  0.000000e+00
 [5] -4.440892e-16  1.776357e-15  4.440892e-16  0.000000e+00
 [9]  4.440892e-16 -4.440892e-16

However, this is not the unique solution, since there are infinitely
many solutions. We can get the solution of minimum length by using the
Moore-Penrose inverse:

require(MASS)
ans.min <- c(ginv(x) %*% y)
ans.min
[1] 2 2 2

You can verify that ans.min has a smaller L2-norm than ans, and that
it is unique.

Hope this helps,
Ravi.

Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine, Johns Hopkins University
Ph. (410) 502-2619
email: rvarad...@jhmi.edu

----- Original Message -----
From: ivo welch ivo...@gmail.com
Date: Wednesday, March 25, 2009 4:31 pm
Subject: [R] very fast OLS regression?
To: r-help r-h...@stat.math.ethz.ch
> [original message snipped]
Re: [R] very fast OLS regression?
lm is slow because it has to set up the design matrix (X) each time.
See ?model.matrix and ?model.matrix.lm for how to do this once,
separately from lm (and then use lm.fit).

I am far from an expert on numerical linear algebra, but I'm pretty
sure your comments below are incorrect, in the sense that naive
methods are **not** universally better than efficient numerical
linear-algebra methods using, say, QR decompositions. It will depend
very strongly on the size and specific nature of the problem. Big-O
complexity statements (O(n) or O(n^2), etc.) are, after all,
asymptotic. There is also typically a trade-off between computational
accuracy (especially for ill-conditioned matrices) and speed, which
your remarks seem to neglect.

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
650-467-7374

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
On Behalf Of ivo welch
Sent: Wednesday, March 25, 2009 2:30 PM
To: Dimitris Rizopoulos
Cc: r-help
Subject: Re: [R] very fast OLS regression?
> [quoted message snipped]
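Bert's suggestion, building the design matrix once with model.matrix() and then calling lm.fit() directly, could look like this (a sketch with made-up variable names; it pays off whenever the formula machinery would otherwise run on every replicate):

```r
## Build X once, then fit many responses with lm.fit(), skipping the
## per-call formula/model-frame overhead that makes lm() slow here.
set.seed(3)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
X   <- model.matrix(~ x1 + x2, data = dat)   # built once, reused below

ys  <- replicate(50, rnorm(100))             # many simulated responses
B   <- apply(ys, 2, function(y) lm.fit(X, y)$coefficients)
dim(B)                                       # 3 coefficients x 50 fits
```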
Re: [R] very fast OLS regression?
Yes, Bert. Any least-squares solution that forms X'X and then inverts
it is not to be recommended. If X is nearly rank-deficient, then X'X
will be more strongly so. The QR decomposition approach in my
byhand.qr() function is reliable and fast.

Ravi.

Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine, Johns Hopkins University
Ph. (410) 502-2619
email: rvarad...@jhmi.edu

----- Original Message -----
From: Bert Gunter gunter.ber...@gene.com
Date: Wednesday, March 25, 2009 6:03 pm
Subject: Re: [R] very fast OLS regression?
To: 'ivo welch' ivo...@gmail.com, 'Dimitris Rizopoulos'
    d.rizopou...@erasmusmc.nl
Cc: 'r-help' r-h...@stat.math.ethz.ch
> [quoted message snipped]