Yes, Bert. Any least-squares routine that forms X'X and then inverts it is not to be recommended: if X is nearly rank-deficient, then X'X will be more strongly so, since its condition number is the square of that of X. The QR decomposition approach in my byhand.qr() function is reliable and fast.
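[byhand.qr() itself is not posted in this thread, so the following is not Ravi's code; it is only a minimal sketch of the point being made: the 2-norm condition number of X'X is the square of that of X, and qr() solves the least-squares problem without ever forming X'X.]

```r
# Sketch only (not Ravi's byhand.qr(), which is not shown here): why
# forming X'X is risky, and what a QR-based solve looks like.
set.seed(1)
t <- seq(0, 1, length.out = 100)
X <- cbind(1, t, t^2)                     # mildly collinear polynomial design

kX  <- kappa(X, exact = TRUE)             # 2-norm condition number of X
kXX <- kappa(crossprod(X), exact = TRUE)  # equals kX^2, up to rounding

y <- X %*% c(1, 2, 3) + rnorm(100, sd = 0.001)
b <- qr.coef(qr(X), y)                    # least squares via QR; X'X never formed
```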
Ravi.

____________________________________________________________________
Ravi Varadhan, Ph.D.
Assistant Professor, Division of Geriatric Medicine and Gerontology
School of Medicine
Johns Hopkins University
Ph. (410) 502-2619
email: rvarad...@jhmi.edu

----- Original Message -----
From: Bert Gunter <gunter.ber...@gene.com>
Date: Wednesday, March 25, 2009 6:03 pm
Subject: Re: [R] very fast OLS regression?
To: 'ivo welch' <ivo...@gmail.com>, 'Dimitris Rizopoulos' <d.rizopou...@erasmusmc.nl>
Cc: 'r-help' <r-h...@stat.math.ethz.ch>

> lm is slow because it has to set up the design matrix (X) each time. See
> ?model.matrix and ?model.matrix.lm for how to do this once separately
> from lm (and then use lm.fit).
>
> I am far from an expert on numerical algebra, but I'm pretty sure your
> comments below are incorrect in the sense that "naive" methods are **not**
> universally better than efficient numerical algebra methods using, say,
> QR decompositions. It will depend very strongly on the size and specific
> nature of the problem. Big-O complexity statements (O(n) or O(n^2), etc.)
> are, after all, asymptotic. There is also typically a tradeoff between
> computational accuracy (especially for ill-conditioned matrices) and
> speed, which your remarks seem to neglect.
>
> -- Bert
>
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 650-467-7374
>
> -----Original Message-----
> From: r-help-boun...@r-project.org On Behalf Of ivo welch
> Sent: Wednesday, March 25, 2009 2:30 PM
> To: Dimitris Rizopoulos
> Cc: r-help
> Subject: Re: [R] very fast OLS regression?
>
> thanks, dimitris. I also added Bill Dunlap's "solve(qr(x),y)"
> function as ols5. here is what I get in terms of speed on a Mac Pro:
>
> ols1  6.779  3.591  10.37   0  0
> ols2  0.515  0.21    0.725  0  0
> ols3  0.576  0.403   0.971  0  0
> ols4  1.143  1.251   2.395  0  0
> ols5  0.683  0.565   1.248  0  0
>
> so the naive matrix operations are fastest.
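[The timing harness behind the table above was not posted; the following is a hedged reconstruction of the kind of comparison involved, using ols2 (the "naive" normal-equations version defined later in the thread) and Bill Dunlap's ols5 one-liner.]

```r
# Hypothetical timing harness (the one that produced the table above
# was not posted); ols2 and ols5 follow the definitions in this thread.
ols2 <- function (y, x) {
  as.vector(solve(t(x) %*% x) %*% (t(x) %*% y))  # normal equations
}
ols5 <- function (y, x) solve(qr(x), y)          # Bill Dunlap's QR one-liner

set.seed(1)
x <- matrix(rnorm(2e4 * 4), ncol = 4)
y <- x %*% c(1, -2, 3, -4) + rnorm(2e4)

system.time(for (i in 1:200) b2 <- ols2(y, x))
system.time(for (i in 1:200) b5 <- ols5(y, x))
all.equal(b2, as.vector(b5))   # same answer on well-conditioned data
```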
> I would have thought that alternatives to the naive stuff I learned in
> my linear algebra course would be quicker. still, ols3 and ols5 are
> competitive. the built-in lm() is really problematic. is ols3 (or
> perhaps even ols5) preferable in terms of accuracy? I think I can live
> with a 20% slow-down (but not with a factor-of-10 slow-down).
>
> regards,
>
> /iaw
>
>
> On Wed, Mar 25, 2009 at 5:11 PM, Dimitris Rizopoulos
> <d.rizopou...@erasmusmc.nl> wrote:
> > check the following options:
> >
> > ols1 <- function (y, x) {
> >     coef(lm(y ~ x - 1))
> > }
> >
> > ols2 <- function (y, x) {
> >     xy <- t(x) %*% y
> >     xxi <- solve(t(x) %*% x)
> >     b <- as.vector(xxi %*% xy)
> >     b
> > }
> >
> > ols3 <- function (y, x) {
> >     XtX <- crossprod(x)
> >     Xty <- crossprod(x, y)
> >     solve(XtX, Xty)
> > }
> >
> > ols4 <- function (y, x) {
> >     lm.fit(x, y)$coefficients
> > }

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
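[Editor's sketch, not from the thread, on the accuracy question ivo raises above: on a well-conditioned design, ols3, ols4, and ols5 all agree to machine precision; they only part ways when x is ill-conditioned, where the crossprod/solve route (ols3) inherits the squared condition number and the QR routes (ols4, ols5) do not.]

```r
# Agreement check (my addition, not from the thread) between the
# crossprod route (ols3), lm.fit (ols4), and the QR one-liner (ols5).
ols3 <- function (y, x) solve(crossprod(x), crossprod(x, y))
ols4 <- function (y, x) lm.fit(x, y)$coefficients
ols5 <- function (y, x) solve(qr(x), y)

set.seed(42)
x <- matrix(rnorm(500 * 4), ncol = 4)
y <- as.vector(x %*% c(1, -1, 2, 0.5) + rnorm(500))

b3 <- as.vector(ols3(y, x))
b4 <- as.vector(ols4(y, x))
b5 <- as.vector(ols5(y, x))
all.equal(b3, b4)   # all three coincide on a well-conditioned design
all.equal(b4, b5)
```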