Hi Christoph, ginv() computes the Moore-Penrose generalized inverse by way of a singular value decomposition. Part of the calculation involves taking the reciprocal of the non zero values. In practice, non zero is really "within some precision tolerance of zero". Numerical precision can bite you in scientific computing.
There are many examples where the most conceptually straightforward approach is not the best approach because whereas the equation may be easy to write symbolically, it is more vulnerable to rounding or truncation errors that occur in floating point representations. Aside from working through some matrix algebra for understanding, using established code (like lm) for models where the authors will have taken issues like numerical precision and stability into consideration is generally safest. Cheers, Josh On Tue, Sep 3, 2013 at 6:22 AM, Christoph Scherber <christoph.scher...@agr.uni-goettingen.de> wrote: > Dear all, > > But why are there such huge differences betwen solve() and ginv()? (see code > below)? > > ## > m1=lm(Ozone~Solar.R*Wind,airquality) > > # remove NA´s: > airquality2=airquality[complete.cases(airquality$Ozone)& > complete.cases(airquality$Solar.R)& > complete.cases(airquality$Wind),] > > # create the model matrix by hand: > X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind) > # is the same as: > model.matrix(m1) > # create the response vector by hand: > Y=airquality2$Ozone > # is the same as: > m1$model$Ozone > # Now solve for the parameter estimates: > > solve(crossprod(X)) %*% crossprod(X,Y) #gives the correct answer > > library(MASS) > ginv(t(X)%*%X)%*%t(X)%*%Y #gives a wrong answer > > > > > > Am 03/09/2013 12:29, schrieb Joshua Wiley: >> Hi Christoph, >> >> Use this matrix expression instead: >> >> solve(crossprod(X)) %*% t(X) %*% Y >> >> Note that: >> >> all.equal(crossprod(X), t(X) %*% X) >> >> Cheers, >> >> Joshua >> >> >> >> On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber >> <christoph.scher...@agr.uni-goettingen.de> wrote: >>> Dear all, >>> >>> I´ve played around with the "airquality" dataset, trying to solve the >>> matrix equations of a simple >>> multiple regression by hand; however, my matrix multiplications don´t lead >>> to the estimates returned >>> by coef(). What have I done wrong here? >>> >>> ## >>> m1=lm(Ozone~Solar.R*Wind,airquality) >>> >>> # remove NA´s: >>> airquality2=airquality[complete.cases(airquality$Ozone)& >>> complete.cases(airquality$Solar.R)& >>> complete.cases(airquality$Wind),] >>> >>> # create the model matrix by hand: >>> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind) >>> # is the same as: >>> model.matrix(m1) >>> >>> # create the response vector by hand: >>> Y=airquality2$Ozone >>> # is the same as: >>> m1$model$Ozone >>> >>> # Now solve for the parameter estimates: >>> library(MASS) >>> ginv(t(X)%*%X)%*%t(X)%*%Y >>> >>> # is not the same as: >>> coef(m1) >>> >>> ## >>> Now why is my result (line ginv(...)) not the same as the one returned by >>> coef(m1)? >>> >>> Thanks very much for your help! >>> >>> Best regards, >>> Christoph >>> >>> [using R 3.0.1 on Windows 7 32-Bit] >>> >>> >>> >>> >>> >>> -- >>> PD Dr Christoph Scherber >>> Georg-August University Goettingen >>> Department of Crop Science >>> Agroecology >>> Grisebachstrasse 6 >>> D-37077 Goettingen >>> Germany >>> phone 0049 (0)551 39 8807 >>> fax 0049 (0)551 39 8806 >>> http://www.gwdg.de/~cscherb1 >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> >> > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Joshua Wiley Ph.D. Student, Health Psychology University of California, Los Angeles http://joshuawiley.com/ Senior Analyst - Elkhart Group Ltd. http://elkhartgroup.com ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.