Hi Christoph,

ginv() computes the Moore-Penrose generalized inverse by way of a
singular value decomposition.  Part of the calculation involves taking
the reciprocal of the non zero values.  In practice, non zero is
really "within some precision tolerance of zero".  Numerical precision
can bite you in scientific computing.

There are many examples where the most conceptually straightforward
approach is not the best approach because whereas the equation may be
easy to write symbolically, it is more vulnerable to rounding or
truncation errors that occur in floating point representations.

Aside from working through some matrix algebra for understanding,
using established code (like lm) for models where the authors will
have taken issues like numerical precision and stability into
consideration is generally safest.

Cheers,

Josh



On Tue, Sep 3, 2013 at 6:22 AM, Christoph Scherber
<christoph.scher...@agr.uni-goettingen.de> wrote:
> Dear all,
>
> But why are there such huge differences betwen solve() and ginv()? (see code 
> below)?
>
> ##
> m1=lm(Ozone~Solar.R*Wind,airquality)
>
> # remove NA´s:
> airquality2=airquality[complete.cases(airquality$Ozone)&
> complete.cases(airquality$Solar.R)&
> complete.cases(airquality$Wind),]
>
> # create the model matrix by hand:
> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
> # is the same as:
> model.matrix(m1)
> # create the response vector by hand:
> Y=airquality2$Ozone
> # is the same as:
> m1$model$Ozone
> # Now solve for the parameter estimates:
>
> solve(crossprod(X)) %*% crossprod(X,Y) #gives the correct answer
>
> library(MASS)
> ginv(t(X)%*%X)%*%t(X)%*%Y #gives a wrong answer
>
>
>
>
>
> Am 03/09/2013 12:29, schrieb Joshua Wiley:
>> Hi Christoph,
>>
>> Use this matrix expression instead:
>>
>> solve(crossprod(X)) %*% t(X) %*% Y
>>
>> Note that:
>>
>> all.equal(crossprod(X), t(X) %*% X)
>>
>> Cheers,
>>
>> Joshua
>>
>>
>>
>> On Tue, Sep 3, 2013 at 2:51 AM, Christoph Scherber
>> <christoph.scher...@agr.uni-goettingen.de> wrote:
>>> Dear all,
>>>
>>> I´ve played around with the "airquality" dataset, trying to solve the 
>>> matrix equations of a simple
>>> multiple regression by hand; however, my matrix multiplications don´t lead 
>>> to the estimates returned
>>> by coef(). What have I done wrong here?
>>>
>>> ##
>>> m1=lm(Ozone~Solar.R*Wind,airquality)
>>>
>>> # remove NA´s:
>>> airquality2=airquality[complete.cases(airquality$Ozone)&
>>> complete.cases(airquality$Solar.R)&
>>> complete.cases(airquality$Wind),]
>>>
>>> # create the model matrix by hand:
>>> X=cbind("(Intercept)"=1,Solar.R=airquality2$Solar.R,Wind=airquality2$Wind,"Solar.R:Wind"=airquality2$Solar.R*airquality2$Wind)
>>> # is the same as:
>>> model.matrix(m1)
>>>
>>> # create the response vector by hand:
>>> Y=airquality2$Ozone
>>> # is the same as:
>>> m1$model$Ozone
>>>
>>> # Now solve for the parameter estimates:
>>> library(MASS)
>>> ginv(t(X)%*%X)%*%t(X)%*%Y
>>>
>>> # is not the same as:
>>> coef(m1)
>>>
>>> ##
>>> Now why is my result (line ginv(...)) not the same as the one returned by 
>>> coef(m1)?
>>>
>>> Thanks very much for your help!
>>>
>>> Best regards,
>>> Christoph
>>>
>>> [using R 3.0.1 on Windows 7 32-Bit]
>>>
>>>
>>>
>>>
>>>
>>> --
>>> PD Dr Christoph Scherber
>>> Georg-August University Goettingen
>>> Department of Crop Science
>>> Agroecology
>>> Grisebachstrasse 6
>>> D-37077 Goettingen
>>> Germany
>>> phone 0049 (0)551 39 8807
>>> fax 0049 (0)551 39 8806
>>> http://www.gwdg.de/~cscherb1
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://joshuawiley.com/
Senior Analyst - Elkhart Group Ltd.
http://elkhartgroup.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to