[R] Simple lm/regression question

James Annan Sun, 05 Feb 2012 20:53:02 -0800

I am trying to use lm for a simple linear fit with weights. The resultsI get from IDL (which I am more familiar with) seem correct andintuitive, but the "lm" function in R gives outputs that seem strange to me.


Unweighted case:


> x<-1:4
> y<-(1:4)^2
> summary(lm(y~x))

Call:
lm(formula = y ~ x)

Residuals:
 1  2  3  4
 1 -1 -1  1

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -5.0000     1.7321  -2.887   0.1020
x             5.0000     0.6325   7.906   0.0156 *

So far, so good - IDL does much the same:

IDL> vec=linfit(x,y,sigma=sig)
IDL> print,vec,sig
     -5.00000      5.00000
      1.73205     0.632456

Now, if the dependent variable has known (measurement) uncertainties(10, say) then it is appropriate to use weights defined as the inverseof the variances, right?


> summary(lm(y~x,weights=c(.01,.01,.01,.01)))

Call:
lm(formula = y ~ x, weights = c(0.01, 0.01, 0.01, 0.01))

Residuals:
   1    2    3    4
 0.1 -0.1 -0.1  0.1

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -5.0000     1.7321  -2.887   0.1020
x             5.0000     0.6325   7.906   0.0156 *

But here the *residuals* have changed and are no longer the actualresponse minus fitted values. (They seem to be scaled by the sqrt of theweights). The uncertainties on the parameter estimates, however, have*not* changed, which seems very odd to me.


The behaviour of IDL is rather different and intuitive to me:

IDL> vec=linfit(x,y,sigma=sig,measure_errors=[1,1,1,1])
IDL> print,vec,sig
     -5.00000      5.00000
      1.22474     0.447214

IDL> vec=linfit(x,y,sigma=sig,measure_errors=[10,10,10,10])
IDL> print,vec,sig
     -5.00000      5.00000
      12.2474      4.47214

Note that the uncertainties are 10* larger when the errors are 10* larger.

My question is really how to replicate these results in R. I suspectthere is some terminology or convention that I'm confused by. But in thelm help page documentation, even the straightforward definition of"residual" seems incompatible with what the code actually returns:


        residuals: the residuals, that is response minus fitted values.

But, in the weighted case, this is not actually true...

James
--
James D Annan jdan...@jamstec.go.jp Tel: +81-45-778-5618 (Fax 5707)
Senior Scientist, Research Institute for Global Change, JAMSTEC
Yokohama Institute for Earth Sciences, 3173-25 Showamachi,
Kanazawa-ku, Yokohama City, Kanagawa, 236-0001 Japan
http://www.jamstec.go.jp/frcgc/research/d5/jdannan/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Simple lm/regression question

Reply via email to