Re: [R] Simple lm/regression question

Achim Zeileis Mon, 06 Feb 2012 01:58:28 -0800

On Mon, 6 Feb 2012, James Annan wrote:

I am trying to use lm for a simple linear fit with weights. The resultsI get from IDL (which I am more familiar with) seem correct andintuitive, but the "lm" function in R gives outputs that seem strange tome.
Unweighted case:
x<-1:4
y<-(1:4)^2
summary(lm(y~x))
Call:
lm(formula = y ~ x)

Residuals:
1  2  3  4
1 -1 -1  1

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
(Intercept)  -5.0000     1.7321  -2.887   0.1020
x             5.0000     0.6325   7.906   0.0156 *

So far, so good - IDL does much the same:

IDL> vec=linfit(x,y,sigma=sig)
IDL> print,vec,sig
    -5.00000      5.00000
     1.73205     0.632456
Now, if the dependent variable has known (measurement) uncertainties (10,say) then it is appropriate to use weights defined as the inverse of thevariances, right?


If you think that in y = x'b + e the error has

E(e^2) = sigma^2 * w

then you should use

weights = 1/w

in R because that is how the weights enter the objective function

sum 1/w (y - x'b)^2

summary(lm(y~x,weights=c(.01,.01,.01,.01)))
Call:
lm(formula = y ~ x, weights = c(0.01, 0.01, 0.01, 0.01))

Residuals:
  1    2    3    4
0.1 -0.1 -0.1  0.1

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
(Intercept)  -5.0000     1.7321  -2.887   0.1020
x             5.0000     0.6325   7.906   0.0156 *
But here the *residuals* have changed and are no longer the actualresponse minus fitted values. (They seem to be scaled by the sqrt of theweights).

The summary() shows under "Residuals" the contributions to the objectivefunction, i.e. sqrt(1/w) (y - x'b) in the notation above.

However, by using the residuals extractor function you can get theunweighted residuals:


residuals(lm(y~x,weights=c(.01,.01,.01,.01)))

The uncertainties on the parameter estimates, however, have*not* changed, which seems very odd to me.


lm() interprets the weights as precision weights, not as case weights.

Thus, the scaling in the variances is done by the number of (non-zero)weights, not by the sum of weights.

The behaviour of IDL is rather different and intuitive to me:

IDL> vec=linfit(x,y,sigma=sig,measure_errors=[1,1,1,1])
IDL> print,vec,sig
    -5.00000      5.00000
     1.22474     0.447214

IDL> vec=linfit(x,y,sigma=sig,measure_errors=[10,10,10,10])
IDL> print,vec,sig
    -5.00000      5.00000
     12.2474      4.47214

This appears to use sandwich standard errors. If you load the sandwich andlmtest package you can do


coeftest(m, vcov = sandwich)

where 'm' is the fitted lm object.

Hope that helps,
Z

Note that the uncertainties are 10* larger when the errors are 10* larger.

My question is really how to replicate these results in R. I suspect there issome terminology or convention that I'm confused by. But in the lm help pagedocumentation, even the straightforward definition of "residual" seemsincompatible with what the code actually returns:


        residuals: the residuals, that is response minus fitted values.

But, in the weighted case, this is not actually true...

James
--
James D Annan jdan...@jamstec.go.jp Tel: +81-45-778-5618 (Fax 5707)
Senior Scientist, Research Institute for Global Change, JAMSTEC
Yokohama Institute for Earth Sciences, 3173-25 Showamachi,
Kanazawa-ku, Yokohama City, Kanagawa, 236-0001 Japan
http://www.jamstec.go.jp/frcgc/research/d5/jdannan/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Simple lm/regression question

Reply via email to