Re: [R] Interpreting lm Residuals...

David Winsemius Mon, 21 Jun 2010 09:11:45 -0700


On Jun 21, 2010, at 10:27 AM, David Riebel wrote:

I am using the lm function in R to fit several linear models to a
fair-sized dataset (~160 collections of ~1000 data points each).  My
data have intrinsic, systematic uncertainty much greater than the
measurement errors on any individual point.  My thought is to use the
residuals of my linear fits to quantify this intrinsic uncertainty, but
I am puzzled over the correct interpretation of R's output.

I have attached plots of the fit and the residuals to one of my
sub-groups, for illustration. By eye, the overwhelming majority of the
residuals are within +- 0.4, and I would therefore expect the standard
error of the residuals to be ~0.2. However, the output from lm does not
show this:

Crack open a basic regression text. The standard error (more
completely, the standard error of the estimate) refers to the
parameter, not the residuals. It will depend on SS(resid)/(n), but
there are obviously other components in the calculation. Furthermore,
you have complicated matters by adding a weights term which will
affect your estimates in a manner that we cannot predict since you did
not provide the full data.
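To make the calculation concrete, here is a minimal sketch on simulated data. The names x, y and err are hypothetical stand-ins for oper, omag and oerr, and the numbers are invented, so this only illustrates the arithmetic, not the poster's actual fit. It assumes the weighted "Residual standard error" is sqrt(sum(w * e^2) / df) and that the coefficient standard errors come from that value times sqrt(diag((X'WX)^-1)); the all.equal() calls check both against what summary() reports.

```r
# Simulated data: all names and values are assumptions for illustration only
set.seed(1)
n   <- 200
x   <- runif(n, 0, 3)
err <- runif(n, 0.01, 0.1)
y   <- 19.5 - 4.25 * x + rnorm(n, sd = 0.15)
w   <- 1 / err                       # same form of weights as in the post
fit <- lm(y ~ x, weights = w)

# "Residual standard error": weighted sum of squares over residual df
e         <- resid(fit)              # unweighted: response minus fitted
sigma_hat <- sqrt(sum(w * e^2) / df.residual(fit))
all.equal(sigma_hat, summary(fit)$sigma)

# Coefficient standard errors: sigma_hat * sqrt(diag((X' W X)^-1))
X  <- model.matrix(fit)
se <- sigma_hat * sqrt(diag(solve(t(X) %*% (w * X))))
all.equal(unname(se), unname(coef(summary(fit))[, "Std. Error"]))
```

So the "Std. Error" column belongs to the parameters, while the "Residual standard error" line is a weighted scale estimate; neither is sd(resid(fit)) once weights are involved.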

summary(ofit)


Call:
lm(formula = omag ~ oper, weights = (1/oerr))

Residuals:
    Min       1Q   Median       3Q      Max
-3.32185 -0.41181  0.03983  0.40041  2.52971

Coefficients:
           Estimate Std. Error t value Pr(>|t|)
(Intercept) 19.52847    0.03979   490.8   <2e-16 ***
oper        -4.25297    0.02101  -202.4   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6705 on 2287 degrees of freedom
Multiple R-squared: 0.9471, Adjusted R-squared: 0.9471
F-statistic: 4.097e+04 on 1 and 2287 DF,  p-value: < 2.2e-16

The plot thickens when I examine the residuals themselves:

summary(resid(ofit))

    Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
-0.611800 -0.095720  0.010200  0.005954  0.101100  0.680700

sd(resid(ofit))

[1] 0.1533568

These numbers are much more what I see by eye. There really aren't any
residuals outside ~0.6, certainly nothing as large as 3.3!  The help
feature for lm tells me that the residuals are "the residuals, that is
response minus fitted values."  Exactly what I would expect.  As an
Astronomer, my knowledge of statistics is rather "workman-like" if you
will, but to me, "Residual standard error" means "the standard deviation
of the residuals," but the lm output doesn't seem to agree with this.


Probably because you added the weights argument.
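A short sketch of why the weights matter here, again on simulated data with hypothetical names (x, y, err), not the original dataset. It assumes that in a weighted fit the "Residuals:" block printed by summary() holds the weighted residuals sqrt(w) * (response - fitted), whereas resid(fit) returns the plain response minus fitted values; the all.equal() calls check that assumption.

```r
# Simulated data: all names and values are assumptions for illustration only
set.seed(2)
n   <- 500
x   <- runif(n, 0, 3)
err <- runif(n, 0.01, 0.1)
y   <- 19.5 - 4.25 * x + rnorm(n, sd = 0.15)
w   <- 1 / err
fit <- lm(y ~ x, weights = w)

raw      <- resid(fit)               # response minus fitted values
weighted <- sqrt(w) * raw            # what summary() reports for a weighted fit

# summary() stores the weighted residuals; weighted.residuals() returns the same
all.equal(unname(weighted), unname(summary(fit)$residuals))
all.equal(weighted, weighted.residuals(fit))

# The two sets of residuals live on different scales
range(raw)
range(weighted)
sd(raw)                              # spread of the unweighted residuals
summary(fit)$sigma                   # "Residual standard error" (weighted)
```

With weights of 1/err and err well below 1, sqrt(w) is well above 1, so the weighted residuals and the residual standard error come out several times larger than the unweighted ones, which is the pattern in the output above.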


I'd appreciate it if someone could clarify what's being output by the
summary function acting on an lm object.

Replies by e-mail preferred.

Thanks,


David Riebel
Graduate Research Assistant
Johns Hopkins University
Department of Physics and Astronomy


David Winsemius, MD
West Hartford, CT

