On Jun 21, 2010, at 10:27 AM, David Riebel wrote:
> I am using the lm function in R to fit several linear models to a
> fair-sized dataset (~160 collections of ~1000 data points each). My
> data have intrinsic, systematic uncertainty much greater than the
> measurement errors on any individual point. My thought is to use the
> residuals of my linear fits to quantify this intrinsic uncertainty,
> but I am puzzled over the correct interpretation of R's output.
>
> I have attached plots of the fit and the residuals for one of my
> sub-groups, for illustration. By eye, the overwhelming majority of
> the residuals are within ±0.4, and I would therefore expect the
> standard error of the residuals to be ~0.2. However, the output from
> lm does not show this:

Crack open a basic regression text. The residual standard error (more
completely, the standard error of the estimate) is an estimate of a
model parameter, the standard deviation sigma of the error term, rather
than a summary of the residuals themselves. It is computed as
sqrt(SS(resid)/(n - p)), where p is the number of estimated
coefficients. Furthermore, you have complicated matters by adding a
weights term: with weights, SS(resid) becomes the weighted sum of
squared residuals, and the effect on your estimates is something we
cannot predict since you did not provide the full data.

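To make that formula concrete, here is a minimal R sketch on simulated
data. The names x, y, err and n are hypothetical stand-ins for oper,
omag and oerr, which we don't have. It recomputes the residual standard
error by hand as sqrt(sum(w * r^2) / (n - p)) and checks it against
summary(fit)$sigma, for a weighted and an unweighted fit:

```r
# Minimal sketch on simulated data. x, y, err and n are hypothetical
# stand-ins for oper, omag, oerr in the original post.
set.seed(1)
n   <- 200
x   <- runif(n, 0, 2)
err <- runif(n, 0.02, 0.10)               # hypothetical measurement errors
y   <- 19.5 - 4.25 * x + rnorm(n, sd = 0.15)
w   <- 1 / err                            # same form as weights = (1/oerr)

fit_w <- lm(y ~ x, weights = w)           # weighted fit, as in the post
fit_u <- lm(y ~ x)                        # unweighted fit for comparison

# Residual standard error = sqrt(weighted SS(resid) / (n - p)), p = 2 here
rse_w <- sqrt(sum(weights(fit_w) * resid(fit_w)^2) / df.residual(fit_w))
rse_u <- sqrt(sum(resid(fit_u)^2) / df.residual(fit_u))

all.equal(rse_w, summary(fit_w)$sigma)    # TRUE
all.equal(rse_u, summary(fit_u)$sigma)    # TRUE

# Unweighted fit with an intercept: the residuals average to zero, so the
# RSE differs from sd() only through the n - 1 versus n - p denominator.
all.equal(rse_u, sd(resid(fit_u)) * sqrt((n - 1) / df.residual(fit_u)))  # TRUE
```

In the unweighted case the gap between sd(resid(fit)) and the residual
standard error is negligible at a couple of thousand points; with
weights, the weighted sum of squares can be far from the raw one.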
> summary(ofit)
>
> Call:
> lm(formula = omag ~ oper, weights = (1/oerr))
>
> Residuals:
>      Min       1Q   Median       3Q      Max
> -3.32185 -0.41181  0.03983  0.40041  2.52971
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept) 19.52847    0.03979   490.8   <2e-16 ***
> oper        -4.25297    0.02101  -202.4   <2e-16 ***
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> Residual standard error: 0.6705 on 2287 degrees of freedom
> Multiple R-squared: 0.9471,     Adjusted R-squared: 0.9471
> F-statistic: 4.097e+04 on 1 and 2287 DF,  p-value: < 2.2e-16
>
> The plot thickens when I examine the residuals themselves:
>
> summary(resid(ofit))
>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
> -0.611800 -0.095720  0.010200  0.005954  0.101100  0.680700
> sd(resid(ofit))
> [1] 0.1533568
>
> These numbers are much closer to what I see by eye. There really
> aren't any residuals outside ~0.6, certainly nothing as large as 3.3!
> The help page for lm tells me that the residuals are "the residuals,
> that is response minus fitted values." Exactly what I would expect.
> As an astronomer, my knowledge of statistics is rather "workman-like,"
> if you will, but to me "Residual standard error" means "the standard
> deviation of the residuals," and the lm output doesn't seem to agree
> with this.
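
That is because you added the weights argument. For a weighted fit,
summary() prints the weighted residuals, sqrt(w) * (response - fitted),
and computes the residual standard error from them, whereas resid()
returns the raw response minus fitted values.
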
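A companion sketch, again on hypothetical simulated data with weights
of the form 1/err as in the original call (err stands in for oerr). It
checks that the "Residuals:" quantiles printed by summary() are the
raw residuals scaled by sqrt(w), which is also what
residuals(fit, type = "pearson") returns for a weighted lm:

```r
# Minimal sketch on hypothetical simulated data; err stands in for oerr.
set.seed(1)
n   <- 200
x   <- runif(n, 0, 2)
err <- runif(n, 0.02, 0.10)          # so the weights 1/err lie between 10 and 50
y   <- 19.5 - 4.25 * x + rnorm(n, sd = 0.15)
fit_w <- lm(y ~ x, weights = 1 / err)

raw      <- resid(fit_w)                   # response minus fitted values
weighted <- sqrt(weights(fit_w)) * raw     # residuals scaled by sqrt(w)

# The "Residuals:" quantiles printed by summary() are the scaled ones,
# which are also what residuals(type = "pearson") returns for a weighted lm.
all.equal(unname(summary(fit_w)$residuals), unname(weighted))            # TRUE
all.equal(unname(residuals(fit_w, type = "pearson")), unname(weighted))  # TRUE

# Compare the two five-number summaries: with sqrt(w) between about 3.2
# and 7.1, every weighted residual is at least 3 times its raw value in
# absolute size.
summary(raw)
summary(weighted)
```

If most of the weights 1/oerr in the original fit are well above 1,
that would account for summary(ofit) showing residuals out to about 3.3
while summary(resid(ofit)) stops near 0.6.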
> I'd appreciate it if someone could clarify what's being output by the
> summary function acting on an lm object.
>
> Replies by e-mail preferred.
>
> Thanks,
> David Riebel
> Graduate Research Assistant
> Johns Hopkins University
> Department of Physics and Astronomy

David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.