Hello,
It's easy to see what's going on by reading the sources, to be open
source is one of the strong points of R, we know exactly how the values
are computed. A reviewer might like to have an explanation of what R does.
The op could check with Friedrich Leisch's "Creating R Packages: A
Tutorial", it's running example on S3 classes is precisely the linear
model. The relevant functions and the way to call them are as follows.
Note that the p-values are computed using the distribution function,
pt(), that gives the area under the density, and that the returned
values are multiplied by two, since the test is two-sided. I've edited
the code a bit, to give an other way of computing the p-values. The
results are the same as the results of R's summary.lm() in package stats
and the code is easy to follow.
linmodEst <- function(x, y){
## compute QR-decomposition of x
qx <- qr(x)
## compute (x'x)^(-1) x'y
coef <- solve.qr(qx, y)
## degrees of freedom and standard deviation of residuals
df <- nrow(x) - ncol(x)
sigma2 <- sum((y - x %*% coef)^2)/df
## compute sigma^2 * (x'x)^-1
vcov <- sigma2 * chol2inv(qx$qr)
colnames(vcov) <- rownames(vcov) <- colnames(x)
list(coefficients = coef,
vcov = vcov,
sigma = sqrt(sigma2),
df = df)
}
summary.linmod <- function(object, ...){
se <- sqrt(diag(object$vcov))
tval <- coef(object) / se
TAB <- cbind(Estimate = coef(object),
StdErr = se,
t.value = tval,
p.value = 2*pt(-abs(tval), df=object$df),
p.value2 = 2*pt(abs(tval), df=object$df, lower.tail = FALSE))
res <- list(call=object$call,
coefficients=TAB)
#class(res) <- "summary.linmod"
res
}
mod <- linmodEst(cbind(Const = 1, x_var), y_var)
summary.linmod(mod)
Hope this helps,
Rui Barradas
Em 03-12-2012 14:26, Robert Baer escreveu:
On 12/3/2012 6:20 AM, Sindri wrote:
Dear R-users
Please excuse me if this topic has been covered before, but I was
unable to
find anything relevant by searching
I am currently doing a comparison of two biological variables that
have a
highly significant linear relationship. I know that the p-value of
linear
regression is not so interesting in itself, but this particular value
does
raise a question.
How does R calculate (extremely low) p-values for linear regression?
For my data I got a p-value on the order of 10^-9 and a reviewer
commented
on this. I tried to run the same analysis in both SAS and Sigmastat
to be
sure that I was doing it right, but both these programs only return a
p-value of p < 0.0001
Since I am unable to reproduce my results in another statistics
program, it
would be nice to be able to explain this unusally low p-value to the
reviewers.
This is a matter of you understanding that the p-value is an area
under a probability density curve. R is simply printing out the
actual area in a tail of some distribution. The other statistical
program is making the assumption that you are using the p-value to
compare to a cutoff alpha value that is (in most fields) never set
much below p<0.001. If p < alpha the "hypothesis test crowd" , would
choose to reject NULL hypothesis, so the other statistics programs
take the attitude -- "why provide more detail?". R chooses to give
you the actual number and let you do what you will with it. You could
probably benefit from reviewing hypothesis testing in a basic
statistics book if this is not clear.
Note that 10e-9 is indeed less than 0.0001, so the programs don't
disagree. R just provides more detail.
This "problem" can be illustrated with the following made-up data:
x_var<-c(0.149,0.178,0.3474,0.167,0.121,0.182,0.176,0.448,0.091,0.083,0.090,0.407,0.378,0.132,0.227,0.172,0.088,0.392,0.425,0.150,0.319,0.190,0.171,0.290,0.214,0.431,0.193)
y_var<-c(0.918,0.394,0.131,0.9084,0.916,0.934,0.928,0.279,0.830,0.927,0.964,0.323,0.097,0.914,0.614,0.790,0.984,0.530,0.207,0.858,0.408,0.919,0.869,0.347,0.834,0.276,0.940)
fit<-lm(y_var~x_var)
summary(fit)
Call:
lm(formula = y_var ~ x_var)
Residuals:
Min 1Q Median 3Q Max
-0.39152 -0.06027 0.00933 0.10024 0.22711
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.18696 0.06394 18.562 3.90e-16 ***
x_var -2.25529 0.24788 -9.098 2.08e-09 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1503 on 25 degrees of freedom
Multiple R-squared: 0.768, Adjusted R-squared: 0.7588
F-statistic: 82.78 on 1 and 25 DF, p-value: 2.083e-09
With kind regards,
Sindri Traustason
-----
-----------------------------------------
Sindri Traustason
Glostrup Hospital Ophthalmology Research Dept.
Copenhagen, Demark
--
View this message in context:
http://r.789695.n4.nabble.com/Calculation-of-extremely-low-p-values-in-lm-tp4651823.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.