Re: [R] Calculation of extremely low p-values (in lm)

Rui Barradas Mon, 03 Dec 2012 07:03:19 -0800

Hello,

It's easy to see what's going on by reading the sources, to be opensource is one of the strong points of R, we know exactly how the valuesare computed. A reviewer might like to have an explanation of what R does.The op could check with Friedrich Leisch's "Creating R Packages: ATutorial", it's running example on S3 classes is precisely the linearmodel. The relevant functions and the way to call them are as follows.Note that the p-values are computed using the distribution function,pt(), that gives the area under the density, and that the returnedvalues are multiplied by two, since the test is two-sided. I've editedthe code a bit, to give an other way of computing the p-values. Theresults are the same as the results of R's summary.lm() in package statsand the code is easy to follow.



linmodEst <- function(x, y){
    ## compute QR-decomposition of x
    qx <- qr(x)
    ## compute (x'x)^(-1) x'y
    coef <- solve.qr(qx, y)
    ## degrees of freedom and standard deviation of residuals
    df <- nrow(x) - ncol(x)
    sigma2 <- sum((y - x %*% coef)^2)/df
    ## compute sigma^2 * (x'x)^-1
    vcov <- sigma2 * chol2inv(qx$qr)
    colnames(vcov) <- rownames(vcov) <- colnames(x)
    list(coefficients = coef,
        vcov = vcov,
        sigma = sqrt(sigma2),
        df = df)
}

summary.linmod <- function(object, ...){
    se <- sqrt(diag(object$vcov))
    tval <- coef(object) / se
    TAB <- cbind(Estimate = coef(object),
            StdErr = se,
            t.value = tval,
            p.value = 2*pt(-abs(tval), df=object$df),
            p.value2 = 2*pt(abs(tval), df=object$df, lower.tail = FALSE))
    res <- list(call=object$call,
            coefficients=TAB)
    #class(res) <- "summary.linmod"
    res
}


mod <- linmodEst(cbind(Const = 1, x_var), y_var)
summary.linmod(mod)


Hope this helps,

Rui Barradas


Em 03-12-2012 14:26, Robert Baer escreveu:

On 12/3/2012 6:20 AM, Sindri wrote:
Dear R-users
Please excuse me if this topic has been covered before, but I wasunable to
find anything relevant by searching
I am currently doing a comparison of two biological variables thathave ahighly significant linear relationship. I know that the p-value oflinearregression is not so interesting in itself, but this particular valuedoes
raise a question.

How does R calculate (extremely low) p-values for linear regression?
For my data I got a p-value on the order of 10^-9 and a reviewercommentedon this. I tried to run the same analysis in both SAS and Sigmastatto be
sure that I was doing it right, but both these programs only return a
p-value of p < 0.0001
Since I am unable to reproduce my results in another statisticsprogram, it
would be nice to be able to explain this unusally low p-value to the
reviewers.
This is a matter of you understanding that the p-value is an areaunder a probability density curve. R is simply printing out theactual area in a tail of some distribution. The other statisticalprogram is making the assumption that you are using the p-value tocompare to a cutoff alpha value that is (in most fields) never setmuch below p<0.001. If p < alpha the "hypothesis test crowd" , wouldchoose to reject NULL hypothesis, so the other statistics programstake the attitude -- "why provide more detail?". R chooses to giveyou the actual number and let you do what you will with it. You couldprobably benefit from reviewing hypothesis testing in a basicstatistics book if this is not clear.
Note that 10e-9 is indeed less than 0.0001, so the programs don'tdisagree. R just provides more detail.
This "problem" can be illustrated with the following made-up data:
x_var<-c(0.149,0.178,0.3474,0.167,0.121,0.182,0.176,0.448,0.091,0.083,0.090,0.407,0.378,0.132,0.227,0.172,0.088,0.392,0.425,0.150,0.319,0.190,0.171,0.290,0.214,0.431,0.193)
y_var<-c(0.918,0.394,0.131,0.9084,0.916,0.934,0.928,0.279,0.830,0.927,0.964,0.323,0.097,0.914,0.614,0.790,0.984,0.530,0.207,0.858,0.408,0.919,0.869,0.347,0.834,0.276,0.940)
fit<-lm(y_var~x_var)
summary(fit)
Call:
lm(formula = y_var ~ x_var)

Residuals:
      Min       1Q   Median       3Q      Max
-0.39152 -0.06027  0.00933  0.10024  0.22711

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.18696    0.06394  18.562 3.90e-16 ***
x_var       -2.25529    0.24788  -9.098 2.08e-09 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1503 on 25 degrees of freedom
Multiple R-squared: 0.768,    Adjusted R-squared: 0.7588
F-statistic: 82.78 on 1 and 25 DF,  p-value: 2.083e-09


With kind regards,
Sindri Traustason



-----
-----------------------------------------
Sindri Traustason
Glostrup Hospital Ophthalmology Research Dept.
Copenhagen, Demark

--
View this message in context:http://r.789695.n4.nabble.com/Calculation-of-extremely-low-p-values-in-lm-tp4651823.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Calculation of extremely low p-values (in lm)

Reply via email to