On 2013-05-17 12:45, Jesse Gervais wrote:
Hi there,
I want to do several bivariate linear regressions and then do a
multivariate linear regression including only the variables significantly
associated *(p < 0.15)* with y in the bivariate analyses, without having to
look at those p-values manually.
So, here's what I have so far.
First, I used this data set:
tolerance <- read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1.txt")
Second, I defined this function, which lets me extract the p-values later:
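lmp <- function(modelobject) {
  # Return the p-value of the overall F-test of a fitted linear model
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic  # c(value, numdf, dendf)
  if (is.null(f)) stop("No F statistic available for this model")
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL  # drop the name inherited from f[1]
  return(p)
}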
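For a model with k predictors and an intercept, the F statistic that lmp() reads from summary() can also be rebuilt from R², F = (R²/k) / ((1 - R²)/(n - k - 1)), with k and n - k - 1 degrees of freedom. The sketch below checks this; it assumes the built-in mtcars data as a stand-in for tolerance, only so that it runs without downloading anything.

```r
# Example model on built-in data (a stand-in for the tolerance data)
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

n  <- nobs(fit_demo)              # number of observations
k  <- length(coef(fit_demo)) - 1  # number of predictors, excluding the intercept
r2 <- summary(fit_demo)$r.squared

# F statistic rebuilt from R^2, with k and n - k - 1 degrees of freedom
f_manual <- (r2 / k) / ((1 - r2) / (n - k - 1))
p_manual <- pf(f_manual, k, n - k - 1, lower.tail = FALSE)

# What lmp() reads from summary(): c(value, numdf, dendf)
f <- summary(fit_demo)$fstatistic
p_lmp <- unname(pf(f[1], f[2], f[3], lower.tail = FALSE))

# Both routes should agree up to floating-point error
all.equal(unname(f["value"]), f_manual)
all.equal(p_lmp, p_manual)
```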
Third, I ran my bivariate linear regressions:
fit = lm(exposure ~ tol11, data = tolerance)
fit_2 = lm(exposure ~ tol12, data = tolerance)
fit_3 = lm(exposure ~ tol13, data = tolerance)
fit_4 = lm(exposure ~ tol14, data = tolerance)
fit_5 = lm(exposure ~ tol15, data = tolerance)
Fourth, I extracted p-values:
lmp(fit)
lmp(fit_2)
lmp(fit_3)
lmp(fit_4)
lmp(fit_5)
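The five fits and five lmp() calls can also be generated in a loop, which scales to any number of candidate predictors. The sketch below builds each one-predictor formula with reformulate() and collects the p-values into a named vector with sapply(). Because the ats.ucla.edu URL may no longer be reachable, it assumes a small simulated data frame with the same column names (hypothetical values, not the real tolerance data).

```r
# Compact version of lmp() above: p-value of the overall F-test
lmp <- function(modelobject) {
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  p
}

# Simulated stand-in for the tolerance data (hypothetical values)
set.seed(1)
tolerance <- data.frame(
  tol11 = runif(16, 1, 4), tol12 = runif(16, 1, 4), tol13 = runif(16, 1, 4),
  tol14 = runif(16, 1, 4), tol15 = runif(16, 1, 4)
)
tolerance$exposure <- 0.5 * tolerance$tol14 + rnorm(16, sd = 0.3)

predictors <- paste0("tol1", 1:5)  # "tol11" ... "tol15"

# One bivariate regression per predictor, stored in a named list
fits <- lapply(predictors, function(v)
  lm(reformulate(v, response = "exposure"), data = tolerance))
names(fits) <- predictors

# Named vector of p-values, one per predictor
pvec <- sapply(fits, lmp)
pvec
```

Keeping the p-values in a vector named by predictor also makes the later selection step independent of column positions.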
Fifth, I confirmed that the p-values were OK (just to be sure, since it's the
first time I've used this procedure):
summary(fit)
summary(fit_2)
summary(fit_3)
summary(fit_4)
summary(fit_5)
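The same check can be done without reading the summaries by eye. With a single predictor, the overall F statistic equals the square of the slope's t statistic, so lmp() and the "Pr(>|t|)" entry printed by summary() give the same p-value. The sketch below assumes the built-in mtcars data as a stand-in, so that it runs offline.

```r
fit_demo <- lm(mpg ~ wt, data = mtcars)  # one predictor, built-in data
cf <- summary(fit_demo)$coefficients     # rows: (Intercept), wt

p_t   <- cf["wt", "Pr(>|t|)"]            # slope t-test p-value, as printed by summary()
t_val <- cf["wt", "t value"]

f   <- summary(fit_demo)$fstatistic
p_f <- unname(pf(f[1], f[2], f[3], lower.tail = FALSE))  # what lmp() returns

# With a single predictor, F = t^2, so both tests give the same p-value
all.equal(unname(f["value"]), t_val^2)
all.equal(p_f, p_t)
```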
And now I'm stuck: I don't know what to do next.
With all the variables included, the multivariate linear regression would be:
fit_multi = lm(exposure ~ tol11 + tol12 + tol13 + tol14 + tol15, data = tolerance)
I would like to be able to do something like:
fit_multi = lm(exposure ~ tol11 [include only if lmp(fit) < 0.15] +
               tol12 [include only if lmp(fit_2) < 0.15] +
               tol13 [include only if lmp(fit_3) < 0.15] +
               tol14 [include only if lmp(fit_4) < 0.15] +
               tol15 [include only if lmp(fit_5) < 0.15], data = tolerance)
Any idea?
(Thanks for providing reproducible code!)
It seems to me that you're just missing two things:
1. a way to determine the names of the variables to be included
in the multiple (not 'multivariate' to be nitpicky) regression;
2. a way to build the formula for the multiple regression once
you know which predictors to include.
To get the variables:
varnames <- names(tolerance)[2:6]
pvec <- c(lmp(fit), lmp(fit_2), lmp(fit_3), lmp(fit_4), lmp(fit_5))
use <- varnames[pvec < 0.15]
use
#[1] "tol14" "tol15"
To construct the formula:
rhs <- paste(use, collapse = " + ")
form <- paste("exposure ~", rhs)
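The string in form can be passed to lm() as is, but base R also offers two ways to get a proper formula object: as.formula() converts the pasted string, and reformulate() builds the formula directly from the vector of term labels plus a response name. The sketch below assumes the selection shown above (tol14 and tol15) purely as an example.

```r
use <- c("tol14", "tol15")  # example selection, as in the output above

# String route, as in the post, converted to a formula object
form_str <- as.formula(paste("exposure ~", paste(use, collapse = " + ")))

# Direct route: reformulate() takes the term labels and the response name
form_ref <- reformulate(use, response = "exposure")

form_ref
identical(deparse(form_str), deparse(form_ref))
```

A formula object (rather than a string) also prints more informatively when inspecting the fitted model.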
And then use it:
fit_multi <- lm(formula = form, data = tolerance)
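Putting the pieces together, the whole screen-then-fit procedure can be wrapped in one function. screen_and_fit() below is a hypothetical helper, not part of base R or of the thread. It uses the slope's t-test p-value from summary(), which for a one-predictor model equals the lmp() value, and it assumes numeric predictors (a factor with more than two levels would produce several coefficient rows). The built-in mtcars data stands in for tolerance so the sketch runs offline. If no candidate passes the cutoff, it falls back to an intercept-only model rather than failing on an empty right-hand side.

```r
# Hypothetical helper: screen candidates one at a time, then fit the
# multiple regression on those with p < alpha (0.15 in the post)
screen_and_fit <- function(response, candidates, data, alpha = 0.15) {
  # Slope p-value from each one-predictor model; with a single (numeric)
  # predictor this equals the overall F-test p-value that lmp() returns
  pvec <- sapply(candidates, function(v) {
    fit <- lm(reformulate(v, response = response), data = data)
    summary(fit)$coefficients[2, "Pr(>|t|)"]
  })
  use <- candidates[pvec < alpha]
  # Fall back to an intercept-only model if nothing passes the screen
  rhs <- if (length(use) > 0) use else "1"
  fit_multi <- lm(reformulate(rhs, response = response), data = data)
  list(pvalues = pvec, selected = use, fit = fit_multi)
}

res <- screen_and_fit("mpg", c("wt", "hp", "qsec", "drat", "am"), data = mtcars)
res$pvalues
res$selected
summary(res$fit)
```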
Peter Ehlers
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.