On 2013-05-17 12:45, Jesse Gervais wrote:
Hi there,



I want to run several bivariate linear regressions and then a
multivariate linear regression that includes only the variables
significantly associated *(p < 0.15)* with y in the bivariate analyses,
without having to look up those p-values manually.



So, here is what I have so far.



First, I use this data set:



tolerance <- read.csv("http://www.ats.ucla.edu/stat/r/examples/alda/data/tolerance1.txt")
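If that URL cannot be reached, a simulated stand-in lets the remaining steps run. This is a sketch with made-up numbers, not the real tolerance data. It only assumes the column layout the code in this thread relies on (`id` first, then `tol11` to `tol15`, plus `exposure`), and it deliberately builds `exposure` from `tol14` so that at least one predictor has a genuine effect.

```r
# Hypothetical stand-in for tolerance1.txt: simulated values, NOT the real data.
# Assumed layout: id, tol11, tol12, tol13, tol14, tol15, exposure.
set.seed(42)
n <- 50
tolerance <- data.frame(id = seq_len(n))
for (v in paste0("tol", 11:15)) {
  tolerance[[v]] <- rnorm(n, mean = 2, sd = 0.5)
}
# exposure is built from tol14 plus noise, so tol14 has a real association
tolerance$exposure <- 1 + 0.4 * tolerance$tol14 + rnorm(n, sd = 0.3)
str(tolerance)
```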



Second, I wrote this function, which lets me extract each model's overall F-test p-value later:



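lmp <- function(modelobject) {
    if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
    # overall F statistic: value, numerator df, denominator df
    f <- summary(modelobject)$fstatistic
    # upper-tail probability of the F distribution = p-value of the F-test
    p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
    attributes(p) <- NULL  # drop the "value" name
    return(p)
}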
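As a sanity check on the function: for a model with a single predictor, the overall F-test that `lmp()` reads from `summary()` is the same test that `anova()` reports for that predictor, so the two p-values should agree. A small sketch on simulated data (the data frame `d` and its columns `x` and `y` are hypothetical):

```r
# lmp() as above (inherits() instead of class() !=, FALSE instead of F)
lmp <- function(modelobject) {
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  p
}

# Hypothetical data: one predictor x, one response y
set.seed(1)
d <- data.frame(x = rnorm(30))
d$y <- 0.5 * d$x + rnorm(30)
fit <- lm(y ~ x, data = d)

p_summary <- lmp(fit)
# The first row of the ANOVA table is the F-test for x
p_anova <- anova(fit)[["Pr(>F)"]][1]
all.equal(p_summary, p_anova)  # TRUE for a one-predictor model
```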



Third, I ran my bivariate linear regressions:



fit   <- lm(exposure ~ tol11, data = tolerance)
fit_2 <- lm(exposure ~ tol12, data = tolerance)
fit_3 <- lm(exposure ~ tol13, data = tolerance)
fit_4 <- lm(exposure ~ tol14, data = tolerance)
fit_5 <- lm(exposure ~ tol15, data = tolerance)
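Rather than writing one `lm()` call per predictor, the bivariate fits can be generated in a loop, which scales to any number of candidates. A sketch, assuming the predictor names `tol11` to `tol15` and using simulated stand-in data (hypothetical values); `reformulate()` is the base R (stats) function that turns a character name into a formula such as `exposure ~ tol14`.

```r
# Simulated stand-in data (hypothetical values; same column names as the thread)
set.seed(42)
n <- 50
tolerance <- data.frame(id = seq_len(n))
for (v in paste0("tol", 11:15)) tolerance[[v]] <- rnorm(n, mean = 2, sd = 0.5)
tolerance$exposure <- 1 + 0.4 * tolerance$tol14 + rnorm(n, sd = 0.3)

varnames <- paste0("tol", 11:15)
# One simple regression of exposure on each candidate, kept in a named list
fits <- lapply(varnames, function(v) {
  lm(reformulate(v, response = "exposure"), data = tolerance)
})
names(fits) <- varnames
coef(fits$tol14)
```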



Fourth, I extracted p-values:



lmp(fit)
lmp(fit_2)
lmp(fit_3)
lmp(fit_4)
lmp(fit_5)
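With the fits stored in a named list, all the p-values come back in one named vector, which makes the later selection step a one-liner. A sketch on simulated stand-in data (hypothetical values); `lmp()` is repeated so the block runs on its own.

```r
lmp <- function(modelobject) {
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  p
}

# Simulated stand-in data (hypothetical values)
set.seed(42)
n <- 50
tolerance <- data.frame(id = seq_len(n))
for (v in paste0("tol", 11:15)) tolerance[[v]] <- rnorm(n, mean = 2, sd = 0.5)
tolerance$exposure <- 1 + 0.4 * tolerance$tol14 + rnorm(n, sd = 0.3)

varnames <- paste0("tol", 11:15)
fits <- lapply(varnames, function(v) lm(reformulate(v, response = "exposure"), data = tolerance))
names(fits) <- varnames

# sapply() keeps the list names, so each p-value is labelled by its predictor
pvals <- sapply(fits, lmp)
round(pvals, 4)
```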



Fifth, I confirmed that the p-values were correct (just to be sure, since it's the first
time I've used the above procedure):



summary(fit)
summary(fit_2)
summary(fit_3)
summary(fit_4)
summary(fit_5)
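The check in step five can also be automated. In a regression with a single predictor, the overall F statistic is the square of the slope's t statistic on (1, n - 2) degrees of freedom, so `lmp()` should match the slope's p-value in `coef(summary(fit))`. A sketch on simulated stand-in data (hypothetical values):

```r
lmp <- function(modelobject) {
  if (!inherits(modelobject, "lm")) stop("Not an object of class 'lm'")
  f <- summary(modelobject)$fstatistic
  p <- pf(f[1], f[2], f[3], lower.tail = FALSE)
  attributes(p) <- NULL
  p
}

# Simulated stand-in data (hypothetical values)
set.seed(42)
n <- 50
tolerance <- data.frame(id = seq_len(n))
for (v in paste0("tol", 11:15)) tolerance[[v]] <- rnorm(n, mean = 2, sd = 0.5)
tolerance$exposure <- 1 + 0.4 * tolerance$tol14 + rnorm(n, sd = 0.3)

varnames <- paste0("tol", 11:15)
fits <- lapply(varnames, function(v) lm(reformulate(v, response = "exposure"), data = tolerance))
names(fits) <- varnames

p_F <- sapply(fits, lmp)
# Row 2 (the slope), column "Pr(>|t|)" of each coefficient table
p_t <- sapply(fits, function(f) coef(summary(f))[2, "Pr(>|t|)"])
all.equal(p_F, p_t)  # TRUE: with one predictor, F = t^2
```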



And now I'm stuck: I don't know what to do next.



The multivariate linear regression including all the variables would be:



fit_multi <- lm(exposure ~ tol11 + tol12 + tol13 + tol14 + tol15,
                data = tolerance)
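The full model's formula can also be built from a vector of names instead of being typed out, which is exactly what the selection step needs. A sketch using base R's `reformulate()` on simulated stand-in data (hypothetical values):

```r
# Simulated stand-in data (hypothetical values)
set.seed(42)
n <- 50
tolerance <- data.frame(id = seq_len(n))
for (v in paste0("tol", 11:15)) tolerance[[v]] <- rnorm(n, mean = 2, sd = 0.5)
tolerance$exposure <- 1 + 0.4 * tolerance$tol14 + rnorm(n, sd = 0.3)

varnames <- paste0("tol", 11:15)
# Builds the formula exposure ~ tol11 + tol12 + tol13 + tol14 + tol15
form_all <- reformulate(varnames, response = "exposure")
fit_multi <- lm(form_all, data = tolerance)
coef(fit_multi)
```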



I would like to be able to do something like:


fit_multi <- lm(exposure ~ tol11 [include only if lmp(fit)   < 0.15] +
                           tol12 [include only if lmp(fit_2) < 0.15] +
                           tol13 [include only if lmp(fit_3) < 0.15] +
                           tol14 [include only if lmp(fit_4) < 0.15] +
                           tol15 [include only if lmp(fit_5) < 0.15],
                data = tolerance)



Any ideas?


(Thanks for providing reproducible code!)

It seems to me that you're just missing two things:

1. a way to determine the names of the variables to be included
   in the multiple (not 'multivariate', to be nitpicky) regression;

2. a way to build the formula for the multiple regression once
   you know which predictors to include.

To get the variables:

  varnames <- names(tolerance)[2:6]
  pvec <- c(lmp(fit), lmp(fit_2), lmp(fit_3), lmp(fit_4), lmp(fit_5))
  use <- varnames[pvec < 0.15]
  use
  #[1] "tol14" "tol15"

To construct the formula:

  rhs <- paste(use, collapse = " + ")
  form <- paste("exposure ~", rhs)

And then use it:

  fit_multi <- lm(formula = form, data = tolerance)
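One caveat with the `paste()` approach: if no predictor passes the threshold, `use` is empty and `form` becomes `"exposure ~ "`, which `lm()` cannot parse. The sketch below wraps the whole procedure and falls back to an intercept-only model in that case. The function `screen_and_fit()` and its arguments are hypothetical names for illustration; it swaps base R's `reformulate()` in for the `paste()` step and computes the same overall F-test p-value as `lmp()`. The data are simulated stand-ins.

```r
# Hypothetical helper: screen candidates one at a time, then fit the multiple regression
screen_and_fit <- function(data, response, candidates, alpha = 0.15) {
  # Overall F-test p-value of each one-predictor model (same as lmp())
  pvec <- vapply(candidates, function(v) {
    fit <- lm(reformulate(v, response = response), data = data)
    f <- summary(fit)$fstatistic
    unname(pf(f[1], f[2], f[3], lower.tail = FALSE))
  }, numeric(1))
  use <- candidates[pvec < alpha]
  # reformulate() needs at least one term: use "1" (intercept only) if none pass
  rhs <- if (length(use) > 0) use else "1"
  list(pvalues = pvec, selected = use,
       fit = lm(reformulate(rhs, response = response), data = data))
}

# Simulated stand-in data (hypothetical values)
set.seed(42)
n <- 50
tolerance <- data.frame(id = seq_len(n))
for (v in paste0("tol", 11:15)) tolerance[[v]] <- rnorm(n, mean = 2, sd = 0.5)
tolerance$exposure <- 1 + 0.4 * tolerance$tol14 + rnorm(n, sd = 0.3)

res <- screen_and_fit(tolerance, "exposure", paste0("tol", 11:15), alpha = 0.15)
res$selected
summary(res$fit)
```

Returning the p-values and the selected names alongside the fit keeps the screening step auditable, which replaces the manual comparison against `summary()` done in step five.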

Peter Ehlers

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
