Hi: On Fri, Jan 7, 2011 at 7:04 AM, wangwallace <talentt...@gmail.com> wrote:
> > hey, folks, > > I have two very simple questions. I am not quite sure if I can do this > using > R. > > so, I am analyzing a large data frame with thousands of variables. For > example: > > Dependent variables: d1, d2, d3 (i.e., there are three dependent variables) > > Independent variables: s1, s2, s3, ......s1000 (i.e., there are 1000 > independent variables) > > now I want to run simple linear regression analyses of dependent variables > on independent variables. A laborious way to do this is running 1000 linear > regression models for each dependent variable separately. This would take a > lot of time. My question is: > > 1) is there a easy way to run all 1000 linear regression analyses for each > dependent variable at once? > Yes. > 2) after running all 1000 linear regression analyses at once, how can I > write 3000 regression weights (1000 regression weights for each dependent > variable) and its significance level in to a file (e.g., csv, excel, ect). > Define 'weights'. Do you mean the parameter estimates? Here's a simplified example, but the output is a separate list object for each response. Each component of each list is an lm object, produced from the lm() function. # Generate some fake data: three responses and eight covariates df <- data.frame(y1 = rnorm(50), y2 = rnorm(50), y3 = rnorm(50), x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50), x4 = rpois(50, 30), x5 = rpois(50, 20), x6 = rpois(50, 10), x7 = runif(50), x8 = runif(50)) # Create a vector of covariate names xs <- paste('x', 1:8, sep = '') # Initialize a list whose length is that of the vector xs rl1 <- vector('list', 8) rl2 <- vector('list', 8) rl3 <- vector('list', 8) # The following loop regresses all three responses individually on a single covariate x[i] # and exports the results to separate lists for each response # The first statement creates a formula with the name of the i-th covariate # The second statement does the regression and assigns the output to the i-th # component of the list corresponding to the j-th response (j = 1, 2, 3) for(i in 1:8) { fm1<- as.formula(paste('y1', xs[i], sep = '~')) fm2 <- as.formula(paste('y2', xs[i], sep = '~')) fm3 <- as.formula(paste('y3', xs[i], sep = '~')) rl1[[i]] <- lm(fm1, data = df) rl2[[i]] <- lm(fm2, data = df) rl3[[i]] <- lm(fm3, data = df) } # The print method of lm() applied to the first component is rl1[[1]] Each component of each list will resemble the output object you would get from running lm() on a single response with one explanatory variable at a time. In each list, there are as many components as there are covariates. You can extract all sorts of things from each component of the list; I prefer using the ldply() function from package plyr, but there are other ways with base functions. Here are some examples: library(plyr) # R^2 values: ldply(rl1, function(x) summary(x)$r.squared) # Model coefficients: ldply(rl1, function(x) coef(x)) # p-values of significance tests for intercept and slope ldply(rl1, function(x) summary(x)$coefficients[, 4]) # residuals from each model res1 <- t(ldply(rl1, function(x) resid(x))) # produces a matrix Some comments: 1. If you want to run multivariate regression on the response matrix [y1 y2 y3], you only need one list object for output. Substitute 'cbind(y1, y2, y3)' for yi when creating the formula. Only one call is needed for multivariate regression rather than three for the univariate regressions. However, you would then need to be more careful about output structures and pulling them together over list components. For instance, in a multivariate regression, the coefficients are output as a matrix rather than a vector, so to combine them all, you would need to use a 3D array rather than a data frame. 2. The 'x' in the anonymous functions in ldply() corresponds to a generic list component. In this context, x is an lm object, so anything you could do with the output of an lm object could be mapped componentwise with ldply. This approach is very convenient if you want to pick off individual output pieces (e.g., R^2 or the root MSE) for all the regressions at once. 3. To associate the covariates with rows of output in each generated data frame above, you could use, e.g., r2y1 <- ldply(rl1, function(x) summary(x)$r.squared) rownames(r2y1) <- xs In the residual example, the data frame is 8 x 50; using the t() function [for transpose], the result is a 50 x 8 matrix whose columns correspond to the individual covariates, so in this case colnames(res1) <- xs 4. Make sure you organize your code and desired output in advance; with 1000 covariates, things could get messy if you're not careful. HTH, Dennis > Many thanks in advance!! > -- > View this message in context: > http://r.789695.n4.nabble.com/how-to-run-linear-regression-models-at-once-tp3179256p3179256.html > Sent from the R help mailing list archive at Nabble.com. > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.