In addition to the tips above, you may want to chek out: http://www.stat.columbia.edu/~gelman/arm/missing.pdf
2010/6/30 Chuck Cleland <cclel...@optonline.net> > On 6/30/2010 1:14 AM, Daniel Chen wrote: > > Hi, > > > > I am a long time SPSS user but new to R, so please bear with me if my > > questions seem to be too basic for you guys. > > > > I am trying to figure out how to analyze survey data using logistic > > regression with multiple imputation. > > > > I have a survey data of about 200,000 cases and I am trying to predict > the > > odds ratio of a dependent variable using 6 categorical independent > variables > > (dummy-coded). Approximatively 10% of the cases (~20,000) have missing > data > > in one or more of the independent variables. The percentage of missing > > ranges from 0.01% to 10% for the independent variables. > > > > My current thinking is to conduct a logistic regression with multiple > > imputation, but I don't know how to do it in R. I searched the web but > > couldn't find instructions or examples on how to do this. Since SPSS is > > hopeless with missing data, I have to learn to do this in R. I am new to > R, > > so I would really appreciate if someone can show me some examples or tell > me > > where to find resources. > > Here is an example using the Amelia package to generate imputations > and the mitools and mix packages to make the pooled inferences. > > titanic <- > read.table("http://lib.stat.cmu.edu/S/Harrell/data/ascii/titanic.txt", > sep=',', header=TRUE) > > set.seed(4321) > > titanic$sex[sample(nrow(titanic), 10)] <- NA > titanic$pclass[sample(nrow(titanic), 10)] <- NA > titanic$survived[sample(nrow(titanic), 10)] <- NA > > library(Amelia) # generate multiple imputations > library(mitools) # for MIextract() > library(mix) # for mi.inference() > > titanic.amelia <- amelia(subset(titanic, > select=c('survived','pclass','sex','age')), > m=10, noms=c('survived','pclass','sex'), > emburn=c(500,500)) > > allimplogreg <- lapply(titanic.amelia$imputations, > function(x){glm(survived ~ pclass + sex + age, family=binomial, data = x)}) > > mice.betas.glm <- MIextract(allimplogreg, fun=function(x){coef(x)}) > mice.se.glm <- MIextract(allimplogreg, > fun=function(x){sqrt(diag(vcov(x)))}) > > as.data.frame(mi.inference(mice.betas.glm, mice.se.glm)) > > # Or using only mitools for pooled inference > > betas <- MIextract(allimplogreg, fun=coef) > vars <- MIextract(allimplogreg, fun=vcov) > summary(MIcombine(betas,vars)) > > > Thank you! > > > > Daniel > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > -- > Chuck Cleland, Ph.D. > NDRI, Inc. (www.ndri.org) > 71 West 23rd Street, 8th floor > New York, NY 10010 > tel: (212) 845-4495 (Tu, Th) > tel: (732) 512-0171 (M, W, F) > fax: (917) 438-0894 > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.