Hi Jonathan,

Have you considered using a ridge prior for the covariance matrix in your imputation model? This can be done in SAS with PROC MI (the PRIOR=RIDGE= option on the MCMC statement).
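
As a rough illustration of why a ridge prior may help in this situation, here is a minimal Python/NumPy sketch. It is not PROC MI's actual computation. It uses simulated stand-in data with the 48 cases and 53 variables mentioned in the original message; the function name `ridge_covariance`, the shrinkage weight `d / (n - 1 + d)`, and the value `d = 5` are assumptions chosen only for illustration. With fewer cases than variables, the sample covariance matrix is singular, and pulling it toward its own diagonal makes it positive definite again.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 48, 53  # cases and variables in the larger imputation model

# Hypothetical stand-in data; the real study data are not available here.
X = rng.standard_normal((n, p))

# p x p sample covariance matrix (columns are variables).
S = np.cov(X, rowvar=False)

# With n < p the rank is at most n - 1, so S cannot be inverted.
rank_S = np.linalg.matrix_rank(S)


def ridge_covariance(S, n, d):
    """Shrink S toward its diagonal (illustrative form, not SAS's formula).

    d acts like a number of prior 'pseudo-observations': larger d means
    more weight on the diagonal and weaker estimated correlations.
    """
    w = d / (n - 1 + d)
    return (1.0 - w) * S + w * np.diag(np.diag(S))


S_ridge = ridge_covariance(S, n, d=5.0)

# Smallest eigenvalues before and after shrinkage.
min_eig_raw = np.linalg.eigvalsh(S).min()
min_eig_ridge = np.linalg.eigvalsh(S_ridge).min()

print("rank of S:", rank_S, "out of", p)
print("smallest eigenvalue, raw S:", min_eig_raw)
print("smallest eigenvalue, ridge S:", min_eig_ridge)
```

The trade-off is that a larger `d` stabilizes the covariance estimate but also pulls the imputed relationships toward independence, so it seems sensible to try a few values and compare convergence diagnostics.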
-Juned

From: Impute -- Imputations in Data Analysis [mailto:[email protected]] On Behalf Of Jonathan Mohr
Sent: Wednesday, December 05, 2012 4:45 PM
To: [email protected]
Subject: Choosing an imputation model

Hi folks,

I'm writing with a question about how to develop an imputation model when (a) there are many potential variables to include and (b) the number of iterations required for the MCMC chain to stabilize is very high (~3000) when a large number of variables are included in the imputation model. I'll do my best to describe our situation briefly:

THE STUDY

Data from 48 people were collected at six time points, and include over 2,000 variables. Each of the research questions requires running a multiple regression in which 2-3 variables assessed at earlier time points predict a variable assessed at the last time point. All data are available for the outcome variable, but there are missing data for all of the predictors (ranging from 5% to 31% missing).

DEVELOPING THE IMPUTATION MODEL

We have tried two basic approaches to developing the imputation model. One is simply to include in the imputation model all of the variables that will appear in any of the analyses. This imputation model consists of around 35 variables.

The other approach was to select a much larger pool of potential variables to consider for inclusion in the imputation model. We identified all variables that we believed would be associated with our main variables of interest. We then conducted a series of stepwise regressions as a shortcut to attempt to identify a smaller set of variables that uniquely predicted each of the main variables for which data were missing. This smaller set contained 18 variables, which--when added to the main variables--led to an imputation model of 53 variables.

QUESTION

When we generate imputed data sets with the smaller imputation model, the chain stabilizes relatively quickly (a little over 100 iterations are needed). In contrast, over 3000 iterations are needed with the larger imputation model. Should we use the smaller imputation model, even if it doesn't include variables that we know are uniquely predictive of variables for which there are missing data?

Thanks in advance for your thoughts!!

Jon

--
***Please note change of email to [email protected]***

Jonathan Mohr
Assistant Professor
Department of Psychology
Biology-Psychology Building
University of Maryland
College Park, MD 20742-4411
Office phone: 301-405-5907
Fax: 301-314-5966
Email: [email protected]
