Re: Choosing an imputation model

Juned Siddique Thu, 06 Dec 2012 08:14:48 -0800

Hi Jonathan,

Have you considered using a ridge prior for the covariance matrix in your 
imputation model? This can be done using Proc MI in SAS.
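
As a rough illustration of the idea, here is a minimal Python sketch of what a
ridge-type prior does to a covariance estimate: the variances are left alone
and the covariances are shrunk toward zero, which keeps the matrix better
conditioned when there are many variables and few cases. The function names
and the prior weight d are illustrative assumptions, the data are assumed to
be complete, and this is not meant to reproduce what Proc MI computes
internally.

```python
def sample_covariance(rows):
    """Unbiased sample covariance matrix of complete data (list of rows)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [
            sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def ridge_covariance(rows, d):
    """Shrink the sample covariance S toward its own diagonal.

    Computes ((n - 1) * S + d * diag(S)) / (n - 1 + d), where d is a
    hypothetical prior weight expressed in "prior observations".
    Diagonal entries are unchanged; off-diagonal entries are multiplied
    by (n - 1) / (n - 1 + d).
    """
    n = len(rows)
    s = sample_covariance(rows)
    w = (n - 1) / (n - 1 + d)
    p = len(s)
    return [
        [s[i][j] if i == j else w * s[i][j] for j in range(p)]
        for i in range(p)
    ]


# Toy data: 3 cases, 2 variables (illustrative values only).
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0]]
shrunk = ridge_covariance(toy, d=2)
```

A larger d pulls the covariances more strongly toward zero; d = 0 returns the
ordinary sample covariance.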

-Juned

From: Impute -- Imputations in Data Analysis 
[mailto:[email protected]] On Behalf Of Jonathan Mohr
Sent: Wednesday, December 05, 2012 4:45 PM
To: [email protected]
Subject: Choosing an imputation model

Hi folks,
I'm writing with a question about how to develop an imputation model when (a) 
there are many potential variables to include and (b) the number of iterations 
required for the MCMC chain to stabilize is very high (~3000) when a large 
number of variables are included in the imputation model. I'll do my best to 
describe our situation briefly:

THE STUDY
Data from 48 people were collected at six time points, and include over 2,000 
variables. Each of the research questions requires running a multiple 
regression in which 2-3 variables assessed at earlier time points predict a 
variable assessed at the last time point. All data are available for the 
outcome variable, but there are missing data for all of the predictors (ranging 
from 5% to 31% missing).

DEVELOPING THE IMPUTATION MODEL
We have tried two basic approaches to developing the imputation model. One is 
simply to include in the imputation model all of the variables that will appear 
in any of the analyses. This imputation model consists of around 35 variables. 
The other approach was to select a much larger pool of potential variables to 
consider for inclusion in the imputation model. We identified all variables 
that we believed would be associated with our main variables of interest. We 
then conducted a series of stepwise regressions as a shortcut to attempt to 
identify a smaller set of variables that uniquely predicted each of the main 
variables for which data were missing. This smaller set contained 18 variables, 
which--when added to the main variables--led to an imputation model of 53 
variables.

QUESTION
When we generate imputed data sets with the smaller imputation model, the chain 
stabilizes relatively quickly (a little over 100 iterations are needed). In 
contrast, over 3000 iterations are needed with the larger imputation model. 
Should we use the smaller imputation model, even if it doesn't include 
variables that we know are uniquely predictive of variables for which there are 
missing data?

Thanks in advance for your thoughts!!
Jon

--
***Please note change of email to [email protected]***

Jonathan Mohr
Assistant Professor
Department of Psychology
Biology-Psychology Building
University of Maryland
College Park, MD 20742-4411

Office phone: 301-405-5907
Fax: 301-314-5966
Email: [email protected]
