Thanks much to Dave, Juned, Stef, and Arthur for your sage advice!! I am
looking forward to giving these strategies a go.
Jon

On Fri, Dec 7, 2012 at 11:08 AM, Arthur Kennickell <[email protected]> wrote:

> I have done something similar to what Stef describes, for the imputation of
> panel data for the Survey of Consumer Finances.  In our case, it is
> important to be able to specify each column separately, both because of the
> sparseness issue implicit in Stef's point and because there are prior
> constraints on outcomes that would be very difficult to specify otherwise.
> Best wishes,
> Arthur
>
> Arthur B. Kennickell
> Assistant Director, Division of Research and Statistics
> Mail Stop 153
> Board of Governors of the Federal Reserve System
> Washington, DC  20551
> v: 202-452-2247
> f: 202-728-5838
> e: [email protected]
> SCF website: http://www.federalreserve.gov/pubs/oss/oss2/scfindex.html
>
> From:   "Buuren, S. (Stef) van" <[email protected]>
> To:     <[email protected]>
> Date:   12/07/2012 08:35 AM
> Subject:        Re: Choosing an imputation model
> Sent by:        Impute -- Imputations in Data Analysis
>             <[email protected]>
>
>
>
> Jonathan,
> This is a problem that occurs in many social science and medical
> applications. My approach is to build a separate imputation model for each
> incomplete column, which requires far fewer predictors per sub model (say
> 10-15). You can find an example using mice and R in Section 9.1 of the book
> Flexible Imputation of Missing Data.
> Best wishes,
> Stef
>
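A minimal sketch of the per-column idea Stef describes above: each incomplete column gets its own short list of predictors instead of sharing one large joint model. This is not the example from the book and does not use mice. The function names (`pairwise_corr`, `select_predictors`), the correlation-based ranking, the `min_abs_corr` cutoff, and the toy data are all assumptions made for illustration; only the cap of about 15 predictors per submodel comes from the message.

```python
from math import sqrt


def pairwise_corr(x, y):
    """Pearson correlation over rows where both values are observed (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        return 0.0  # too few complete pairs to say anything useful
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no predictive information
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    return sxy / sqrt(sxx * syy)


def select_predictors(data, max_predictors=15, min_abs_corr=0.1):
    """Return {incomplete column: [its own predictors]} (hypothetical helper).

    Candidates are ranked by absolute pairwise correlation with the target,
    and at most max_predictors of them are kept for that target's submodel.
    """
    plan = {}
    for target, values in data.items():
        if all(v is not None for v in values):
            continue  # complete columns need no imputation model
        scored = []
        for name, other in data.items():
            if name == target:
                continue
            r = abs(pairwise_corr(values, other))
            if r >= min_abs_corr:
                scored.append((r, name))
        scored.sort(reverse=True)
        plan[target] = [name for _, name in scored[:max_predictors]]
    return plan


# Toy data (made up): y is complete, x1 and x2 have gaps, noise is constant.
data = {
    "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x1": [1.1, None, 2.9, 4.2, None, 6.1],
    "x2": [6.0, 5.0, None, 3.0, 2.0, 1.0],
    "noise": [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
}
plan = select_predictors(data, max_predictors=2)
for target, predictors in plan.items():
    print(target, "<-", predictors)
```

In R, the same kind of per-column choice is what the `predictorMatrix` argument of `mice()` expresses, with one row per variable to be imputed. The ranking rule here is only one possible heuristic; substantive knowledge about which variables must appear in a given submodel should override it.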
> From: Impute -- Imputations in Data Analysis [mailto:[email protected]] On Behalf Of Jonathan Mohr
> Sent: Wednesday, December 05, 2012 4:45 PM
> To: [email protected]
> Subject: Choosing an imputation model
>
> Hi folks,
> I'm writing with a question about how to develop an imputation model when
> (a) there are many potential variables to include and (b) the number of
> iterations required for the MCMC chain to stabilize is very high (~3000)
> when a large number of variables are included in the imputation model. I'll
> do my best to describe our situation briefly:
>
> THE STUDY
> Data from 48 people were collected at six time points, and include over
> 2,000 variables. Each of the research questions requires running a multiple
> regression in which 2-3 variables assessed at earlier time points predict a
> variable assessed at the last time point. All data are available for the
> outcome variable, but there are missing data for all of the predictors
> (ranging from 5% to 31% missing).
>
> DEVELOPING THE IMPUTATION MODEL
> We have tried two basic approaches to developing the imputation model. One
> is simply to include in the imputation model all of the variables that will
> appear in any of the analyses. This imputation model consists of around 35
> variables. The other approach was to select a much larger pool of potential
> variables to consider for inclusion in the imputation model. We identified
> all variables that we believed would be associated with our main variables
> of interest. We then conducted a series of stepwise regressions as a
> shortcut to attempt to identify a smaller set of variables that uniquely
> predicted each of the main variables for which data were missing. This
> smaller set contained 18 variables, which--when added to the main
> variables--led to an imputation model of 53 variables.
>
> QUESTION
> When we generate imputed data sets with the smaller imputation model, the
> chain stabilizes relatively quickly (a little over 100 iterations are
> needed). In contrast, over 3000 iterations are needed with the larger
> imputation model. Should we use the smaller imputation model, even if it
> doesn't include variables that we know are uniquely predictive of variables
> for which there are missing data?
>
> Thanks in advance for your thoughts!!
> Jon
>
> --
> ***Please note change of email to [email protected]***
>
> Jonathan Mohr
> Assistant Professor
> Department of Psychology
> Biology-Psychology Building
> University of Maryland
> College Park, MD 20742-4411
>
> Office phone: 301-405-5907
> Fax: 301-314-5966
> Email: [email protected]
>



