David Judkins wrote:
> I am not aware of the capabilities of IVEware, but the general question
> of person-level mean squared prediction error is a function of both the
> covariates and the imputation procedure. As Dr. Rubin has pointed out,
> minimizing person-level MSPE is not typically a primary goal in the
> analysis of surveys and experiments, although it might be important in
> an activity like fraud detection. Nonetheless, reduced person-level MSPE
> should also translate into both lower variances on estimated population
> and superpopulation marginal parameters and reduced bias on regression
> coefficients. So you want to use as rich a set of covariates in the
> imputation as are available to you and to use the model-based
> predictions in your imputation to at least some extent. Unfortunately,
> the stronger the usage you make, the more difficult it becomes to
> estimate the post-imputation variance. For example, a predictive-mean
> matching approach to imputation defeats multiple imputation as a
> variance-estimation technique. For normally distributed outcomes,

David - It's not clear to me why PMM would invalidate using the Rubin
variance estimator for regression coefficient variances. But maybe you
are saying that PMM doesn't work if you are primarily interested in
estimating a variance parameter (what kind?). -Frank Harrell

> really good methods that both utilize covariate information and allow
> post-imputation variance estimation are pretty much Bayesian and involve
> Gibbs sampling to fit complex models and make reasonable posterior
> draws. (See Schafer's book.) Even they do not cope well with the
> natural heaping in income where people round to the nearest thousand
> dollars or even worse. I have some papers on how to impute non-normal
> outcomes using covariates that are subject to missing values themselves,
> but I have not yet been able to develop and validate good
> post-imputation variance estimators to go with them.
>
> Your person-level MSPE seems so large that I suspect your software is
> not using any covariates. While that makes post-imputation variance
> estimation easy, it seems like you could do better.
>
> The preservation of the marginal first and second order moments of
> income seems to support the idea that you are not using any covariates.
> The robustness of the model coefficients is harder to reconcile. I
> think this can only happen with a simple imputation procedure if the
> missing data rate is negligible or if the model isn't very good to begin
> with. If substantial numbers of subjects were being thrown back and
> forth between $3,000 and $100,000 per year, the coefficients in good
> models would certainly be attenuated. Maybe you just don't have any
> variables that are strongly related to income?
>
> David Judkins
> Senior Statistician
> Westat
> 1650 Research Boulevard
> Rockville, MD 20850
> (301) 315-5970
> [email protected]
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Paul T.
> Shattuck
> Sent: Wednesday, March 29, 2006 11:43 AM
> To: [email protected]
> Subject: [Impute] range of imputed values for income
>
> Hello,
>
> I am using IVEware for multiple imputation for the first time on a large
> national health survey. One of the variables imputed is income and I'm
> finding that imputed values can vary dramatically within-subjects across
> multiply imputed datasets. For instance, in some cases Person A might
> have an imputed income of $3,000 in one imputation, and then $100,000
> in another imputation. This within-person variability far exceeds what
> I'm seeing with other variables in the survey. The distributions,
> means, and standard deviations of the imputed vs. non-imputed values are
> comparable. And multivariate regression results using the multiply
> imputed datasets and the original dataset with missing values are
> reasonably robust, with the same substantive conclusions and very close
> coefficient estimates. So, I'm wondering if this degree of
> within-subject variability across imputations is something to worry
> about, and potentially an indicator of a mis-specified imputation
> model... or whether this kind of within-subject variability across
> imputed datasets is typical.
>
> Thanks,
>
> Paul Shattuck
>

--
Frank E Harrell Jr
Professor and Chair, Department of Biostatistics
School of Medicine, Vanderbilt University
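
For readers following the thread, here is a minimal Python sketch of the two quantities under discussion: the Rubin variance estimator for a pooled regression coefficient, and the within-person spread of imputed values across imputed datasets. It is only an illustration under stated assumptions. The function names `pool_rubin` and `within_person_sd` and all numbers are hypothetical; nothing here comes from IVEware or from the survey described above.

```python
from statistics import mean, stdev, variance


def pool_rubin(estimates, within_variances):
    """Pool m completed-data estimates using Rubin's combining rules.

    estimates: one point estimate per imputed dataset.
    within_variances: the squared standard error from each dataset.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b      # total variance
    return q_bar, t


def within_person_sd(imputations):
    """Standard deviation of each person's imputed value across datasets.

    imputations: list of m lists, each with one imputed value per person,
    in the same person order.
    """
    return [stdev(values) for values in zip(*imputations)]


# Hypothetical coefficient estimates and squared standard errors, m = 5.
coefs = [0.42, 0.45, 0.40, 0.44, 0.43]
sq_ses = [0.010, 0.011, 0.009, 0.010, 0.010]
print(pool_rubin(coefs, sq_ses))

# Hypothetical imputed incomes for two people across three imputations.
incomes = [[3000, 41000], [100000, 39000], [25000, 43000]]
print(within_person_sd(incomes))
```

A large per-person standard deviation relative to the overall spread of income would be consistent with the suspicion above that the imputation model is making little use of covariates, though the sketch itself cannot establish that.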
