David Judkins wrote:
> Frank,
> 
> It depends on the fineness of grain in the predictions generated by the 
> model, but in the extreme case where there is a single nearest match for each 
> missing case, then drawing that nearest case five times will result in five 
> identical imputations, leading, of course, to zero between-imputation 
> variance for any marginal statistic of the variable.  I am not certain who 
> was first to make this point, but you can find it among other places on page 
> 500 of J.N.K. Rao's article in the June 1996 JASA trio of competing paradigms 
> by Rubin, Fay, and Rao.  Rao references Särndal 1992.
> 
> A workable approach and references to other workable approaches for 
> predictive mean matching (aka nearest neighbor imputation) are given in Kim 
> 2002:
> 
> Kim, Jae Kwang (2002) 
> Variance estimation for nearest neighbor imputation with application to 
> census long form data 
> ASA Proceedings of the Joint Statistical Meetings, 1857-1862 
> American Statistical Association (Alexandria, VA) 
> Keywords: Fractional imputation; Jackknife; Section on Survey Research 
> Methods; JSM 
> 
> 
> --David Judkins
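
The extreme case described above can be sketched in Python. This is only a
minimal illustration under stated assumptions: the data are simulated, the
prediction model is a simple linear fit estimated once, and the helper name
nearest_match_impute is hypothetical. Because the fitted predictions are held
fixed and each missing case takes its single nearest donor, all five
imputations coincide, so the between-imputation variance of the mean is
zero up to floating-point rounding.

```python
import numpy as np

def nearest_match_impute(pred_obs, y_obs, pred_mis):
    """Impute each missing case with the observed y of its single nearest donor."""
    # Distance in predicted value between every missing case and every donor.
    dist = np.abs(pred_mis[:, None] - pred_obs[None, :])
    return y_obs[np.argmin(dist, axis=1)]

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n_obs, n_mis, m = 200, 50, 5
x_obs = rng.normal(size=n_obs)
x_mis = rng.normal(size=n_mis)
y_obs = 2.0 * x_obs + rng.normal(size=n_obs)

# Fit the prediction model once; the predictions are then held fixed.
slope, intercept = np.polyfit(x_obs, y_obs, 1)
pred_obs = intercept + slope * x_obs
pred_mis = intercept + slope * x_mis

# "Draw" the nearest donor m times; nothing is random, so nothing changes.
imputations = [nearest_match_impute(pred_obs, y_obs, pred_mis) for _ in range(m)]
completed_means = [np.concatenate([y_obs, imp]).mean() for imp in imputations]

# Between-imputation variance of the estimated mean.
between_var = np.var(completed_means, ddof=1)
print("between-imputation variance of the mean:", between_var)
```

With a between-imputation component of zero, the multiple-imputation variance
estimate reduces to the average within-imputation variance, so the
uncertainty due to the missing data is not reflected.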

Great point David - thanks.  For that reason, my R/S-Plus aregImpute 
function does weighted sampling using Tukey's tricube function with a 
sharp peak at the closest match in predicted value.  I'm getting much 
better distributions of imputed values when I do that.
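
A minimal sketch of that kind of weighted donor sampling, in Python rather
than R/S-Plus, and not the actual aregImpute code: the function names, the
simulated donors, and the bandwidth parameter used to scale distances are all
assumptions made for illustration. The tricube weight is (1 - |u|^3)^3 for
|u| < 1 and 0 otherwise, so the closest donors are the most likely to be
drawn but repeated draws can still differ.

```python
import numpy as np

def tricube(u):
    """Tukey's tricube weight: (1 - |u|^3)^3 for |u| < 1, else 0."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u**3) ** 3

def weighted_donor_draw(pred_obs, y_obs, pred_target, bandwidth, rng):
    """Draw one donor value, weighting donors by closeness in predicted value.

    `bandwidth` is a hypothetical scaling constant, not an aregImpute argument.
    """
    dist = np.abs(pred_obs - pred_target)
    w = tricube(dist / bandwidth)
    if w.sum() == 0.0:
        # No donor within the bandwidth: fall back to the nearest donor.
        return y_obs[np.argmin(dist)]
    return rng.choice(y_obs, p=w / w.sum())

# Simulated donors (an assumption for illustration only).
rng = np.random.default_rng(1)
pred_obs = np.linspace(0.0, 10.0, 101)              # donors' predicted values
y_obs = pred_obs + rng.normal(scale=1.0, size=101)  # donors' observed values

# Five imputations for one missing case whose predicted value is 5.03.
draws = [weighted_donor_draw(pred_obs, y_obs, 5.03, bandwidth=2.0, rng=rng)
         for _ in range(5)]
print(draws)
```

A smaller bandwidth concentrates the weight on the closest matches; a larger
one spreads the draws over more donors.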

Frank

> 
> -----Original Message-----
> From: Frank E Harrell Jr [mailto:[email protected]] 
> Sent: Wednesday, March 29, 2006 3:42 PM
> To: David Judkins
> Cc: Paul T. Shattuck; [email protected]
> Subject: Re: [Impute] range of imputed values for income
> 
> David Judkins wrote:
> 
>>I am not aware of the capabilities of IVEware, but in general,
>>person-level mean squared prediction error is a function of both the
>>covariates and the imputation procedure.  As Dr. Rubin has pointed out,
>>minimizing person-level MSPE is not typically a primary goal in the
>>analysis of surveys and experiments, although it might be important in an
>>activity like fraud detection.  Nonetheless, reduced person-level MSPE
>>should also translate into both lower variances on estimated population
>>and superpopulation marginal parameters and reduced bias on regression
>>coefficients.  So you want to use as rich a set of covariates in the
>>imputation as are available to you and to use the model-based
>>predictions in your imputation to at least some extent.  Unfortunately,
>>the stronger the usage you make, the more difficult it becomes to
>>estimate the post-imputation variance.  For example, a predictive-mean
>>matching approach to imputation defeats multiple imputation as a
>>variance-estimation technique.  For normally distributed outcomes,
> 
> 
> David - It's not clear to me why PMM would invalidate using the Rubin 
> variance estimator for regression coefficient variances.  But maybe you 
> are saying that PMM doesn't work if you are primarily interested in 
> estimating a variance parameter (what kind?).  -Frank Harrell
> 
> 
>>really good methods that both utilize covariate information and allow
>>post-imputation variance estimation are pretty much Bayesian and involve
>>Gibbs sampling to fit complex models and make reasonable posterior
>>draws.  (See Schafer's book.) Even they do not cope well with the
>>natural heaping in income where people round to the nearest thousand
>>dollars or even worse. I have some papers on how to impute non-normal
>>outcomes using covariates that are subject to missing values themselves,
>>but I have not yet been able to develop and validate good
>>post-imputation variance estimators to go with them.  
>>
>>Your person-level MSPE seems so large that I suspect your software is
>>not using any covariates.  While that makes post-imputation variance
>>estimation easy, it seems like you could do better.  
>>
>>The preservation of the marginal first and second order moments of
>>income seems to support the idea that you are not using any covariates.
>>The robustness of the model coefficients is harder to reconcile.  I
>>think this can only happen with a simple imputation procedure if the
>>missing data rate is negligible or if the model isn't very good to begin
>>with.  If substantial numbers of subjects were being thrown back and
>>forth between $3,000 and $100,000 per year, the coefficients in good
>>models would certainly be attenuated. Maybe you just don't have any
>>variables that are strongly related to income?
>>
>>David Judkins 
>>Senior Statistician 
>>Westat 
>>1650 Research Boulevard 
>>Rockville, MD 20850 
>>(301) 315-5970 
>>[email protected] 
>>
>>
>>-----Original Message-----
>>From: [email protected]
>>[mailto:[email protected]] On Behalf Of Paul T.
>>Shattuck
>>Sent: Wednesday, March 29, 2006 11:43 AM
>>To: [email protected]
>>Subject: [Impute] range of imputed values for income
>>
>>Hello,
>>
>>I am using IVEware for multiple imputation for the first time on a large
>>national health survey.  One of the variables imputed is income and I'm 
>>finding that imputed values can vary dramatically within-subjects across
>>multiply imputed datasets.  For instance, in some cases Person A might 
>>have an imputed income of $3,000 in one imputation, and then $100,000 
>>in another imputation.  This within-person variability far exceeds what 
>>I'm seeing with other variables in the survey.  The distributions, 
>>means, and standard deviations of the imputed vs. non-imputed values are
>>comparable.  And multivariate regression results using the multiply 
>>imputed datasets and the original dataset with missing values are 
>>reasonably robust, with the same substantive conclusions and very close 
>>coefficient estimates.  So, I'm wondering if this degree of 
>>within-subject variability across imputations is something to worry 
>>about, and potentially an indicator of a mis-specified imputation 
>>model, or whether this kind of within-subject variability across 
>>imputed datasets is typical.
>>
>>Thanks,
>>
>>Paul Shattuck
>>
> 
> 
> 


-- 
Frank E Harrell Jr   Professor and Chair           School of Medicine
                      Department of Biostatistics   Vanderbilt University
