Some further thoughts:

1. The arguments I've seen for using around five imputations are based
on efficiency calculations for the parameter estimates.  But what about
standard errors and p-values?  I've found them to be rather unstable
from one set of imputations to another when the fraction of missing
information is moderate to large.

2. Joe Schafer told me several months ago that he had a dissertation
student whose work showed that substantially larger numbers of
imputations were often required for good inference.  But I don't know
any of the details. 

3. For these reasons, I've adopted the following rule of thumb: do
enough imputations to get the estimated degrees of freedom (DF) over
100 for all parameters of interest.  I'd love to know what others
think of this.
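The two calculations contrasted in points 1 and 3 can be sketched in a few lines of Python. Under the usual large-sample results in Rubin (1987), an estimate based on m imputations has efficiency (1 + gamma/m)**-1 relative to one based on infinitely many imputations, where gamma is the fraction of missing information; this is presumably the kind of efficiency calculation point 1 refers to. The estimated degrees of freedom are (m - 1)(1 + 1/r)**2 with r = (1 + 1/m)B/W, where B is the between-imputation variance and W the average within-imputation variance. The function names and the B and W values below are illustrative assumptions, and the sketch holds B and W fixed, whereas in a real analysis they are re-estimated from each set of imputations.

```python
import math


def relative_efficiency(m: int, gamma: float) -> float:
    """Rubin's (1987) large-sample efficiency of an estimate based on m
    imputations, relative to one based on infinitely many imputations.
    gamma is the fraction of missing information for the parameter."""
    return 1.0 / (1.0 + gamma / m)


def rubin_df(m: int, b: float, w: float) -> float:
    """Rubin's (1987) degrees of freedom for a multiply imputed estimate.
    b: between-imputation variance; w: average within-imputation variance."""
    if b == 0:
        return math.inf  # no between-imputation variability: normal reference
    r = (1.0 + 1.0 / m) * b / w  # relative increase in variance due to missingness
    return (m - 1) * (1.0 + 1.0 / r) ** 2


def imputations_for_df(b: float, w: float, target: float = 100.0,
                       max_m: int = 10_000) -> int:
    """Smallest m whose estimated DF exceeds `target`, holding b and w fixed
    (an assumption: in practice b and w change with each set of imputations)."""
    for m in range(2, max_m + 1):
        if rubin_df(m, b, w) > target:
            return m
    raise ValueError("target DF not reached within max_m imputations")


# Efficiency with m = 5 at several fractions of missing information.
for gamma in (0.1, 0.3, 0.5, 0.7):
    print(f"gamma={gamma}: efficiency with m=5 is {relative_efficiency(5, gamma):.3f}")

# Hypothetical variance ratios, not taken from any data set.
print(imputations_for_df(b=0.25, w=1.0))  # B is a quarter of W: prints 7
print(imputations_for_df(b=1.0, w=1.0))   # B equals W: prints 27
```

With B equal to W (roughly half the information missing), the efficiency with m = 5 is still about 0.91, yet the DF rule asks for 27 imputations. That gap is one way to see why a rule aimed at stable standard errors and p-values can call for more imputations than the efficiency argument does.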


----------------------------------------------------------------
Paul D. Allison, Professor & Chair
Department of Sociology
University of Pennsylvania
3718 Locust Walk
Philadelphia, PA  19104-6299
voice: 215-898-6717 or 215-898-6712
fax: 215-573-2081
[EMAIL PROTECTED]
http://www.ssc.upenn.edu/~allison
 




I'm baffled too, on both counts.  Modest numbers of imputations work
fine unless the fractions of missing information are very high
(> 50%), and I wouldn't think of such situations as missing-data
problems except in a formal sense.  And the number of imputations is
a random variable???  I guess we'll have to read what they wrote...



On Thu, 19 Feb 2004, Howells, William wrote:

> I came across a note from Hershberger and Fisher on the number of 
> imputations (citation below), where they conclude that a much larger 
> number of imputations is required (over 500 in some cases) than the 
> usual rule of thumb that a relatively small number of imputations is 
> needed (say 5 to 20 per Rubin 1987, Schafer 1997).  They argue that 
> the traditional rules of thumb are based on simulations rather than 
> sampling theory.  Their calculations assume that the number of
> imputations is a random variable from a uniform distribution and use a
> formula from Levy and Lemeshow (1999), n >= (z**2)(V**2)/e**2, where n
> is the number of imputations, z is a standard normal variable, V**2 is
> the squared coefficient of variation (~1.33) and e is the "amount of
> error, or the degree to which the predicted number of imputations
> differs from the optimal or 'true' number of imputations".  For
> example, with z=1.96 and e=.10, n=511 imputations are required.
> 
> I'm having difficulty conceiving of the number of imputations as a
> random variable.  What does the "true" number of imputations mean?  Is
> this argument legitimate?  Should I be using 500 imputations instead
> of 5?
>
> Bill Howells, MS
> Behavioral Medicine Center
> Washington University School of Medicine
> St Louis, MO
>
> Hershberger SL, Fisher DG (2003), Note on determining the number of 
> imputations for missing data, Structural Equation Modeling, 10(4): 
> 648-650.
>
> http://www.leaonline.com/loi/sem

-- 
Donald B. Rubin
John L. Loeb Professor of Statistics
Chairman Department of Statistics
Harvard University
Cambridge MA 02138
Tel: 617-495-5498  Fax: 617-496-8057
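The arithmetic in the quoted example (n >= (z**2)(V**2)/e**2 with z = 1.96, V**2 = 1.33 and e = .10) can be checked directly. The helper name below is a hypothetical choice, and the sketch only reproduces the quoted calculation; it says nothing about whether treating the number of imputations as a random variable is meaningful, which is what Rubin's reply questions.

```python
import math


def levy_lemeshow_n(z: float, cv_squared: float, e: float) -> int:
    """n >= z**2 * V**2 / e**2, rounded up to a whole number of imputations."""
    return math.ceil(z ** 2 * cv_squared / e ** 2)


# The quoted example: z = 1.96, V**2 = 1.33, e = .10.
print(levy_lemeshow_n(1.96, 1.33, 0.10))  # prints 511

# n grows as 1/e**2, so halving the tolerated error roughly quadruples n.
for e in (0.20, 0.10, 0.05):
    print(e, levy_lemeshow_n(1.96, 1.33, e))
```

The 500-plus figure is driven almost entirely by the choice of e: the same formula gives 128 for e = .20 and 2044 for e = .05.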

