Some further thoughts:

1. The arguments I've seen for using around five imputations are based on efficiency calculations for the parameter estimates. But what about standard errors and p-values? I've found them to be rather unstable for moderate to large fractions of missing information.
2. Joe Schafer told me several months ago that he had a dissertation student whose work showed that substantially larger numbers of imputations were often required for good inference. But I don't know any of the details.

3. For these reasons, I've adopted the following rule of thumb: Do a sufficient number of imputations to get the estimated DF over 100 for all parameters of interest.

I'd love to know what others think of this.

----------------------------------------------------------------
Paul D. Allison, Professor & Chair
Department of Sociology
University of Pennsylvania
3718 Locust Walk
Philadelphia, PA 19104-6299
voice: 215-898-6717 or 215-898-6712
fax: 215-573-2081
http://www.ssc.upenn.edu/~allison

I'm baffled too on both counts. Modest numbers of imputations work fine unless the fractions of missing information are very high (> 50%), and then I wouldn't think of those situations as missing data problems except in a formal sense. And the number of them is a random variable??? I guess we'll have to read what they wrote...

On Thu, 19 Feb 2004, Howells, William wrote:

> I came across a note from Hershberger and Fisher on the number of
> imputations (citation below), where they conclude that a much larger
> number of imputations is required (over 500 in some cases) than the
> usual rule of thumb that a relatively small number of imputations is
> needed (say 5 to 20 per Rubin 1987, Schafer 1997). They argue that
> the traditional rules of thumb are based on simulations rather than
> sampling theory.
> Their calculations assume that the number of
> imputations is a random variable from a uniform distribution and use a
> formula from Levy and Lemeshow (1999), n >= (z**2)(V**2)/e**2, where n
> is the number of imputations, z is a standard normal variable, V**2 is
> the squared coefficient of variation (~1.33) and e is the "amount of
> error, or the degree to which the predicted number of imputations
> differs from the optimal or "true" number of imputations". For
> example, with z=1.96 and e=.10, n=511 imputations are required.
>
> I'm having difficulty conceiving of the number of imputations as a
> random variable. What does "true" number of imputations mean? Is
> this argument legitimate? Should I be using 500 imputations instead of 5?
>
> Bill Howells, MS
> Behavioral Medicine Center
> Washington University School of Medicine
> St Louis, MO
>
> Hershberger SL, Fisher DG (2003), Note on determining the number of
> imputations for missing data, Structural Equation Modeling, 10(4):
> 648-650.
>
> http://www.leaonline.com/loi/sem

--
Donald B. Rubin
John L. Loeb Professor of Statistics
Chairman, Department of Statistics
Harvard University
Cambridge MA 02138
Tel: 617-495-5498
Fax: 617-496-8057
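The Levy and Lemeshow formula quoted in Bill's message is plain arithmetic, so it is easy to check where the 511 comes from. The sketch below only reproduces that calculation from the inputs given in the post (z = 1.96, V**2 ~ 1.33, e = .10); the helper name `hf_imputations` is hypothetical, and reproducing the number says nothing about whether treating the number of imputations as a random variable makes sense.

```python
import math


def hf_imputations(z: float, cv_squared: float, e: float) -> int:
    """Smallest integer n satisfying n >= z**2 * V**2 / e**2.

    Hypothetical helper for illustration: it mirrors the sample-size
    formula as described in the post and nothing more.
    """
    bound = (z ** 2) * cv_squared / (e ** 2)
    # Round up, since n must be a whole number of imputations.
    return math.ceil(bound)


# Inputs quoted in the post: z = 1.96, V**2 ~ 1.33, e = .10
print(hf_imputations(1.96, 1.33, 0.10))
```

Because e enters squared in the denominator, doubling the tolerance to e = .20 cuts the answer roughly by four (to 128), which shows how sensitive the "over 500" figure is to that one choice.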
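Allison's DF rule of thumb and Rubin's remark about fractions of missing information can be compared using two standard quantities from Rubin's (1987) combining rules. The relative efficiency of m imputations versus infinitely many is (1 + gamma/m)**-1. Rubin's degrees of freedom are nu = (m - 1)(1 + 1/r)**2 with r = (1 + 1/m)B/W; writing gamma = (1 + 1/m)B/T, where T = W + (1 + 1/m)B, this simplifies to nu = (m - 1)/gamma**2. The sketch assumes gamma is a fixed stand-in for the fraction of missing information that does not change as m grows, which is an idealization. The function names are illustrative, and the threshold of 100 is taken from Allison's post.

```python
def relative_efficiency(gamma: float, m: int) -> float:
    """Efficiency of m imputations relative to infinitely many: (1 + gamma/m)**-1."""
    return 1.0 / (1.0 + gamma / m)


def rubin_df(gamma: float, m: int) -> float:
    """Rubin's degrees of freedom, nu = (m - 1) * (1 + 1/r)**2.

    With r = (1 + 1/m) * B / W and gamma = (1 + 1/m) * B / T, this equals
    (m - 1) / gamma**2. Here gamma is an assumed input held fixed across m.
    """
    return (m - 1) / gamma ** 2


def min_imputations_for_df(gamma: float, df_target: float = 100.0) -> int:
    """Smallest m whose Rubin df exceeds df_target (Allison's rule uses 100)."""
    m = 1  # m = 1 gives zero df, so the loop always advances at least once
    while rubin_df(gamma, m) <= df_target:
        m += 1
    return m


# Compare the efficiency view (m = 5) with the DF > 100 criterion.
for gamma in (0.1, 0.3, 0.5, 0.7):
    print(
        f"gamma={gamma:.1f}  RE(m=5)={relative_efficiency(gamma, 5):.3f}  "
        f"m needed for df>100: {min_imputations_for_df(gamma)}"
    )
```

Under these assumptions, at gamma = 0.5 five imputations already give about 91% efficiency, which is the kind of calculation behind the usual advice, while the DF criterion asks for 27 imputations. The two criteria answer different questions: one concerns the variance of the point estimate, the other the stability of the standard errors and reference distribution that Allison is worried about.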