On Thu, 19 Feb 2004, Paul Allison wrote:

> Some further thoughts:
>
> 1. The arguments I've seen for using around five imputations are based
> on efficiency calculations for the parameter estimates. But what about
> standard errors and p-values? I've found them to be rather unstable for
> moderate to large fractions of missing information.

Hi Paul,

There were evaluations for frequentist p values and confidence coverage
in chapter 4 of my 1987 book when the complete data sample size is
effectively infinite. I think my view then was that these were
remarkably good considering the alternatives (except using a larger # of
imputes). But because these are the usual frequentist evaluations, they
can get the "right answer" by averaging some fairly divergent results.

> 2. Joe Schafer told me several months ago that he had a dissertation
> student whose work showed that substantially larger numbers of
> imputations were often required for good inference. But I don't know
> any of the details.
>
> 3. For these reasons, I've adopted the following rule of thumb: Do a
> sufficient number of imputations to get the estimated DF over 100 for
> all parameters of interest. I'd love to know what others think of this.

Interesting idea. In the work we did years ago for NCHS on NHANES,
quite careful evaluations suggested that in that data base, for the
kinds of analyses that they could contemplate realistically, 5 seemed to
work fine.

Presumably you use the newer Barnard and Rubin estimated DF rather than
the older asymptotic one? Also, this is a way of making the number of
imputes a random variable (as a function of the estimated fraction of
missing info, which is itself a random variable), with the "true" number
of imputes equal to the number needed to get true DF > 100.
Interesting...

Best, Don

> ----------------------------------------------------------------
> Paul D. Allison, Professor & Chair
> Department of Sociology
> University of Pennsylvania
> 3718 Locust Walk
> Philadelphia, PA 19104-6299
> voice: 215-898-6717 or 215-898-6712
> fax: 215-573-2081
> [email protected]
> http://www.ssc.upenn.edu/~allison
>
> I'm baffled too on both counts. Modest numbers of imputations work fine
> unless the fractions of missing information are very high (> 50%), and
> then I wouldn't think of those situations as missing data problems
> except in a formal sense. And the number of them is a random
> variable??? I guess we'll have to read what they wrote...
>
> On Thu, 19 Feb 2004, Howells, William wrote:
>
> > I came across a note from Hershberger and Fisher on the number of
> > imputations (citation below), where they conclude that a much larger
> > number of imputations is required (over 500 in some cases) than the
> > usual rule of thumb that a relatively small number of imputations is
> > needed (say 5 to 20 per Rubin 1987, Schafer 1997). They argue that
> > the traditional rules of thumb are based on simulations rather than
> > sampling theory. Their calculations assume that the number of
> > imputations is a random variable from a uniform distribution and use a
> > formula from Levy and Lemeshow (1999) n >= (z**2)(V**2)/e**2, where n
> > is the number of imputations, z is a standard normal variable, V**2 is
> > the squared coefficient of variation (~1.33) and e is the "amount of
> > error, or the degree to which the predicted number of imputations
> > differs from the optimal or "true" number of imputations". For
> > example, with z=1.96 and e=.10, n=511 imputations are required.
> >
> > I'm having difficulty conceiving of the number of imputations as a
> > random variable. What does "true" number of imputations mean? Is
> > this argument legitimate? Should I be using 500 imputations instead
> > of 5?
> >
> > Bill Howells, MS
> > Behavioral Medicine Center
> > Washington University School of Medicine
> > St Louis, MO
> >
> > Hershberger SL, Fisher DG (2003), Note on determining the number of
> > imputations for missing data, Structural Equation Modeling, 10(4):
> > 648-650.
> >
> > http://www.leaonline.com/loi/sem

--
Donald B. Rubin
John L. Loeb Professor of Statistics
Chairman Department of Statistics
Harvard University
Cambridge MA 02138
Tel: 617-495-5498 Fax: 617-496-8057
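
For readers who want to check the arithmetic discussed in this thread, here is a minimal Python sketch. It is not code from any of the participants. The first function simply evaluates the quoted formula n >= (z**2)(V**2)/e**2. The second and third assume the older large-sample degrees of freedom for multiple imputation, DF = (m - 1)/lam**2, where lam is the share of total variance due to missingness (roughly the fraction of missing information); the thread asks about the newer Barnard and Rubin DF, which is not implemented here. The function names are hypothetical, and lam is treated as a known constant, whereas in practice it is estimated from the imputed data.

```python
import math


def levy_lemeshow_n(z, v_squared, e):
    """Smallest integer n satisfying n >= z**2 * V**2 / e**2."""
    return math.ceil(z ** 2 * v_squared / e ** 2)


def rubin_df(m, lam):
    """Older large-sample DF for m imputations: (m - 1) / lam**2.

    lam is assumed known here; in practice it is an estimate.
    """
    return (m - 1) / lam ** 2


def min_imputations_for_df(lam, target_df=100.0, m_max=10000):
    """Smallest m whose DF strictly exceeds target_df (the DF > 100 rule)."""
    for m in range(2, m_max + 1):
        if rubin_df(m, lam) > target_df:
            return m
    raise ValueError("target DF not reached within m_max imputations")


if __name__ == "__main__":
    # Values quoted in the thread: z = 1.96, V**2 ~ 1.33, e = 0.10.
    print(levy_lemeshow_n(1.96, 1.33, 0.10))  # 511, as quoted
    # DF > 100 rule under the assumed older DF formula.
    for lam in (0.25, 0.5):
        print(lam, min_imputations_for_df(lam))
```

Under these assumptions the required m grows with lam**2, which illustrates the point made above: a rule based on the estimated DF makes the number of imputations depend on an estimated quantity.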
