[email protected] wrote:
> Good points.
>
> I don't really have a note, just a proceedings paper on conditions for
> nearest neighbour imputation to lead to unbiased estimation, but not pursued
> further; 1999 actually, not 1998. Predictive mean matching is a form of
> nearest neighbour imputation. Predicting the mean is a rescaling and matching
> applies the distance function.
>
Regarding a note, I just meant our e-mail exchange. For the moment I'm
wondering about the bias in standard errors, more so than bias in the
regression coefficients.

> I think PMM can be viewed as a regression with an added residual because the
> imputed value is not directly "on the regression line", that is, not the
> expected value, but the expected value plus something (only an exact match
> would lead to no residual: unlikely, and even more so with high nonresponse
> rates). Also, I guess I was implicitly assuming that the variance is
> non-constant, and with an impact significant enough to be seen in high
> nonresponse cells. Maybe the variance can be assumed constant, but at least a
> residual should be assumed to be present for variance estimation.

I think I see your point, but Rubin's formula for variance just looks at
variation of the regression coefficients across multiple imputations, plus the
usual variance, so I can't see where there is an opportunity to correct for
imperfect matching.

>
> With nearest neighbour imputation, Burns (1990) had seen problems with using
> the first two neighbours for variance estimation and we had reasonable
> results assuming it is a regression with added residuals in Lee, Rancourt and
> Särndal (1994). Maybe these apply only for NN and not for PMM, but I thought
> I might provide a lead...

If you are going to add residuals, I think you might as well stick with
regression imputation and omit the PMM step.

>
> I have no problem with the posting of these exchanges.

Thanks Eric,
Frank

>
> Eric
>
> -----Original Message-----
> From: Frank E Harrell Jr [mailto:[email protected]]
> Sent: 16 January 2007 15:09
> To: [email protected]
> Subject: Re: RE : [Impute] SEs of regression coefficients after predictive
> mean matching
>
>
> [email protected] wrote:
>> For any variation of donor method, there has to be compactedness (this
>> is usually reached with a not-too-frequent nonresponse) for the method
>> to lead to asymptotic unbiasedness. Often I have found that pretending
>> we are in the presence of regression does a good job when data are
>> compact. Otherwise, the implicitly imputed added residuals have to be
>> boosted for variance calculation purposes. It is like unit i does not
>> receive the right residual, but rather residual e sub j from unit j,
>> the donor (which is at a different point on the "x line" and therefore
>> has a slightly different conditional distribution). So the residual
>> has to be increased by the difference between the expected values of y
>> at points i and j. I don't recall much on this in the literature, but
>> I have an embryo of it in the 1998 ASA SRMS proceedings.
>>
>> Eric Rancourt
>> Statistics Canada
>
> Thanks very much for your note. When you are ready would you mind
> posting your note and this response to the list?
>
> There are two things unclear about your note. First, PMM does not use
> residuals in any way, but PMM does need to inherit uncertainty in the
> regression equation used for predicting the target variable. Second,
> the conditional distribution of the target might be assumed to have
> constant variance, so the residuals should be exchangeable. For
> non-large sample sizes the residuals actually have some correlation and
> non-equal variance, but I think these can be ignored. So I'm not clear
> on why you would need to talk about position on the x line.
>
> Thanks for the discussion and ideas,
> Frank
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Frank E Harrell Jr
>> Sent: 16 January 2007 12:41
>> To: [email protected]
>> Subject: [Impute] SEs of regression coefficients after predictive
>> mean matching
>>
>>
>> In one set of simulation experiments I am finding that the Rubin
>> variance-covariance formula works very well for regression imputation
>> but that the standard error of the final regression coefficient for a
>> frequently missing target variable is very much underestimated if PMM is
>> used. Does anyone have experience with this or know of a pertinent
>> reference? In doing PMM I have used the closest match as part of
>> the random-draw multiple imputation algorithm, and I have also tried
>> weighted sampling where the closest match has the highest probability of
>> being selected but donors around the closest may be selected with
>> decreasing probability as they are farther away from the closest match.
>> Missingness of the target variable is moderately strongly related to
>> observed values of another covariate (that has no missing values).
>>
>> Thanks
>
>
--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University
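
For readers following the thread, here is a minimal Python sketch of the two ingredients discussed above: pooling one regression coefficient with Rubin's rules, and picking a PMM donor either by closest match or by distance-weighted sampling. It is an illustration under stated assumptions, not the code used in the simulations. The function names `rubin_pool` and `pmm_donor` are hypothetical, and the weight function is an assumed stand-in, since the thread does not say how the sampling probabilities were defined.

```python
import random
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool one coefficient across m imputations using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average of the usual (within) variances
    between = variance(estimates)    # between-imputation variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, total


def pmm_donor(pred_recipient, pred_donors, y_donors, rng, weighted=False):
    """Return an observed y from a donor whose predicted mean is near the recipient's."""
    dists = [abs(p - pred_recipient) for p in pred_donors]
    if not weighted:
        # Closest match: donor with the smallest predicted-mean distance.
        j = min(range(len(dists)), key=dists.__getitem__)
    else:
        # ASSUMPTION: weights decay with distance; this particular form is
        # illustrative only and is not taken from the thread.
        scale = mean(dists) or 1.0
        weights = [1.0 / (1.0 + d / scale) ** 2 for d in dists]
        j = rng.choices(range(len(dists)), weights=weights, k=1)[0]
    return y_donors[j]


rng = random.Random(0)
pooled_est, pooled_var = rubin_pool([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
closest = pmm_donor(2.1, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], rng)
sampled = pmm_donor(2.1, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0], rng, weighted=True)
```

Note that `rubin_pool` only sees the per-imputation estimates and their variances, which is the point made in the thread: the formula itself has no term that could correct for imperfect matching, so any such correction would have to enter through how the imputations are generated.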
