A few months ago, I posted a note asking how to estimate R^2 (and other quantities) when values are multiply imputed. A respondent suggested that I use the same strategy as that used to estimate the regression coefficients: get a point estimate from each imputed data set, and average these.

Today I began to wonder about this. Consider the regression Y=rX+e where X and Y are standard normal variables. Then R^2 = r^2. It was suggested that R^2 could be estimated by averaging the estimates of R^2=r^2 across multiple imputations. Yet r is estimated by averaging the estimates of r across multiple imputations. In general, these estimates will not agree: the average of the squared estimates of r is at least as large as the square of the averaged estimate of r, and strictly larger whenever the estimates differ across imputations. So the estimate of R^2 will exceed the squared estimate of r. If the estimator of r is unbiased, then the proposed estimate of R^2 must be biased.
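
A small numeric sketch may make the disagreement concrete. The three per-imputation estimates of r below are made up for illustration (they are not from any real data set), and the function names are hypothetical. The gap between the two candidate estimates of R^2 equals the between-imputation (population) variance of the estimates of r, so it cannot be negative.

```python
from statistics import fmean, pvariance

# Hypothetical per-imputation estimates of r (invented for illustration only).
r_hats = [0.40, 0.50, 0.60]

def average_of_squares(estimates):
    """Estimate R^2 by squaring each imputation's r, then averaging."""
    return fmean(e * e for e in estimates)

def square_of_average(estimates):
    """Estimate R^2 by averaging r across imputations, then squaring."""
    return fmean(estimates) ** 2

def gap(estimates):
    """Difference between the two estimates; equals the population variance."""
    return average_of_squares(estimates) - square_of_average(estimates)

if __name__ == "__main__":
    print("average of squared estimates:", average_of_squares(r_hats))
    print("square of averaged estimate: ", square_of_average(r_hats))
    print("gap:                         ", gap(r_hats))
    print("between-imputation variance: ", pvariance(r_hats))
```

The gap vanishes only when every imputation yields the same estimate of r, so the two procedures cannot both be unbiased for R^2 in general.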

It strikes me there must be a lot of quantities for which we cannot obtain unbiased estimates using this procedure. Pertinent citations would be most appreciated.

Best wishes,
Paul von Hippel
Statistician
Ohio State University



