A few months ago, I posted a note asking how to estimate R^2 (and other quantities) when values are multiply imputed. A respondent suggested that I use the same strategy as that used to estimate the regression coefficients: get a point estimate from each imputed data set, and average these.
Today I began to wonder about this. Consider the regression Y = rX + e, where X and Y are standard normal variables. Then R^2 = r^2. It was suggested that R^2 could be estimated by averaging the estimates of R^2 = r^2 across the multiple imputations. Yet r itself is estimated by averaging the estimates of r across the multiple imputations. In general, these two estimates will not agree: the average of the squared estimates exceeds the square of the averaged estimate by the between-imputation variance of the estimates of r. So unless every imputation yields the same estimate of r, the estimate of R^2 will be greater than the squared estimate of r. If the estimator of r is unbiased, then the proposed estimate of R^2 must be biased. It strikes me there must be a lot of quantities for which we cannot obtain unbiased estimates using this procedure.

Pertinent citations would be most appreciated.

Best wishes,
Paul von Hippel
Statistician
Ohio State University
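
P.S. To make the disagreement between the two pooled estimates concrete, here is a minimal sketch in Python. The five per-imputation estimates of r are made-up numbers chosen only for illustration, and the variable names are hypothetical. The sketch averages r and r^2 across imputations and compares the average of the squares with the square of the average.

```python
from statistics import mean

# Hypothetical estimates of r from m = 5 imputed data sets (made-up numbers).
r_hats = [0.40, 0.50, 0.45, 0.55, 0.60]

# Pooled point estimate of r: the average across imputations.
r_bar = mean(r_hats)

# Proposed pooled estimate of R^2: the average of the per-imputation r^2.
r2_bar = mean(r * r for r in r_hats)

# Difference between the two candidate estimates of R^2.
gap = r2_bar - r_bar ** 2

# Between-imputation variance of the r estimates (divisor m, not m - 1).
between_var = mean((r - r_bar) ** 2 for r in r_hats)

print(f"average of r             = {r_bar:.4f}")
print(f"square of the average    = {r_bar ** 2:.4f}")
print(f"average of the squares   = {r2_bar:.4f}")
print(f"gap                      = {gap:.4f}")
print(f"between-imputation var.  = {between_var:.4f}")
```

With these numbers the averaged R^2 is 0.255 while the squared averaged r is 0.250. The gap of 0.005 equals the between-imputation variance of the r estimates computed with divisor m, so it cannot be negative and vanishes only when all imputations give the same estimate.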
