It's even more complicated than that.  Suppose your variable X is sometimes 
missing (let's say MCAR to keep things simple), but you have another variable Z 
that is fairly well correlated with X and is fully observed.  If you are 
interested in the regression of a third variable Y on X (or the correlation of 
Y and X), your imputation of X based on Z (and Y) might be quite beneficial.  
However, if what you are interested in is the coefficient of X in the 
regression of Y on X and Z (or the partial correlation of X and Y given Z), 
your imputations won't do you much good.  Your imputations of X are only 
telling you about the Y,Z relationship, which helps (since it mediates the X,Y 
association) but gives you no direct information about the partial association 
X,Y|Z.

Three points here:  (1) to fully understand what imputation can contribute, you 
might need to understand how the observed-data inference works (as suggested 
above, although my explanation was only heuristic).  (2) The contribution of an 
auxiliary variable to the value of imputation can be highly analysis-specific.  
(3) An auxiliary variable external to the analysis might add more value to the 
imputation than the variables already in the analysis do.

A concrete example to fix the abstractions of the first paragraph:  suppose you 
are interested in associations of family income (X) with educational outcomes 
(Y), but X is often missing.  Then block-group median income (Z) might be an 
excellent auxiliary variable.  However, if your objective is to assess the 
distinct contributions of family and neighborhood income to prediction of Y, 
imputing family income from neighborhood income won't do much good.  You might 
do better imputing based on something completely different like rent paid or 
value of family automobile.

________________________________
From: Impute -- Imputations in Data Analysis 
[[email protected]] On Behalf Of Hunsicker, Lawrence 
[[email protected]]
Sent: Tuesday, April 16, 2013 8:51 AM
To: [email protected]
Subject: FW: "Accessory" variables in imputation

.
.
.

There seem to me to be two ways to think about whether “the auxiliary variable 
helps a little or a lot.”  If one’s metric is the accuracy of the imputation, 
it seems pretty likely to me that the accuracy of the imputation will be 
improved by including a strongly correlated auxiliary variable.  One could 
check this by checking the correlation of the values predicted by the 
imputation model with the actually observed values.
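
A minimal sketch of that check, on simulated data rather than the transplant 
data discussed in this thread.  Everything in it is an assumption made for 
illustration: the variable names, the effect sizes, the 30% MCAR missingness, 
and the use of a plain least-squares regression as a stand-in for the 
imputation model.  It fits the model on the cases where X is observed, once 
without and once with the auxiliary variable Z, and correlates the fitted 
values with the observed X.

```python
import random

def mean(v):
    return sum(v) / len(v)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    ac = [x - ma for x in a]
    bc = [x - mb for x in b]
    return dot(ac, bc) / (dot(ac, ac) * dot(bc, bc)) ** 0.5

def ols_fitted(target, predictors):
    """Fitted values from least squares of target on 1 or 2 predictors,
    with an intercept (handled by centering every variable)."""
    mt = mean(target)
    tc = [t - mt for t in target]
    cols = [[p - mean(col) for p in col] for col in predictors]
    if len(cols) == 1:
        coefs = [dot(cols[0], tc) / dot(cols[0], cols[0])]
    else:
        # Solve the 2x2 normal equations by Cramer's rule.
        p1, p2 = cols
        s11, s22, s12 = dot(p1, p1), dot(p2, p2), dot(p1, p2)
        s1t, s2t = dot(p1, tc), dot(p2, tc)
        det = s11 * s22 - s12 * s12
        coefs = [(s22 * s1t - s12 * s2t) / det,
                 (s11 * s2t - s12 * s1t) / det]
    return [mt + sum(c * col[i] for c, col in zip(coefs, cols))
            for i in range(len(target))]

# Hypothetical data: X is the covariate with missing values, Z a strongly
# correlated auxiliary variable, Y an outcome only weakly related to X.
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + 0.6 * rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + rng.gauss(0, 1) for xi in x]

# MCAR: each X value is missing with probability 0.3, independent of the data.
observed = [rng.random() > 0.3 for _ in range(n)]
xo = [v for v, o in zip(x, observed) if o]
yo = [v for v, o in zip(y, observed) if o]
zo = [v for v, o in zip(z, observed) if o]

# Correlation of model-predicted X with actually observed X.
r_without = corr(ols_fitted(xo, [yo]), xo)      # imputation model: X ~ Y
r_with = corr(ols_fitted(xo, [yo, zo]), xo)     # imputation model: X ~ Y + Z

print(f"without auxiliary variable: r = {r_without:.2f}")
print(f"with auxiliary variable:    r = {r_with:.2f}")
```

Because the two models are nested and the check is done in-sample, the 
correlation with Z included can never be lower than without it; what matters 
is how large the gain is.  Note that this speaks only to the accuracy of the 
imputation, not to the results of the final analysis.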

But the real issue is how much inclusion of the auxiliary variables in the 
imputation model improves the results of the actual final analysis.  This has 
to be a function of how strongly the covariate with missing values correlates 
with (predicts) the final outcome variable, and of the amount of missing 
data.  The analysis that I posed in my original post is not a particularly good 
one for asking this question, as the amount of missing data for the current PRA 
is only about 3%, and the impact of current PRA on graft survival is not 
particularly strong.  It was a convenient straw man to permit me to ask the 
question coherently.
