I think that including a class of variables in the imputation will affect the 
variance of target parameter estimates more strongly when those variables are 
adjunct variables than when they are causally prior covariates in the target 
model. 
Parallel or alternate outcomes are particularly good examples of this. People 
who favor nesting variables with nonresponse within flags for missingness as an 
alternative to imputation fail to realize these gains in precision and possibly 
in bias reduction. (Obviously, they cannot include parallel outcomes in their 
analytic models.)

This harkens back to one of the central themes in the debate between imputation 
and ANCOVA. The imputer frequently has access to a richer set of auxiliary 
information than does the downstream analyst. If we are shy about using that 
information in the imputation, then we have surrendered most of the advantage 
of imputation over the alternatives.

To give an example from my own work, I had a longitudinal sample of 8th graders 
with parent interviews for the fall of the normative freshman year of college. 
Parent nonresponse was high given the 4.5-year gap. The primary outcome of 
interest was college admission. We matched students to administrative datasets 
about college going.  We could not report the administrative data directly 
because of known biases (e.g. no coverage of children from families who do not 
require financial aid).  Using the match status as an adjunct variable in 
imputation of the parent responses, however, had a huge impact on the final 
estimates. In addition to strong variance reduction, we also discovered that 
parent nonresponse was strongly nonignorable. Those whose children did not go 
to college were far less likely to respond to the survey. I would send you a 
reference but unfortunately the evaluation was cancelled without a report.
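
A minimal simulated sketch of the mechanism described above, not the original
evaluation (which has no report). All rates below are invented for
illustration: `college`, `match`, and `respond` are hypothetical variables, and
the imputation is a simple hot deck within match-status classes rather than
whatever method was actually used.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# Hypothetical population: 60% of children go to college.
college = rng.random(n) < 0.6
# Administrative match is far more likely for college-goers but has coverage gaps.
match = rng.random(n) < np.where(college, 0.7, 0.02)
# Parents of non-college-goers respond less often (response depends on the outcome).
respond = rng.random(n) < np.where(college, 0.8, 0.4)

# The parent-reported outcome is seen only for respondents.
reported = np.where(respond, college.astype(float), np.nan)

# Estimate 1: respondents only.
complete_case = np.nanmean(reported)

# Estimate 2: hot-deck imputation within match-status classes (the adjunct variable).
imputed = reported.copy()
for m in (False, True):
    donors = reported[respond & (match == m)]
    need = ~respond & (match == m)
    imputed[need] = rng.choice(donors, size=need.sum())
with_adjunct = imputed.mean()

truth = college.mean()
print(f"truth={truth:.3f}  complete-case={complete_case:.3f}  "
      f"imputed with match status={with_adjunct:.3f}")
print(f"response rate: matched={respond[match].mean():.3f}  "
      f"unmatched={respond[~match].mean():.3f}")
```

In this setup the adjunct variable shrinks the bias but does not remove it,
because response still depends on college-going within each match class. The
gap in response rates between matched and unmatched cases is the kind of signal
that points to outcome-related nonresponse.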

--Dave Judkins

On Apr 16, 2013, at 8:52 AM, "Hunsicker, Lawrence" 
<[email protected]> wrote:

Many thanks to Drs. Judkins, VonHippel, and Raghunathan:

We seem to have a consensus that inclusion of auxiliary variables is 
legitimate.  Frank Harrell also concurs in a separate e-mail, and I have just 
read a piece by Paul Allison that comes to the same conclusion.  But it is nice 
to have the Meng reference to document that it is mathematically correct.  I 
appreciate also Raghu’s suggestion that where it is likely that data will be 
missing, the data collection plans should deal with this prospectively by 
collecting data on potentially useful auxiliary covariates.

There seem to me to be two ways to think about whether “the auxiliary variable 
helps a little or a lot.”  If one’s metric is the accuracy of the imputation, 
it seems pretty likely to me that the accuracy of the imputation will be 
improved by including a strongly correlated auxiliary variable.  One could 
check this by checking the correlation of the values predicted by the 
imputation model with the actually observed values.
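
A minimal sketch of that check, on simulated data rather than real PRA values.
The variable names (`peak_pra`, `current_pra`, `age`), the effect sizes, and
the 30% missingness are all assumptions made for illustration, and the
imputation model is reduced to an ordinary least-squares fit on the observed
cases.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Simulated stand-ins for the thread's variables (not real transplant data).
age = rng.normal(size=n)        # covariate already in the analysis model
peak_pra = rng.normal(size=n)   # candidate auxiliary variable
current_pra = 0.8 * peak_pra + 0.1 * age + rng.normal(scale=0.6, size=n)

observed = rng.random(n) > 0.3  # roughly 70% of current PRA is observed


def prediction_correlation(predictors, target, mask):
    """Fit OLS on the observed cases; correlate fitted with observed values."""
    X = np.column_stack([np.ones(mask.sum())] + [p[mask] for p in predictors])
    y = target[mask]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.corrcoef(X @ beta, y)[0, 1]


r_without = prediction_correlation([age], current_pra, observed)
r_with = prediction_correlation([age, peak_pra], current_pra, observed)
print(f"correlation without peak PRA: {r_without:.2f}")
print(f"correlation with peak PRA:    {r_with:.2f}")
```

This is an in-sample correlation, so it is somewhat optimistic; a held-out
subset of the observed cases would give a fairer comparison.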

But the real issue is how much inclusion of the auxiliary variables in the 
imputation model improves the results of the actual final analysis.  This has 
to be a function of how strongly the covariate with missing values correlates 
with (predicts) the final outcome variable, and with the amount of missing 
data.  The analysis that I posed in my original post is not a particularly good 
one for asking this question, as the amount of missing data for the current PRA 
is only about 3%, and the impact of current PRA on graft survival is not 
particularly strong.  It was a convenient straw man to permit me to ask the 
question coherently.
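
The standard combining rules for multiple imputation (Rubin's rules, which are
not spelled out in this thread) make that dependence explicit. With m imputed
data sets, point estimates \hat{Q}_i, and their estimated variances U_i:

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m} U_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{Q}_i-\bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
```

The between-imputation variance B tends to grow with the amount of missing data
and to shrink as the imputation model predicts the missing values more
precisely, so a better imputation model can lower the total variance T mainly
when B is a sizeable share of T.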

But I agree that it would be worthwhile asking the question about how much 
inclusion of an auxiliary variable helps using an appropriate data set.  I’ll 
think about this and see if I can construct some data sets that permit a test 
of this question.

Again, thanks to all.

Larry Hunsicker
Prof. Internal Medicine
U. Iowa College of Medicine

From: Paul von Hippel [mailto:[email protected]]
Sent: Tuesday, April 16, 2013 5:57 AM
To: Hunsicker, Lawrence; 
[email protected]
Subject: Re: "Accessory" variables in imputation

You may be right. From Larry's point of view the important thing is that 
inclusion of auxiliary variables is legitimate but not mandatory.

Larry: if you like you could fit the imputation model with and without the 
auxiliary variable, analyze the data both ways, and report back how much 
smaller your standard error is when you use the auxiliary variable. Of course, 
you're under no obligation to do that, but it would be interesting to know if 
this is a situation where the auxiliary variable helps a little or a lot.
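
A rough sketch of that comparison on simulated data. Everything here is an
assumption made for illustration: the analysis model is a plain linear
regression instead of a survival model, the variables `peak`, `current`, and
`outcome` are simulated, half of `current` is missing completely at random, and
the imputation is a hand-rolled Bayesian normal linear regression pooled with
Rubin's rules rather than any particular package.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients, residual variance, and (X'X)^-1."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    return beta, sigma2, XtX_inv


def impute_once(rng, target, predictors, miss):
    """One proper imputation: draw parameters, then draw the missing values."""
    obs = ~miss
    X = np.column_stack([np.ones(len(target))] + predictors)
    beta_hat, s2, XtX_inv = ols(X[obs], target[obs])
    df = obs.sum() - X.shape[1]
    sigma2 = s2 * df / rng.chisquare(df)                 # residual variance draw
    beta = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
    filled = target.copy()
    noise = rng.normal(scale=np.sqrt(sigma2), size=miss.sum())
    filled[miss] = X[miss] @ beta + noise
    return filled


def pooled_slope(rng, y, x, miss, extra, m=50):
    """Impute x from y (plus any extra variables), fit y ~ x, pool with Rubin's rules."""
    est, var = [], []
    for _ in range(m):
        x_imp = impute_once(rng, x, [y] + extra, miss)
        X = np.column_stack([np.ones(len(y)), x_imp])
        beta, s2, XtX_inv = ols(X, y)
        est.append(beta[1])
        var.append(s2 * XtX_inv[1, 1])
    est, var = np.array(est), np.array(var)
    total = var.mean() + (1 + 1 / m) * est.var(ddof=1)   # within + between
    return est.mean(), np.sqrt(total)


rng = np.random.default_rng(2013)
n = 1000
peak = rng.normal(size=n)                                # auxiliary variable
current = 0.9 * peak + rng.normal(scale=0.45, size=n)    # covariate with missing values
outcome = 0.3 * current + rng.normal(size=n)             # analysis outcome
miss = rng.random(n) < 0.5
current_obs = np.where(miss, np.nan, current)

slope_without, se_without = pooled_slope(rng, outcome, current_obs, miss, extra=[])
slope_with, se_with = pooled_slope(rng, outcome, current_obs, miss, extra=[peak])
print(f"without auxiliary: slope={slope_without:.3f}  SE={se_without:.4f}")
print(f"with auxiliary:    slope={slope_with:.3f}  SE={se_with:.4f}")
```

The gap between the two standard errors in this sketch reflects the heavy
missingness and strong correlation that were built into the simulation; with
little missing data or a weak auxiliary variable it should be much smaller.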


On Mon, Apr 15, 2013 at 7:21 PM, David Judkins 
<[email protected]> wrote:
I would say that it all depends. In Hunsicker's example, peak PRA sounds like 
it was excluded from the outcome space because of collinearity issues. This 
makes it an ideal adjunct variable to the imputation process.

--Dave Judkins

On Apr 15, 2013, at 7:13 PM, "Paul von Hippel" 
<[email protected]> wrote:
Let me correct my first sentence: What I meant to say is that Meng showed that 
MI imputation is still valid if auxiliary variables have been included in the 
imputation model. So it's a legitimate practice and, if it's not too much 
trouble, why not. But it probably won't make much difference.


________________________________
From: Paul von Hippel 
<[email protected]>
To: 
[email protected]
Sent: Monday, April 15, 2013 4:39 PM
Subject: Re: "Accessory" variables in imputation

Meng showed that MI imputation is still valid if auxiliary variables have been 
included in the analysis. In theory auxiliary variables can improve the 
estimates, but in practice they rarely help much. See the recent paper by Sarah 
Mustillo in Sociological Methods & Research.

On Mon, Apr 15, 2013 at 4:27 PM, Hunsicker, Lawrence 
<[email protected]> wrote:
Good afternoon, all:

A question about the use of "accessory" variables in imputation.  Consider for 
a moment a kidney transplant survival model in which one has data (among other 
things) on peak panel reactive antibody (peak PRA) and the PRA at the time of 
the actual transplant (current PRA).  These actually measure different things, 
but they are obviously strongly correlated.  Data are missing for some fraction 
of these covariates, but most of the time one or the other is available.  
Current PRA is considered to be the stronger predictor of transplant outcomes.  
One is developing a model in which one wants to limit the model df.  So it has 
been decided that the final model will include current PRA but not peak PRA.

I understand that the imputation model must include the outcome variable and 
also all of the covariates that will be used in the final analysis model.  The 
question is whether one can/should include additional covariates (such as peak 
PRA) in the imputation model that WON'T be in the final analysis model.  It 
would seem that inclusion of peak PRA in the imputation model might improve 
considerably the prediction of current PRA, the covariate that will be included 
in the final analysis model.

Is this legitimate?

Thanks in advance for any guidance from the listserv members.

Larry Hunsicker
Prof. Internal Medicine
U. Iowa College of Medicine


________________________________
Notice: This UI Health Care e-mail (including attachments) is covered by the 
Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and 
may be legally privileged.  If you are not the intended recipient, you are 
hereby notified that any retention, dissemination, distribution, or copying of 
this communication is strictly prohibited.  Please reply to the sender that you 
have received the message in error, then delete it.  Thank you.
________________________________



--
Best wishes,
Paul von Hippel
Assistant Professor
LBJ School of Public Affairs
Sid Richardson Hall 3.251
University of Texas, Austin
2315 Red River, Box Y
Austin, TX  78712
(512) 537-8112


________________________________
This message may contain privileged and confidential information intended 
solely for the addressee. Please do not read, disseminate or copy it unless you 
are the intended recipient. If this message has been received in error, we 
kindly ask that you notify the sender immediately by return email and delete 
all copies of the message from your system.


