Hi Larry,

Thanks for your question. It is hard to give general advice here, but sometimes 
I find it helpful to ask myself, "What data would I like to have if it cost 
nothing?" If the variable is "Pregnancy status" in a sample of men, the answer 
is that I wouldn't want any data. In other situations you may want data that 
are missing. In those situations, imputation can be helpful. Does this help? 
Thanks.

-Juned

-----Original Message-----
From: Impute -- Imputations in Data Analysis 
[mailto:[email protected]] On Behalf Of Hunsicker, Lawrence
Sent: Friday, March 29, 2013 10:10 AM
To: [email protected]
Subject: How to handle data missing before or after a date because the data 
collection instrument changed

Good morning, all:

Frank Harrell has suggested that I join this group to get your input on a 
question that I have about the best way to handle data that are missing 
simply because there was, at the time, no intent to capture these data.

I am involved in the analysis of data collected by the Scientific Registry of 
Transplant Recipients (SRTR), which obtains data from the Organ Procurement and 
Transplantation Network (OPTN) on all US solid organ transplants.  This data 
collection system has been in place since October 1987, and as you would 
expect, there has been evolution of the data elements that are collected.  New 
items may be added, and other items may be deleted.  Because of this, these new 
or deleted variables will be missing for a large fraction of the cases over the 
total period.  While the data are "missing 
completely at random" from the individual case point of view, there will of 
course be correlation of the missingness with date of transplant, and the 
missingness of various elements will be strongly correlated with one another.

I wonder if a case can be made that this situation is analogous to some extent 
to the situation where the value for a specific variable is "Not relevant," 
such as for questions about pregnancy for males.  In the past, I have handled 
these variables by creating, for the categorical variables (the majority of 
them), a category "not collected".  This distinguishes this sort of missingness 
from the situation where the data "should" have been collected but are missing 
for some other reason.  But Frank has convinced me that this approach will 
likely bias both the covariance matrices and the estimates of precision of the 
estimated model variable coefficients.  I can, of course, use multiple 
imputation.  This is probably the most "correct" approach.  But because of the 
size of the dataset (about 200,000 transplants), the computational expense is 
non-trivial.

This can't be a problem unique to my situation.  What are your thoughts and 
recommendations?

Thanks in advance for your thoughts on this matter.

Larry Hunsicker
Prof. Medicine
U. Iowa College of Medicine


