We are still working on our multiple imputation analysis and would appreciate any comments and guidance. We imputed data under the multivariate normal model with SAS PROC MI (MCMC method) for a covariate (X2, a medical test) that is 20% missing. We reasoned that because X2 was missing for administrative reasons, e.g., because the patient was discharged from the hospital before the research staff could obtain X2, an argument could be made that the data were MAR. The analysis of interest is a Cox regression model of time to death. We put a lot of thought into building the imputation model and were careful to include other covariates that were highly correlated with X2, as well as all those that we want in the analysis model (note: we did not include time to death because it is censored and not multivariate normal). We used 50 imputations for a dataset of n=682, which is probably overkill, but we have the computing power and disk storage.

 

The result: the regression coefficient for the covariate of interest (X1) was actually lower (i.e., closer to zero) in the MI analysis than in the complete case analysis. This result was completely unexpected. What we expected was simply a more efficient estimate of the X1 coefficient, which had p=0.10 in the complete case analysis.
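
For readers who want to see how an MI coefficient like the one above is typically obtained from the 50 imputed datasets, here is a minimal sketch of Rubin's combining rules in Python. It is an illustration only: our analysis was run in SAS, the function name pool_rubin and the toy numbers are hypothetical, and the degrees-of-freedom formula is the classic large-sample version, which may differ from what a given package reports.

```python
from math import inf, sqrt
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their squared standard errors.

    estimates: one coefficient per imputed dataset (e.g., log hazard ratio for X1)
    variances: the matching squared standard errors
    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")

    q_bar = mean(estimates)            # pooled point estimate
    within = mean(variances)           # average within-imputation variance
    between = variance(estimates)      # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between

    if between == 0:
        df = inf                       # no between-imputation variability
    else:
        ratio = within / ((1 + 1 / m) * between)
        df = (m - 1) * (1 + ratio) ** 2

    return q_bar, sqrt(total), df


# Hypothetical toy values, not results from our data.
est, se, df = pool_rubin([0.2, 0.4], [0.01, 0.01])
print(f"pooled estimate={est:.3f}, se={se:.3f}, df={df:.2f}")
```

The point of the sketch is that the pooled coefficient is just the average of the per-imputation coefficients, so a smaller MI estimate means the coefficients fitted on the completed datasets were, on average, closer to zero than the complete case coefficient.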

 

To explore these results we did the following:

 

1.  Ran the Cox regression on the subset of patients missing X2 and found that the relationship of X1 to outcome was in the opposite direction to that in the complete cases, i.e., X1 was protective in the patients missing X2, while previous research and theory hold that X1 is a risk factor.

 

2.  Compared the characteristics of patients missing and not missing X2 and found a mixed bag as far as prognosis goes, although the patients in the missing group had some important characteristics that convey a better prognosis (e.g., they were younger). (I understand this comparison as a check of the MCAR assumption.)

 

3.  Examined the imputed values of X2 and found that their mean was slightly lower than the mean of the observed X2 values.  Lower X2 is usually associated with worse outcome.

 

4.  Did a sensitivity analysis by adding constants to and subtracting constants from the imputed values and found that the resulting MI analyses were more in line with our hypothesis (i.e., a statistically significant harmful effect of X1) when the imputed values of X2 were inflated.
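
The comparison in step 3 and the shift-by-a-constant sensitivity analysis in step 4 can be sketched roughly as follows. This is a minimal Python illustration with made-up toy values, not our SAS code; the helper names shift_imputed and observed_vs_imputed_means are hypothetical. Refitting the Cox model on each shifted dataset and pooling the results is the step that follows and is not shown here.

```python
from statistics import mean


def shift_imputed(x2, was_missing, delta):
    """Return a copy of x2 with delta added only to the imputed entries."""
    if len(x2) != len(was_missing):
        raise ValueError("x2 and was_missing must have the same length")
    return [v + delta if miss else v for v, miss in zip(x2, was_missing)]


def observed_vs_imputed_means(x2, was_missing):
    """Return (mean of observed X2, mean of imputed X2) for one completed dataset."""
    observed = [v for v, miss in zip(x2, was_missing) if not miss]
    imputed = [v for v, miss in zip(x2, was_missing) if miss]
    return mean(observed), mean(imputed)


# Hypothetical toy data: one completed dataset and a flag for imputed values.
x2 = [5.0, 6.0, 4.0, 7.0, 3.5]
was_missing = [False, False, True, False, True]

for delta in (-1.0, 0.0, 1.0):
    shifted = shift_imputed(x2, was_missing, delta)
    obs_mean, imp_mean = observed_vs_imputed_means(shifted, was_missing)
    # In the real analysis, each shifted dataset would be analyzed and pooled here.
    print(f"delta={delta:+.1f}: observed mean={obs_mean:.2f}, imputed mean={imp_mean:.2f}")
```

Only the imputed entries move; the observed X2 values are left untouched, so each delta corresponds to a different assumption about how the missing X2 values differ from what the MAR-based imputation model predicts.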

 

Lingering questions

 

1.  Where do we go from here?  Certainly, we feel that the MI analysis is “better” in some sense than the complete case analysis (e.g., it uses all the data), even though the results do not provide statistical support for our hypothesis (i.e., they are not “statistically significant” at p<.05).

 

2.  We are confused by the fact that the patients missing X2 have characteristics that are associated with better prognosis yet the imputed values of X2 are lower than the observed data, which implies a worse prognosis.  Is this something to worry about? 

 

3.  Is there any justification for reversing our initial thoughts about the MAR assumption and now arguing for MNAR? For example, because the patients in the missing X2 group differ in important ways from the fully observed patients, there might be unmeasured covariates that are related to the missing X2 values (X2mis).

 
