I browsed the archives for threads on interaction terms in the multivariate normal imputation model (SAS PROC MI, MCMC option) and found some interesting responses, in particular S. van Buuren's advice that interaction terms should be included in the imputation model; otherwise the interaction estimate will be biased toward zero in the analysis. That advice was given in response to a question about categorical interaction terms in the Schafer CAT model. In contrast, I have a model with two continuous predictors, say X and Z, with 10% of X missing and 20% of Z missing, and the interaction XZ is of substantive interest. My reading of Schafer (1997) is that the NORM model cannot handle continuous interaction terms and that "this is an area for future work" (p. 381). Schafer and Olsen (MBR, 1998) say it is possible to add an interaction term to a NORM model, but again this is in the context of an example with a categorical interaction. Are there any references on adding continuous interaction terms to the NORM model, or other models I should consider?
A related issue is that we are in the habit of centering continuous predictors at zero in regression models with interactions, following Cohen et al. (2003, ch. 7). Centering lets each main-effect coefficient be interpreted at the mean of the other predictor, which matters especially when zero is outside the range of the raw data, e.g., Left Ventricular Ejection Fraction = 0 means the patient is dead! I'm having some difficulty translating centering to the MI process. I see three options: 1) if I center the observed data before imputation, the mean of the completed (observed plus imputed) data is slightly different from zero. On the other hand, 2) if I center the data within each imputed dataset, I remove all between-imputation variance in the means. I have also considered 3) subtracting the combined mean of the completed data (Q-bar) from each imputed dataset, which gives a combined mean of exactly zero but leaves the mean of each imputed dataset slightly above or below zero. Approach #2 seems totally wrong because it defeats the whole purpose of multiple imputation, but I am torn between #1 and #3. Any advice on which of these approaches, if either, is best? I'm leaning toward #3 because it guarantees that Q-bar is centered at zero, but I'm not sure how to interpret the combined interaction coefficient, which results from m regressions, each run on a dataset whose "centered" mean is slightly off zero. Any guidance?
Bill Howells, MS
Behavioral Medicine Center
Washington University Medical School
St Louis, MO
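P.S. To make the three centering options concrete, here is a minimal Python sketch on simulated data. Everything in it is an assumption for illustration only: the variable names, the sample size, the number of imputations m = 5, and above all the `crude_impute` helper, which is a hypothetical stand-in that fills gaps with normal draws and is not the PROC MI MCMC algorithm. It only shows what each option does to the per-imputation means; it does not settle which option is right.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 5  # hypothetical sample size and number of imputations

# Hypothetical raw predictor on a scale where zero is out of range,
# with roughly 10% of values set to missing.
x = rng.normal(55.0, 10.0, size=n)
missing = rng.random(n) < 0.10
x_obs = np.where(missing, np.nan, x)

def crude_impute(values, rng):
    """Stand-in for a real imputation step: fill gaps with normal draws
    based on the observed mean and SD. NOT the PROC MI MCMC algorithm."""
    filled = values.copy()
    gaps = np.isnan(filled)
    filled[gaps] = rng.normal(np.nanmean(values),
                              np.nanstd(values, ddof=1),
                              gaps.sum())
    return filled

# Approach 1: center the observed data first, then impute.
obs_mean = np.nanmean(x_obs)
a1 = [crude_impute(x_obs - obs_mean, rng) for _ in range(m)]

# Approaches 2 and 3 start from imputations made on the raw scale.
raw = [crude_impute(x_obs, rng) for _ in range(m)]

# Approach 2: center each completed dataset at its own mean.
a2 = [d - d.mean() for d in raw]

# Approach 3: subtract the combined mean (Q-bar) from every dataset.
q_bar = np.mean([d.mean() for d in raw])
a3 = [d - q_bar for d in raw]

def summarize(datasets):
    """Return (combined mean, between-imputation variance of the mean)."""
    means = np.array([d.mean() for d in datasets])
    return means.mean(), means.var(ddof=1)

for label, data in [("1", a1), ("2", a2), ("3", a3)]:
    combined, between = summarize(data)
    print(f"approach {label}: combined mean = {combined:+.4f}, "
          f"between-imputation variance of the mean = {between:.6f}")
```

In this toy setup, approach 2 forces every per-imputation mean to zero, so the between-imputation variance of the mean is zero up to rounding. Approach 3 shifts every dataset by the same constant, so it keeps the raw-scale between-imputation variance while making the combined mean zero.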