I browsed the archives for threads related to interaction terms in the
multivariate normal imputation model (SAS PROC MI, MCMC option) and found
some interesting responses, specifically S. van Buuren's advice that
interaction terms should be added to the imputation model or the
interaction will be biased toward zero in analysis.  This was in the
context of a question on categorical interaction terms in the Schafer CAT
model.  In contrast, I have a model with two continuous predictors, say X
and Z, with 10% missing X and 20% missing Z, and the interaction XZ of
substantive interest.  My reading of Schafer (1997) is that the NORM model
cannot handle continuous interaction terms and that "this is an area for
future work" (p. 381).  Schafer and Olsen (MBR, 1998) say it is possible to
add an interaction term to a NORM model but again this is in the context of
an example with a categorical interaction.  Are there any references on
adding continuous interaction terms to the NORM model, or other models I
should consider?
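
To make the setup concrete, here is a minimal sketch of one reading of the
advice above: compute the product XZ before imputation and pass it to the
imputation model as one more column.  The data are simulated and the
variable names are hypothetical; only the 10% and 20% missingness rates come
from my problem.  Whether a multivariate normal model can treat this column
sensibly is exactly what I am asking, so this only shows the data layout.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-ins for the two continuous predictors (hypothetical data).
n = 100
x = rng.normal(size=n)
z = rng.normal(size=n)

# Impose the missingness rates from the question: 10% of X, 20% of Z.
x[:10] = np.nan
z[80:] = np.nan

# The product is missing whenever either component is missing,
# because NaN propagates through multiplication.
xz = x * z

# Matrix that would be handed to the imputation model: columns X, Z, XZ.
data = np.column_stack([x, z, xz])
print(np.isnan(data).sum(axis=0))  # number of missing values per column
```

The product column is then imputed like any other variable rather than
recomputed from the imputed X and Z.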

A related issue is that we are in the habit of centering continuous
predictors at zero in regression models with interactions, following Cohen
et al. (2003, ch. 7), in order to facilitate interpretation of the
coefficients of each main effect at the mean of the other main effects,
especially when zero is outside the range of the raw data, e.g., Left
Ventricular Ejection Fraction = 0 means the patient is dead!  I'm having
some difficulty translating centering to the MI process.  For example, 1)
if I center the observed data before imputation, the mean of the complete
data is slightly different from zero.  On the other hand, 2) if I center
the data in each imputed dataset, I remove all between-imputation variance
in the means.  I have also considered 3) subtracting the combined
complete-data mean (Q-bar) from each imputed dataset, which results in a
combined mean of zero but leaves the mean of each imputed dataset slightly
above or below zero.
Approach #2 seems totally wrong because it defeats the whole purpose of
multiple imputation, but I am torn between #1 and #3.  Any advice on which,
if either, of these approaches is best?  I'm leaning toward #3 because it
guarantees a centered Q-bar at zero, but I'm not sure about the
interpretation of the combined interaction coefficient, resulting from m
regressions, each conducted on a dataset with "centered" means slightly off
zero.  Any guidance?
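
For concreteness, the three centering approaches can be sketched as follows.
This is only an illustration under loud assumptions: the data are simulated,
and `crude_impute` is a hypothetical stand-in that fills missing values with
random normal draws.  It is not PROC MI or the NORM model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: one continuous predictor with 10% missing values.
n, m = 200, 5
x_obs = rng.normal(loc=50.0, scale=10.0, size=n)
x_obs[:20] = np.nan


def crude_impute(values, rng):
    """Stand-in for a real imputation model (NOT PROC MI): fill each
    missing value with a draw from N(observed mean, observed sd)."""
    filled = values.copy()
    holes = np.isnan(filled)
    mu = np.nanmean(values)
    sd = np.nanstd(values, ddof=1)
    filled[holes] = rng.normal(mu, sd, size=holes.sum())
    return filled


# Approach 1: center on the observed mean, then impute.
x_centered = x_obs - np.nanmean(x_obs)
approach1 = [crude_impute(x_centered, rng) for _ in range(m)]

# Approaches 2 and 3 start from imputations on the raw scale.
raw = [crude_impute(x_obs, rng) for _ in range(m)]

# Approach 2: center each imputed dataset on its own mean.
approach2 = [d - d.mean() for d in raw]

# Approach 3: subtract the combined mean (Q-bar), the average of the m means.
q_bar = np.mean([d.mean() for d in raw])
approach3 = [d - q_bar for d in raw]

for name, sets in [("1", approach1), ("2", approach2), ("3", approach3)]:
    means = np.array([d.mean() for d in sets])
    print(f"approach {name}: per-dataset means {np.round(means, 3)}, "
          f"combined mean {means.mean():.3f}")
```

Under approach 2 every per-dataset mean is zero up to rounding; under
approach 3 only the combined mean is zero, and the per-dataset means still
vary around it.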

Bill Howells, MS
Behavioral Medicine Center
Washington University Medical School
St Louis, MO


