After literally months of work, I've finally managed to use multiple 
imputation. I'm so close now, so I'd be really grateful for any help. I 
want to include an interaction between a continuous and a categorical 
variable.

I have a rather large cross-sectional population study performed at two 
time points in the same population (n=40,000). Completely observed 
variables are: age (continuous), gender (M/F), time (1/2). Variables with 
missing values include various symptoms (e.g. asthma) and risk factors 
(0/1) (6 all in all). Unit response rate is about 80%, and unit 
nonresponders are included in the data. Item nonresponse varies from 1% to 
30%. Hardware and software: Schafer's MIX, S-PLUS 2000, Win98, 1 GHz, 384 
MB RAM. I am using restricted models: initial estimates with ECM, data 
augmentation with DABIPF.

The main point of the analysis is to estimate the risk of asthma (or other 
symptom/diagnosis) by age, separately for the two time points. This is done 
using logistic regression, with outcome asthma, main effects of age 
(cubic spline) and time (dummy 0/1), and the interaction between age 
(splined) and time (dummy).
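
To make the analysis model concrete, here is a minimal sketch of one row of the logistic regression design matrix. It assumes a truncated-power cubic spline basis and three made-up knots; the actual spline basis and knots in my analysis may differ, and the function names are only for illustration.

```python
def cubic_spline_basis(age, knots):
    """Truncated-power cubic spline basis: age, age^2, age^3, (age - k)_+^3."""
    basis = [age, age ** 2, age ** 3]
    basis += [max(age - k, 0.0) ** 3 for k in knots]
    return basis


def design_row(age, time_dummy, knots):
    """Intercept, spline(age), time, and time * spline(age) interaction terms."""
    spline = cubic_spline_basis(age, knots)
    interaction = [time_dummy * s for s in spline]
    return [1.0] + spline + [float(time_dummy)] + interaction


KNOTS = [30.0, 45.0, 60.0]  # hypothetical knots, not taken from the study

row_t0 = design_row(40.0, 0, KNOTS)  # first time point: interaction terms vanish
row_t1 = design_row(40.0, 1, KNOTS)  # second time point: interaction repeats the spline
```

With the interaction columns included, the age curve is free to have a different shape at each time point, which is what the imputation model also needs to allow for.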

I'm having some trouble producing an imputation model that includes this 
interaction. Since age is a continuous variable, an "obvious" possible 
model is:
-categorical/loglinear W: gender, time
-continuous Z: age, asthma and the other symptoms
-design matrix: gender, time, gender*time (margins: don't matter)

Am I right that this model does not properly include the time by age 
interaction?
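
Here is a small sketch of why I suspect this. As I understand the general location model, the continuous variables Z are normal within each cell of W, with cell-specific means but one covariance matrix shared by all cells. The numbers below are invented for illustration and the code is not MIX code.

```python
# Shared covariance of (age, asthma) across all cells of W (hypothetical values).
VAR_AGE = 200.0
COV_AGE_ASTHMA = 0.5

# Cell-specific means (mean age, mean asthma) for each (gender, time) cell.
CELL_MEANS = {
    ("M", 1): (45.0, 0.05),
    ("M", 2): (47.0, 0.08),
    ("F", 1): (46.0, 0.06),
    ("F", 2): (48.0, 0.10),
}


def conditional_line(cell):
    """Intercept and slope of E[asthma | age] within one cell of W."""
    mean_age, mean_asthma = CELL_MEANS[cell]
    slope = COV_AGE_ASTHMA / VAR_AGE  # uses only the shared covariance
    intercept = mean_asthma - slope * mean_age
    return intercept, slope


lines = {cell: conditional_line(cell) for cell in CELL_MEANS}
slopes = {slope for _, slope in lines.values()}
intercepts = {intercept for intercept, _ in lines.values()}
```

The intercepts differ between cells but the slope on age is the same in every cell, so under this model the asthma-by-age relationship cannot change between the two time points.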

So what I've done is use age as a categorical variable instead:
-categorical/loglinear W: gender, time, age as a categorical variable 
(dummy coded)
-continuous Z: asthma and the other symptoms and risk factors (6 all in 
all)
-margins: "main effects" only, and design matrix: time, gender, age 
(dummy), time*age(dummy)
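
For reference, this is roughly how I picture the design matrix for the second model: one row per cell of W, with dummy-coded columns. It is a sketch under my own assumptions; the cell order here is simply that of itertools.product and is not necessarily the order MIX expects, and build_design is a made-up helper.

```python
from itertools import product


def build_design(n_age_groups):
    """One row per (gender, time, age group) cell: main effects plus time*age."""
    cells = list(product(("M", "F"), (1, 2), range(1, n_age_groups + 1)))
    rows = []
    for gender, time, age in cells:
        g = 1 if gender == "F" else 0
        t = 1 if time == 2 else 0
        # Age group 1 is the reference category.
        age_dummies = [1 if age == k else 0 for k in range(2, n_age_groups + 1)]
        interaction = [t * a for a in age_dummies]
        rows.append([1, t, g] + age_dummies + interaction)
    return cells, rows


cells3, rows3 = build_design(3)     # 12 cells, 7 columns
cells56, rows56 = build_design(56)  # 224 cells, 113 columns
```

With each age as its own category (56 groups) the matrix has 224 rows and 113 columns, so some cells may hold very few complete cases.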

A model that does not include the time*age interactions runs fine. When I 
use the model with time*age interactions and try to find ECM estimates 
(ecm.mix), the first step produces mu and sigma components that are all NA. 
The data are all coded correctly (1/2) and I've always run rngseed() and 
prelim.mix() first. I've tried different numbers of age groups (3, 4 or 5 
age groups, and each age as a category (56)), different priors for ECM and 
different random number seeds (which shouldn't matter). I've also tried 
excluding unit nonresponders, and taking the logarithm of the "continuous" 
variables. I've also tried imputing only for asthma, to no avail. When I 
use only unit responders (imputing only item nonresponse) I get the 
following warning, then an application halt: Warning: .Fortran("mstepcm",: 
sqrt(-7.27596e-0.12): DOMAIN error.
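
One observation on that warning, as a sketch only. I am assuming the argument to sqrt was about -7.27596e-12, and I cannot confirm what mstepcm computes at that point. A square root of a tiny negative number is a domain error in most languages, and the magnitude here matches 2^-37, which looks more like rounding error than a genuinely negative variance.

```python
import math

REPORTED = -7.27596e-12  # assumed reading of the value in the warning

# The square root of a negative number is a domain error, however small.
try:
    math.sqrt(REPORTED)
    raised_domain_error = False
except ValueError:
    raised_domain_error = True

# The magnitude agrees with a power of two to the printed precision.
power_of_two = f"{2 ** -37:.5e}"


def sqrt_with_tolerance(x, tol=1e-8):
    """Treat tiny negative values as zero; tol is an arbitrary illustration."""
    if x < 0 and -x <= tol:
        return 0.0
    return math.sqrt(x)


clamped = sqrt_with_tolerance(REPORTED)
```

If that reading is right, one possible cause is a covariance matrix that is singular or nearly so, but that is a guess on my part.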

Do you have any suggestions? Have I dug myself in too deep - am I 
misinterpreting the general location model? I'm grateful for any help.

Jan Brogger, MD
University of Bergen, Norway

