I've finally started using multiple imputation with S-PLUS (S-PLUS 2000, 
Win98) and Schafer's mix. But I'm having trouble with convergence. I'm 
desperate to show that multiple imputation works - nobody's even heard of 
it in Norway. I've banged my head against the wall for the last week, so I 
really hope you can help.

I've got a simple population survey. The response rate was 90%. Gender, age 
and response are completely observed. I'm having these problems even when I 
include only one missing variable (e.g., astma) which has only 10% missing 
data.

After a certain number of data augmentation steps some of my estimates 
(components of theta) either
1) skyrocket into a totally unreasonable range or
2) become missing.

This might take 100 steps or 10,000 steps. Not all my chains do this, but I 
think that sooner or later they will.
All the EM estimates (from the default starting point) converge in <600 
steps, and most in <10 steps.   
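
To make the failure pattern above concrete, here is a minimal sketch of how one might locate the first DA step at which a chain either leaves a plausible range or becomes missing. It is written in Python rather than S-PLUS, purely for illustration. The function name `first_divergence`, the `bound` threshold, and the representation of a chain as a list of per-step theta vectors are all assumptions; none of them is part of mix or mixs, and the numbers are made up rather than taken from the survey.

```python
import math

def first_divergence(chain, bound=1e6):
    """Return the 0-based index of the first step at which any component
    of theta is missing (None/NaN), infinite, or larger than `bound` in
    absolute value. Return None if the chain never does this."""
    for step, theta in enumerate(chain):
        for value in theta:
            if value is None or not math.isfinite(value) or abs(value) > bound:
                return step
    return None

# Toy chains with made-up values (hypothetical, for illustration only)
stable = [[0.10, 0.52], [0.11, 0.49], [0.09, 0.51]]
explodes = [[0.10, 0.52], [0.11, 0.49], [3.2e9, 0.50]]
goes_missing = [[0.10, 0.52], [float("nan"), 0.49]]

for name, chain in [("stable", stable),
                    ("explodes", explodes),
                    ("goes_missing", goes_missing)]:
    print(name, first_divergence(chain))
```

Recording this index for every chain and seed would show whether the breakdown clusters at particular step counts or is spread out, which bears on whether stopping early is defensible. The threshold `bound` is arbitrary and would need to be chosen per parameter.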

I've tried
a) a minimal model (only one variable with missing data, 10% missing), and 
a large model, and a few in between.
b) different starting points: the default of EM, EM after DA, and some 
totally random starting points.
c) different random number seeds.
d) multiple chains
all to no avail

I've even written some tools (library mixs=mix support) to help do this. I 
can now do
    chains <- run.chains.mixs(prelim, starttheta, max.steps=1000, max.chains=5)
    plot.chains.mixs(chains)
to run 5 chains of 1000 steps each, then plot the log-likelihoods and the 
components of theta. Please tell me if this sounds interesting.

I've also tried the St. Louis data (the unrestricted model). They exhibit 
the same pattern.

Will any of you *please* take a look at my explanation and plots (including 
plots of the St. Louis data) at:
http://www.uib.no/mpkjc/imput/

My thoughts and possible explanations are:
- Will this type of behaviour happen for any DA chain if you run it long 
  enough? I.e., should I stop at 500 steps and be happy?
- Is there no defined posterior distribution to which the DA can converge?
- Should I categorize age instead of using it as a continuous variable?

I'd be really grateful for any help or advice you could provide.

Yours,
Jan Brogger
PhD student, Respiratory Epidemiology Group, University of Bergen
