I am thinking of implementing an S function for multiple
imputation based on the following strategy.

1. Use a type of generalized additive model based on nonparametric
    smoothers to predict a sometimes-missing covariable on the basis of
    other covariables and the response variable.  One promising model
    is Tibshirani's AVAS (additivity and variance stabilization)
    method, which seeks monotonic transformations in the variable
    being predicted so that the transformed variable has constant
    variance across levels of predictors.  Compared with predicting a
    highly skewed variable on its original scale, for example, this
    will result in a higher R^2 and will allow a constant-width
    window (epsilon, below) to be used for matching.

2. Use this model to obtain predicted transformed values of the
    target sometimes-missing variable.  These transformed values
    are usually scaled to have mean zero and variance 1.

3. Use predictive mean matching: For each subject having a missing
    value of the target variable, compute her predicted mean
    transformed value from the semiparametric model and call it
    u.  Find all subjects having non-missing values whose predicted
    values are within epsilon of u.  Sample m of those subjects'
    actual values, with replacement, and use these as
    the m multiple imputations for the target variable for the
    subject in question.
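
To make step 3 concrete, here is a minimal sketch of the matching and
sampling for a single subject.  It is written in Python rather than S,
purely for illustration.  The function name pmm_impute and its
arguments are hypothetical, and the fallback to the single nearest
donor when no one falls inside the window is an assumption of the
sketch, not part of the proposal.  The predicted transformed values
are taken as given (steps 1 and 2 are not shown).

```python
import random

def pmm_impute(pred_obs, y_obs, u, epsilon, m, rng=None):
    """Draw m imputations for one subject by predictive mean matching.

    pred_obs: predicted transformed values for subjects whose target
              variable is observed
    y_obs:    the actual observed target values for those same subjects
    u:        predicted transformed value for the subject whose target
              is missing
    epsilon:  half-width of the matching window
    m:        number of imputations to draw
    """
    rng = rng or random.Random()
    # Donors: observed subjects whose predicted value is within epsilon of u.
    donors = [y for p, y in zip(pred_obs, y_obs) if abs(p - u) <= epsilon]
    if not donors:
        # Assumption (not in the proposal): if the window is empty,
        # fall back to the single closest observed subject.
        nearest = min(range(len(pred_obs)), key=lambda i: abs(pred_obs[i] - u))
        donors = [y_obs[nearest]]
    # Sample m donor values with replacement.
    return [rng.choice(donors) for _ in range(m)]

# Made-up numbers, for illustration only.
pred_obs = [-1.2, -0.3, 0.1, 0.4, 1.5]
y_obs = [10.0, 20.0, 30.0, 40.0, 50.0]
draws = pmm_impute(pred_obs, y_obs, u=0.0, epsilon=0.5, m=5,
                   rng=random.Random(1))
```

With these made-up numbers the donor pool is the three subjects whose
predicted values fall in [-0.5, 0.5], so every draw is one of their
actual values.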

Is this a reasonable multiple imputation strategy?
How does one choose epsilon?

Thanks in advance for any thoughts on this proposal.
--
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
