Hi,

I'm working with three ANCOVAs with categorical covariates.  The variables
of interest are continuous, as are the DVs, and all of these variables are
completely observed.  The missing data are confined to the categorical
predictors.  There are three of them:

1) four-level predictor, 17% missing data
2) four-level predictor, 7% missing data
3) two-level predictor, 3% missing data

The investigators I'm working for have good reason to believe that these
data are unavailable rather than not applicable.  The items ask about
different mutually exclusive and exhaustive aspects of abuse experienced by
individuals.  It's reasonable to expect that there is some (unobserved)
response to these items since individuals were selected into the study
based on their exposure to abuse.  It's likely that some individuals
refused to answer these items.  Unfortunately, the original data coders
are not available to ask about the proportion of refused vs. don't know
responses in each of these cases.

My simple-minded approach was to collapse these individuals into one of
the existing categories.  To do this, I found the outcome means for each
level of each predictor and collapsed the missing-value cases into the
category with the most similar outcome mean.
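
To make the collapsing rule concrete, here is a minimal sketch in Python.
It assumes a single outcome, codes missing levels as None, and takes "most
similar" to mean the smallest absolute difference between a level's outcome
mean and the outcome mean of the missing-value cases.  The function name
collapse_missing and the toy data are hypothetical, for illustration only.

```python
from statistics import mean


def collapse_missing(levels, outcome):
    """Recode missing (None) levels to the observed level whose outcome
    mean is closest to the outcome mean of the missing-value cases."""
    observed = {}        # level -> list of outcome values
    missing_values = []  # outcome values for cases with a missing level
    for level, y in zip(levels, outcome):
        if level is None:
            missing_values.append(y)
        else:
            observed.setdefault(level, []).append(y)

    if not missing_values:
        return list(levels), None  # nothing to collapse

    missing_mean = mean(missing_values)
    level_means = {lvl: mean(ys) for lvl, ys in observed.items()}

    # Assumption: "most similar" = smallest absolute difference in means.
    target = min(level_means,
                 key=lambda lvl: abs(level_means[lvl] - missing_mean))
    recoded = [target if lvl is None else lvl for lvl in levels]
    return recoded, target


# Hypothetical toy data: two observed levels and two missing cases.
levels = ["a", "a", "b", "b", None, None]
outcome = [1.0, 2.0, 5.0, 6.0, 5.0, 7.0]
recoded, target = collapse_missing(levels, outcome)
print(target, recoded)
```

Note that in this sketch the target category is chosen using the outcome
values themselves.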

I understand these missing data to be non-ignorable.  But since the
purpose of imputation in this analysis is to maximize the N for the
covariates, which are not the primary focus of the study, I initially
thought that a simple-minded, ad hoc approach would suffice.  However, a
reviewer's question has caused me to rethink that.

The reviewer believes that our method is illegitimate and that "imputing"
data in this way has unduly favored our hypotheses.  I disagree that we
have biased our data in favor of our hypotheses: first, because we had no
hypotheses about the covariates per se, and second, because only one of the
21 contrasts implied by the levels of the covariates was significant.

The reviewer's general point that my ad hoc method is not standard has
caused me to consider asking for advice from others with more experience.  
Should I be engaging in a more formal imputation procedure (e.g., multiple
imputation) for these covariates?  Are there problems with my approach I
haven't foreseen?  Any suggestions are welcome.

Thanks in advance,

Scot McNary


--
  Scot W. McNary  email:[email protected]   
