Hi, I'm working with three ANCOVAs with categorical covariates. The variables of interest are continuous, as are the DVs, and all of these variables are completely observed. The missing data are confined to the categorical covariates. There are three of them:
1) four-level predictor, 17% missing data
2) four-level predictor, 7% missing data
3) two-level predictor, 3% missing data

The investigators I'm working for have good reason to believe that these data are unavailable rather than not applicable. They are items which ask about different mutually exclusive/exhaustive aspects of abuse experienced by individuals. It's reasonable to expect that there is some (unobserved) response to these items, since individuals were selected into the study based on their exposure to abuse. It's likely that some individuals refused to answer these items. Unfortunately, the original data coders are not available to ask about the proportion of refused vs. don't know responses in each of these cases.

My simple-minded approach was to collapse these individuals into one of the existing categories. To do this I found the outcome means for each level of the predictors and collapsed the missing-value cases into the category with the most similar outcome means.

I understand these missing data to be non-ignorable. But since the purpose of imputation in this analysis is to maximize the N for the covariates, which are not the primary focus of the study, I initially thought that a simple-minded, ad hoc approach would suffice. However, a reviewer's question has caused me to rethink that. The reviewer believes that we have used an illegitimate method that has overly favored our hypotheses by "imputing" data in this way. I disagree that we have biased our data in favor of our hypotheses: first, we had no hypotheses about the covariates per se, and second, only one of the 21 contrasts implied by the levels of the covariates was significant.

The reviewer's general point that my ad hoc method is not standard has led me to ask for advice from others with more experience. Should I be using a more formal imputation procedure (e.g., multiple imputation) for these covariates? Are there problems with my approach I haven't foreseen?
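
To make the collapsing step concrete, here is a minimal Python sketch of what I did for one covariate and one outcome. The function name, the use of None for a missing level, and the toy data are all made up for illustration, and I am assuming the missing cases are treated as a single group whose outcome mean is compared against each observed level's mean.

```python
from statistics import mean

def collapse_missing(levels, outcomes):
    """Recode missing levels (None) to the observed level whose outcome
    mean is closest to the outcome mean of the missing cases.

    Hypothetical helper: one covariate, one outcome, and the missing
    cases handled as a single group.
    """
    observed = {}          # level -> list of outcome values
    missing_outcomes = []  # outcome values for cases with no level
    for level, y in zip(levels, outcomes):
        if level is None:
            missing_outcomes.append(y)
        else:
            observed.setdefault(level, []).append(y)

    if not missing_outcomes:
        return list(levels), None  # nothing to collapse

    level_means = {lvl: mean(ys) for lvl, ys in observed.items()}
    missing_mean = mean(missing_outcomes)

    # Pick the observed level with the smallest absolute mean difference.
    target = min(level_means,
                 key=lambda lvl: abs(level_means[lvl] - missing_mean))
    filled = [target if lvl is None else lvl for lvl in levels]
    return filled, target

# Toy data, invented for illustration only.
levels = ["a", "a", "b", "b", None, None]
outcomes = [1.0, 2.0, 9.0, 11.0, 8.0, 10.0]
filled, target = collapse_missing(levels, outcomes)
print(target, filled)
```

Note that the recoding uses the outcome itself to decide where the missing cases go, which I suspect is what the reviewer is objecting to.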
Any suggestions welcomed.

Thanks in advance,
Scot McNary

--
Scot W. McNary
email:[email protected]
