Mike and other list members - I agree with Tor's points from a couple of days ago. I'll see if I can add anything.
>Imputers,
>
>I am beginning to analyze data for my dissertation. The first thing I have to do is deal with missing data. However, before I pose my questions, let me give a brief description of my data. I am analyzing a problem behavior scale for children and adolescents that consists of 120 items rated on a 3-point scale. Of these 120 items, 101 items make up 8 subscales, and 10 of these double-load. The remaining 19 are not assigned to any of the subscales. The number of items per subscale ranges from 8 to 25. My sample size is 754. Of these, 24% are missing one or more items, with a maximum of 8 items missing for any one person (2 respondents with missing data on 8 items). However, no more than 2.25% (17/754) of responses are missing on any given item. Within subscale, the percent of items with missing data across persons ranges from 10% (2/20 items) to 38% (3/8 items).

Something to note: missing data isn't really a problem here item by item, but it becomes one if you were to use listwise deletion. Therefore, I think the decisions you make in the imputation model/procedure won't affect the data that much. The real benefit of imputation here is going to be in enabling the use of "real" data. In other words, I think Mike has the best kind of missing data problem, where a good chunk of the data he's going to "gain" by using multiple imputation will be actual responses that would otherwise have been discarded. In addition, the imputed responses should be fairly reliable, since he'll typically have lots of correlated, nonmissing responses per participant to use in the imputation model.

[...]

>Although the MI Procedure in SAS is currently experimental, my data are already in SAS data sets, so I plan to use this software to impute the missing data.

For what it's worth, almost all my data is in SPSS data sets, and I've found it worthwhile to go ahead and write them out in ASCII form to use in Schafer's NORM program. Your mileage may vary.

[...]
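To put rough numbers on the listwise-deletion point, using only the figures quoted above (a minimal sketch; the variable names are mine, not from any software mentioned here):

```python
# Figures taken from the quoted description of the data set.
n = 754                               # sample size
cases_missing_any = round(0.24 * n)   # 24% of cases are missing at least one item
worst_item_missing = 17               # at most 17 responses missing on any one item

# Listwise deletion discards every case with any missing item:
lost_listwise = cases_missing_any             # roughly 181 cases, 24% of the sample

# Item by item, the worst single item loses only:
lost_per_item_max = worst_item_missing / n    # 17/754, about 2.25%
```

So the per-item missingness is trivial, but a complete-case analysis would still throw away about a quarter of the sample, which is exactly where imputation pays off here.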
>Now for my questions.
>
>1) Given the limited range of response options (0-2) and the skewed nature of the data, is the use of MCMC estimation under the assumption of normality not appropriate?

I'll echo Tor's sentiments here. I'll add that I used multiple imputation under similar conditions (though with logistic regression) and thought I got sensible results (references listed below). So if it helps any that someone else has made a similar decision, at least you're not the Lone Imputation Ranger. In addition, NORM offers some easy ways to do transformations, so that's an option, as is dummy coding everything (which might open its own can of worms). In the end, though, I suspect it doesn't matter much what you do in handling the departures from normality. As I said above, I think you're ultimately going to be imputing a very small fraction of the data, so I think any reasonable decision will produce a good set of results.

>2) Assuming that I can proceed with the imputation using a normal model, should I impute within subscale, or should I impute using the full instrument?

My suggestion would be to impute using the full instrument, since you can, and should, use the information provided by variables outside a particular subscale. Many of the other variables will likely be correlated with a given variable, so I wouldn't throw that information away by imputing only within subscale.

>3) If I should impute within subscale, how do I deal with the items that are assigned to more than one subscale?
>
>4) After imputing the missing data and I have several complete-data data sets, how should I combine the parameter estimates in the IRT analysis? I will have three parameters per item to estimate. How do I determine the between-imputation variance, the within-imputation variance, and the total variance? How do I determine the relative efficiency?
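On question 4: the combining is just Rubin's (1987) rules applied separately to each parameter. A minimal Python sketch (the `combine` helper and the numbers are made up for illustration; they're not part of NORM or SAS):

```python
# Rubin's rules for combining one parameter's estimates across m imputed
# data sets. In practice, `estimates` would be, say, the difficulty
# estimate for one item from each completed-data IRT run, and
# `variances` the corresponding squared standard errors.

def combine(estimates, variances):
    m = len(estimates)
    q_bar = sum(estimates) / m                    # pooled point estimate
    w = sum(variances) / m                        # within-imputation variance
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    t = w + (1 + 1 / m) * b                       # total variance
    gamma = (1 + 1 / m) * b / t                   # approx. fraction of missing information
    re = 1 / (1 + gamma / m)                      # relative efficiency vs. infinite imputations
    return q_bar, w, b, t, gamma, re

# Hypothetical example: one item parameter, m = 5 imputations.
q_bar, w, b, t, gamma, re = combine(
    [1.02, 0.98, 1.05, 1.00, 0.97],
    [0.040, 0.038, 0.041, 0.039, 0.042],
)
```

With so little missing data, gamma should be small and the relative efficiency very close to 1, which is why a modest number of imputations (3-5) is usually plenty.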
NORM offers a very easy imputation-combination utility. Not knowing much about IRT, though, there may be problems with this of which I'm unaware.

Hope that helps!

Jeff Wayman, Ph.D.
Tri-Ethnic Center for Prevention Research
Colorado State University
Ft. Collins, CO 80521
phone: (970) 491-6969
email: [email protected]

REFERENCES:

Wayman, J. C. (2001). Factors influencing GED and diploma attainment in high school dropouts. Education Policy Analysis Archives, 9(4) [On-line]. Available: http://olam.ed.asu.edu/epaa/v9n4.

Wayman, J. C. (2000). Educational resilience in dropouts who return to gain high school degrees. Unpublished doctoral dissertation, Colorado State University, Ft. Collins, CO.
