Hi, I am trying to impute missing values in my data.frame. As I intend to use the completed data for prediction, I am currently measuring the success of an imputation method by the classification error it produces on my training data.
I have tried several approaches to replacing missing values:
- mean/median substitution
- substitution by a value selected from the observed values of a variable
- MLE in the mix package
- all available methods for numerical data in the MICE package (i.e. pmm, sample, mean and norm)

I found that mice with the "mean" option for numerical data gives the lowest classification error. However, I am not sure how the "mean" multiple imputation differs from simple mean substitution. I tried to read some of the documentation for the R package, but couldn't find much theory about the "mean" imputation method. Are there any good papers that explain the background behind each imputation option in MICE?

I would really appreciate any comments on the above, as my understanding of statistics is very limited.

Many thanks
Eleni Rapsomaniki
Birkbeck College, UK
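For illustration, here is a minimal, language-neutral sketch in Python (not R, and not the mice source code) of the two simplest substitution approaches in the list above, each repeated m times as multiple imputation would do. It assumes that "selected from the observed values" means a random draw, and that mice's "mean" method fills every gap with the unconditional mean of the observed values, as its help page describes. The helper names `mean_substitute`, `sample_substitute` and `impute_m_times` are hypothetical.

```python
import random
import statistics


def mean_substitute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


def sample_substitute(values, rng):
    """Replace each None with a value drawn at random from the observed values."""
    observed = [v for v in values if v is not None]
    return [rng.choice(observed) if v is None else v for v in values]


def impute_m_times(values, method, m, seed=1):
    """Produce m completed copies of `values`, mimicking multiple imputation."""
    rng = random.Random(seed)
    if method == "mean":
        return [mean_substitute(values) for _ in range(m)]
    if method == "sample":
        return [sample_substitute(values, rng) for _ in range(m)]
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    x = [2.0, None, 4.0, 6.0, None, 8.0]
    mean_sets = impute_m_times(x, "mean", m=3)
    # Mean substitution: every completed copy is identical.
    print(all(s == mean_sets[0] for s in mean_sets))  # True
    print(mean_sets[0])  # [2.0, 5.0, 4.0, 6.0, 5.0, 8.0]
    # Random draws: the copies can differ from one another.
    for s in impute_m_times(x, "sample", m=3):
        print(s)
```

Under these assumptions, "multiple" imputation with the mean gives m copies of the same simple mean-substituted dataset, so the between-imputation variability that multiple imputation is meant to capture is zero. Random draws from the observed values produce copies that can differ from one another.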