Under certain circumstances, I'm finding that my imputation model imputes outlying values. I'm not sure whether this problem is peculiar to the software I am using (IVEware), or whether similar problems would be expected from any software. Details follow.
I'm imputing test scores, demographics, and other variables for ~5000 students clustered in ~300 schools. To account for the clustering, I am including the school ID variable in the imputation model.

In a few schools, all students are missing scores for a fall reading test. In those schools, IVEware imputes the same score for each student. Typically the imputed score is one of the boundary values that I have imposed; if no boundary values are imposed, the imputed scores are impossibly high or low.

Under these circumstances, the effect of the school on fall reading scores cannot be estimated directly. It appears that the program responds to this situation by assuming the school effect is very large, ignoring or swamping the predictive value of other observed variables, such as the spring reading test. I wonder how I can get more plausible imputations from the model.

Best wishes,
Paul von Hippel

Paul von Hippel
Statistician
Department of Sociology / Initiative in Population Research
Ohio State University
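
P.S. To make the problem concrete, here is a rough sketch of what I suspect is happening. It is not IVEware code, just a small Python/NumPy simulation I put together that mimics one regression-and-draw step of a sequential-regression imputation: fall scores are regressed on spring scores plus school dummies using the observed cases only. Because one school has no observed fall scores, its dummy column is all zeros in the estimation sample, so its coefficient is unidentified and a posterior draw for it has essentially unbounded variance. All of the numbers, variable names, and the tiny ridge used to force the near-singular solve are mine, purely for illustration.

import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3 "schools", 20 students each.
# fall = 5 + 0.8*spring + school effect + noise
n_per = 20
school = np.repeat([0, 1, 2], n_per)
spring = rng.normal(50.0, 10.0, size=school.size)
fall = (5.0 + 0.8 * spring + np.array([-2.0, 0.0, 2.0])[school]
        + rng.normal(0.0, 3.0, size=school.size))

# Every fall score in school 2 is missing.
observed = school != 2

# Imputation-model design matrix: intercept, spring, dummies for schools 1 and 2.
X = np.column_stack([np.ones_like(spring), spring,
                     (school == 1).astype(float),
                     (school == 2).astype(float)])
Xo, yo = X[observed], fall[observed]

# The school-2 dummy is identically zero among the observed cases,
# so its coefficient cannot be estimated: X'X is singular.
XtX = Xo.T @ Xo
print("school-2 dummy, observed cases:", Xo[:, 3].max())           # 0.0
print("rank of X'X:", np.linalg.matrix_rank(XtX), "of", len(XtX))  # 3 of 4

# Mimic one proper-imputation step: draw coefficients from an approximate
# posterior N(beta_hat, sigma2 * (X'X)^-1).  A tiny ridge is added just so
# the inverse exists -- a crude stand-in for however the software pushes
# through the near-singular solve -- and the variance of the school-2
# coefficient explodes (sigma2 / ridge).
ridge = 1e-8 * np.eye(4)
XtX_inv = np.linalg.inv(XtX + ridge)
beta_hat = XtX_inv @ Xo.T @ yo
sigma2 = np.sum((yo - Xo @ beta_hat) ** 2) / (observed.sum() - 3)  # 3 identified coefs

beta_draw = rng.multivariate_normal(beta_hat, sigma2 * XtX_inv)
print("drawn school-2 effect:", beta_draw[3])  # huge relative to real scores

# Imputed fall scores for school 2 inherit that huge drawn effect, so without
# bounds they are implausible, and with bounds they pile up at a boundary.
imputed = X[~observed] @ beta_draw + rng.normal(0.0, np.sqrt(sigma2), n_per)
print("imputed fall scores for school 2:", np.round(imputed[:5], 1))

If that diagnosis is roughly right, it would explain why the spring scores get swamped and why the bounds are the only thing keeping the imputed values in range.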
