This is a very complex problem. Years ago, I worked on a similar problem of imputing cost and payment share data about medical events. In the United States, the cost of medical care is often shared by several parties. In the problem I worked on, there were cases in which respondents could report how much they personally paid but could not report the amounts paid by other parties, the total cost, or both. There were also cases in which they knew the total cost but had no payment share data, perhaps because negotiations were continuing. Working with others here, I developed algorithms that would impute partial compositional data. Here are some references.

Marker, D. A., Judkins, D. R., and Winglee, M. (2001). Large-scale imputation for complex surveys, in Survey Nonresponse, Eds. R. M. Groves, D. A. Dillman, E. L. Eltinge, and R. J. A. Little. New York: Wiley.

England, A., Hubbell, K., Judkins, D., and Ryaboy, S. (1994). Imputation of medical cost and payment data. Proceedings of the Section on Survey Research Methods of the American Statistical Association, 406-411.

Judkins, D. R., Hubbell, K. A., and England, A. M. (1993). The imputation of compositional data. Proceedings of the Section on Survey Research Methods of the American Statistical Association, 458-462.

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Drechsler Jörg
Sent: Tuesday, February 05, 2008 11:37 AM
To: [email protected]
Subject: [Impute] Imputation under logical constraints

Hi all,

I have some questions for imputation under logical constraints:

I am multiply imputing missing values for variables from an establishment survey using sequential regression. Now let's start with an easy case: I have to make sure that the condition y1<=y2 always holds for my imputed values. In this case, I compute y1 as the fraction of y2 for the observed part of the data and impute these fractions instead of the real values and whenever my imputed values are outside the bounds [0%;100%], I simply redraw the value for this observation until the condition is fulfilled. Any other ideas how to do that?

It gets more difficult if I have to make sure that the condition y.total=y1+y2+y3 is fulfilled. If I just impute y1, y2, and y3 and then simply define y.total=y1+y2+y3 I expect that I will overestimate the total number. Another idea would be to impute all the variables independently and then downweight y1, y2 and y3 to make sure that the above condition is fulfilled. But I find neither of the two ideas to be satisfying. Are there other ways to do it?

Things start to get real funny, if the above conditions also have to be fulfilled for subpopulations. Say y.total is the total number of employees and y1, y2, and y3 are number of employees for different levels of qualification. What if the question is: How many of these employees are females? Then I have to make sure that

y.total=y1+y2+y3
y.total.f=y1.f+y2.f+y3.f
y.total.f<=y.total
y1.f<=y1
y2.f<=y2
y3.f<=y3

I am in real trouble here and any ideas or comments are highly appreciated.

Joerg
Institute for Employment Research
Nuremberg, Germany
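
As a rough illustration of the constraints in the question above, the following Python sketch imputes on a share scale so that the logical conditions hold by construction. It is only a sketch under stated assumptions, not the algorithm from the references cited above: the function names (draw_fraction, draw_shares), the normal models, and all parameter values are hypothetical, and the additive log-ratio transform is a swapped-in standard device for compositional data. In a real sequential regression the means and variances would come from the fitted models, and counts would still need an integer rounding step that preserves the sums.

```python
import numpy as np


def draw_fraction(mean, sd, rng, max_tries=1000):
    """Redraw from a normal model until the fraction lies in [0, 1].

    The normal model and its parameters are placeholders for whatever
    the imputation model predicts for this record.
    """
    for _ in range(max_tries):
        f = rng.normal(mean, sd)
        if 0.0 <= f <= 1.0:
            return f
    raise RuntimeError("no admissible draw within max_tries")


def draw_shares(alr_mean, alr_cov, rng):
    """Draw K shares that sum to 1 from a normal model on the
    additive log-ratio scale (K - 1 coordinates, last part as reference)."""
    z = rng.multivariate_normal(alr_mean, alr_cov)
    expz = np.exp(np.append(z, 0.0))  # reference part has log-ratio 0
    return expz / expz.sum()


rng = np.random.default_rng(0)

# Easy case: y1 <= y2, imputing y1 as a fraction of an observed y2.
y2 = 120.0
y1 = draw_fraction(0.4, 0.3, rng) * y2

# Sum constraint: take y.total (observed or imputed first), then split it.
y_total = 250.0
shares = draw_shares(np.array([0.5, -0.2]), 0.1 * np.eye(2), rng)
y_parts = y_total * shares  # y1, y2, y3 with y.total = y1 + y2 + y3

# Subpopulation: impute a female fraction within each qualification level,
# then define the female total as the sum of the female parts.
f_frac = np.array([draw_fraction(0.45, 0.2, rng) for _ in y_parts])
y_parts_f = f_frac * y_parts  # yk.f <= yk for every level k
y_total_f = y_parts_f.sum()   # y.total.f = y1.f + y2.f + y3.f

print(y1, y_parts, y_parts_f, y_total_f)
```

Because every component is derived as a total times a share in [0, 1], y.total=y1+y2+y3 holds up to floating-point error, each yk.f stays at or below yk, and y.total.f<=y.total follows without any post hoc downweighting.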
