Hi Joerg and others,

In 2007 Caren Tempelman completed a doctoral thesis on Imputation of Restricted Data. In the thesis she describes various ways to impute numerical data under logical restrictions. In particular, she examines the use of a truncated normal distribution (Chapter 5) and a sequential regression approach (Chapter 6). The thesis can be found on the website of Statistics Netherlands, see http://www.cbs.nl/en-GB/menu/methoden/research/rapporten/default.htm.

At Statistics Netherlands we have also experimented a bit with some other approaches. For the UN/ECE work session on statistical data editing (Vienna, April 2008) Natalie Shlomo (University of Southampton), Jeroen Pannekoek and I are preparing a short paper (or better: trying to prepare a short paper) on imputing data under logical restrictions where at the same time the sum (over all records) of the imputed values has to be equal to a known total.

Needless to say, imputation under logical restrictions is a very hard problem, where in the end the quality of the imputed data depends on the quality of your imputation model.

Best wishes,
Ton de Waal

-----Original message-----
From: [email protected] [mailto:[email protected]] On behalf of Drechsler Jörg
Sent: Tuesday, 5 February 2008 17:37
To: [email protected]
Subject: [Impute] Imputation under logical constraints

Hi all,

I have some questions about imputation under logical constraints: I am multiply imputing missing values for variables from an establishment survey using sequential regression.

Now let's start with an easy case: I have to make sure that the condition y1<=y2 always holds for my imputed values. In this case, I compute y1 as a fraction of y2 for the observed part of the data and impute these fractions instead of the real values, and whenever an imputed value is outside the bounds [0%;100%], I simply redraw the value for this observation until the condition is fulfilled. Any other ideas how to do that?

It gets more difficult if I have to make sure that the condition y.total=y1+y2+y3 is fulfilled. If I just impute y1, y2, and y3 and then simply define y.total=y1+y2+y3, I expect that I will overestimate the total number. Another idea would be to impute all the variables independently and then downweight y1, y2 and y3 to make sure that the above condition is fulfilled. But I find neither of the two ideas satisfying. Are there other ways to do it?

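For concreteness, the two ad hoc approaches described above could be sketched roughly as follows in Python. This is only an illustration under strong assumptions: a plain normal distribution fitted to the observed fractions stands in for the actual sequential regression model, missing values are marked with None, and all function names and numbers are made up for the example. Note that redrawing a normal draw until it falls inside [0, 1] amounts to sampling from a normal distribution truncated to that interval.

```python
import random
import statistics


def draw_fraction(mean, sd, rng, max_tries=10_000):
    """Draw from Normal(mean, sd), redrawing until the value lies in [0, 1]."""
    for _ in range(max_tries):
        f = rng.gauss(mean, sd)
        if 0.0 <= f <= 1.0:
            return f
    raise RuntimeError("no admissible draw; check the imputation model")


def impute_y1(y1, y2, rng):
    """Fill missing y1 (None) as fraction * y2, so that y1 <= y2 holds."""
    # Fractions y1/y2 from the records where y1 is observed.
    fractions = [a / b for a, b in zip(y1, y2) if a is not None and b > 0]
    # Assumption: a simple normal model replaces the regression model.
    mean = statistics.mean(fractions)
    sd = statistics.stdev(fractions)
    return [a if a is not None else draw_fraction(mean, sd, rng) * b
            for a, b in zip(y1, y2)]


def rescale_to_total(parts, total):
    """Scale imputed components proportionally so that they sum to total."""
    s = sum(parts)
    if s == 0:
        raise ValueError("components sum to zero; cannot rescale")
    return [p * total / s for p in parts]


rng = random.Random(2008)          # fixed seed for reproducibility
y2 = [100, 50, 80, 40, 60]         # made-up example data
y1 = [30, None, 60, 10, None]      # None marks a missing value
y1_imputed = impute_y1(y1, y2, rng)

# Independently imputed y1, y2, y3 forced to match a known y.total of 50.
parts = rescale_to_total([12.0, 30.0, 18.0], total=50.0)
```

The rescaling step keeps the relative sizes of y1, y2 and y3 but changes every imputed value, which is one reason it may not be a satisfying solution.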
Things start to get really funny if the above conditions also have to be fulfilled for subpopulations. Say y.total is the total number of employees and y1, y2, and y3 are the numbers of employees at different levels of qualification. What if the question is: How many of these employees are female? Then I have to make sure that

y.total=y1+y2+y3
y.total.f=y1.f+y2.f+y3.f
y.total.f<=y.total
y1.f<=y1
y2.f<=y2
y3.f<=y3

I am in real trouble here and any ideas or comments are highly appreciated.

Joerg
Institute for Employment Research
Nuremberg, Germany
_______________________________________________
Impute mailing list
[email protected]
http://lists.utsouthwestern.edu/mailman/listinfo/impute
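As a small aid for checking imputed records against the six subpopulation constraints listed in the original question, a validation helper might look like the sketch below. It only detects violations and does not repair them. The dictionary keys (y_total, y1_f and so on) and the example records are hypothetical names and numbers chosen for this illustration.

```python
def violated_constraints(rec, tol=1e-9):
    """Return the names of the constraints that a record violates."""
    checks = {
        "y.total = y1 + y2 + y3":
            abs(rec["y_total"] - (rec["y1"] + rec["y2"] + rec["y3"])) <= tol,
        "y.total.f = y1.f + y2.f + y3.f":
            abs(rec["y_total_f"]
                - (rec["y1_f"] + rec["y2_f"] + rec["y3_f"])) <= tol,
        "y.total.f <= y.total": rec["y_total_f"] <= rec["y_total"] + tol,
        "y1.f <= y1": rec["y1_f"] <= rec["y1"] + tol,
        "y2.f <= y2": rec["y2_f"] <= rec["y2"] + tol,
        "y3.f <= y3": rec["y3_f"] <= rec["y3"] + tol,
    }
    return [name for name, ok in checks.items() if not ok]


# A made-up record that satisfies all six constraints.
good = {"y_total": 100, "y1": 20, "y2": 50, "y3": 30,
        "y_total_f": 40, "y1_f": 10, "y2_f": 25, "y3_f": 5}

# Same record, but with more females in category 1 than employees in it.
bad = dict(good, y1_f=25)

good_violations = violated_constraints(good)
bad_violations = violated_constraints(bad)
```

In the second record the single changed value breaks two constraints at once (y1.f <= y1 and the female sum), which shows how tightly the conditions are coupled.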
