> From: Donald Baken
> I have both MCAR and non-random missing data. The non-random missing data
> comes from questions about religion where some people have just refused
> to answer any of the questions.
Of course, you never know for sure which data are MAR and which are not unless you know exactly what process caused the missingness, and you cannot know that for something beyond your control, like refusal. Also, I suspect you mean "nonignorable" or "not MAR" rather than "nonrandom".

A very simple approach that might be sensible in this context would be to define those who refuse to answer the religion questions as a separate religious group.

In some cases this approach does NOT work. For example, suppose you are regressing propensity to diabetes on body mass index (BMI) and other variables. Then (1) conceptually, everybody has a BMI whether or not they give it to you; (2) the fact that someone didn't report their BMI arguably doesn't tell you much about how BMI affects their propensity to diabetes; and (3) if you simply add a dummy variable for nonresponse to the model, the model includes BMI for some people but not for others, which makes the results hard to interpret.

In the case of religion, however, refusing to answer questions about religion is itself a statement of attitudes about religion, just as naming a religion is. The refusers may well be more heterogeneous than, say, the Presbyterians, but arguably the best thing to do here is still to treat them as a distinct group and see how their attitudes differ from those of other religious groups. (How well this works analytically depends, of course, on what the items look like and what sort of analysis you are doing.)

Alan Zaslavsky
Harvard Medical School
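A minimal sketch of the "refusers as their own group" coding, written in plain Python. It assumes hypothetical survey records in which a refusal of the religion items is stored as `None`; the record layout, the category labels, and the helper names `recode_religion` and `dummy_code` are illustrative only. Refusal is recoded to an explicit `Refused` level and then dummy-coded like any other religious group, so every respondent stays in the analysis.

```python
from typing import Optional

# Hypothetical survey records; None means the respondent refused the religion items.
respondents = [
    {"id": 1, "religion": "Presbyterian"},
    {"id": 2, "religion": None},
    {"id": 3, "religion": "Catholic"},
    {"id": 4, "religion": "Nonreligious"},  # a stated answer, distinct from refusal
    {"id": 5, "religion": None},
]

REFUSED = "Refused"


def recode_religion(value: Optional[str]) -> str:
    """Treat refusal as its own category instead of as missing data."""
    return REFUSED if value is None else value


def dummy_code(values: list[str], reference: str) -> tuple[list[str], list[dict[str, int]]]:
    """Build 0/1 indicator columns for every level except the reference level."""
    levels = sorted(set(values) - {reference})
    rows = [{f"religion_{lvl}": int(v == lvl) for lvl in levels} for v in values]
    return levels, rows


recoded = [recode_religion(r["religion"]) for r in respondents]
# The reference group is an arbitrary choice; Presbyterian is used here for illustration.
levels, rows = dummy_code(recoded, reference="Presbyterian")

for record, row in zip(respondents, rows):
    print(record["id"], row)
```

With this coding, the coefficient on `religion_Refused` in a downstream model is read as the contrast between refusers and the reference group, exactly as for any named religion. This works for religion because refusal is itself an attitude; for a quantity like BMI, which everyone has whether or not they report it, the same indicator trick does not give an interpretable model.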
