Greetings,

After spending the last few months combing the literature for examples of missing data mechanisms, I have collected a list of definitions that appear to me to be incompatible. My tendency is to lean toward Rubin, his former students, and his colleagues for their definitions, but I may simply be biased by the halo effect. Nevertheless, I would like to resolve some of these discrepancies with your assistance. The purpose of this exercise is to sort out the definitional problems with these mechanisms so that I can describe them more clearly to an audience with little mathematical or statistical background. Hopefully you folks can help me out a bit.

To protect the guilty and to decrease any biases we might share, I will withhold the references for now. Below are definitions abstracted from several different sources. I will try my hand at interpreting each definition and would appreciate it if others would correct me where I am wrong.
___________________________________________________________________________
Article 1:

MCAR: cases with complete data are indistinguishable from cases with incomplete data.

MAR: cases with incomplete data differ from cases with complete data, but the pattern of data missing is traceable or predictable from other variables in the database rather than being due to the specific variable on which the data are missing. That is, the actual variables where the data are missing are not the cause of incomplete data. Instead, the cause of the missing data is due to some other external influence.

NI: The pattern of data missingness is non-random and it is not predictable from other variables in the database.
___________________________________________________________________________
My comments:

No problems with MCAR or NI from Article 1, but MAR appears problematic. It appears that MAR could exist as long as the missing data mechanism is a function of independent variables and covariates.
The definition above includes all variables in the database. This appears to go beyond what Rubin said in his original definition. Am I correct in my statement?
___________________________________________________________________________
Article 2:

MCAR: missing values are a simple random sample of all data values.

MAR: the probability that an observation is missing may depend on Yobs (i.e., the observed part of the Y variable) but not on Ymis (i.e., the missing part of Y).
___________________________________________________________________________
My comments:

These are classic definitions that appear to be consistent with Rubin's original definitions. I usually have a problem explaining the dependencies in MAR to the uninitiated, a group that I may very well fall into! The frequent comment I get is "How can the missing dummy code be related to only Yobs and not Ymis?"
___________________________________________________________________________
Article 3:

MCAR: whether an observation is missing or not is unrelated to any variable in the study.

MAR: is less stringent in that missingness can be correlated with some variables in the dataset, but not the outcomes of interest.

NI: means that those observations that are missing have extreme values on an outcome of interest.

If a dichotomous variable for missingness is created, then:

MCAR: there is no correlation between some peripheral variable and the missingness variable.

MAR: there is a correlation between some peripheral variable and the missingness variable.

NI: there is a correlation between the outcome variable of interest and the missingness variable; the missing cases are the ones that have either extremely high or low values.
___________________________________________________________________________
My comments:

MCAR appears correct, but I see some problems with MAR and NI. Similar to Article 1, the above definition seems to be too inclusive.
Again, if I am wrong about the exclusion of other relevant predictor variables, then this definition would suffice. The NI definition appears too simplistic, and the MAR and NI definitions together appear to give you the ability to discriminate between the two mechanisms. I don't think that this is actually the case.

If I have kept your attention this far, perhaps you could comment on my responses and the definitions above. Thanks in advance for your thoughts.

Cheers,

Patrick
________________
Patrick E. McKnight, Ph.D.
Evaluation Group for Analysis of Data (EGAD)
University of Arizona
Dept. of Psychology
Tucson, AZ 85721
520-621-5463
[EMAIL PROTECTED]
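
P.S. For the question in my comments on Article 2 (how can missingness be related to Yobs but not Ymis?), a small simulation may help a non-mathematical audience. What follows is only a rough sketch under assumptions of my own, not something taken from any of the articles: x is a hypothetical covariate that is always observed, y is a hypothetical outcome that may go missing, and the missingness probabilities (0.3, 0.6, 0.1), the sample size, and the seed are arbitrary illustrative choices. The "gap" is the mean of y among the cases that go missing minus the mean among the cases that stay observed.

```python
import random
from statistics import mean


def simulate(n=20000, seed=1):
    """Generate (x, y, missing-flags) rows; all numbers here are illustrative assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)                # hypothetical covariate, always observed
        y = 0.8 * x + rng.gauss(0, 0.6)    # hypothetical outcome, may go missing
        missing = {
            "MCAR": rng.random() < 0.3,                      # ignores x and y
            "MAR": rng.random() < (0.6 if x > 0 else 0.1),   # depends on observed x only
            "NI": rng.random() < (0.6 if y > 0 else 0.1),    # depends on y itself
        }
        rows.append((x, y, missing))
    return rows


def gap(rows, mechanism):
    """Mean of y among cases that go missing minus mean of y among cases that stay observed."""
    lost = [y for _, y, m in rows if m[mechanism]]
    kept = [y for _, y, m in rows if not m[mechanism]]
    return mean(lost) - mean(kept)


rows = simulate()
high_x = [row for row in rows if row[0] > 0]   # one stratum of the observed covariate

for mech in ("MCAR", "MAR", "NI"):
    print(f"{mech:5s} overall gap: {gap(rows, mech):+.2f}   "
          f"gap among x > 0: {gap(high_x, mech):+.2f}")
```

If the sketch behaves as I expect, the MCAR gap should be near zero everywhere; the MAR gap should be large overall but shrink to roughly zero once you look only within a stratum of x; and the NI gap should remain even within that stratum. That seems to be the sense in which MAR missingness depends on the observed data but not on the missing values. Note that the gap can be computed here only because the simulation keeps the values that were "lost"; with real data those values are unavailable, which is why I doubt the observed data alone can discriminate between MAR and NI.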