Greetings,

After spending the last few months combing the literature in hopes of finding 
different examples of mechanisms for missing data, I have collected a list of 
definitions that appear to me to be incompatible.  My tendency is to lean
toward the definitions of Rubin, his former students, and his colleagues, but I
may simply be biased by the halo effect.  Nevertheless, I would like to resolve
some of these discrepancies with your assistance.  My goal is to clear up the
definitional problems with these mechanisms so that I can describe them more
clearly to an audience with little mathematical or statistical background.
Hopefully you folks can help me out a bit.

To protect the guilty and to reduce any biases we might share, I will
withhold the references for now.  Below are definitions abstracted from several
different sources.  I will try my hand at interpreting each definition and would
appreciate it if others would correct me where I am wrong.

____________________________________________________________________________
Article 1:

MCAR:  cases with complete data are indistinguishable from cases with 
incomplete data

MAR:  cases with incomplete data differ from cases with complete data, but the 
pattern of data missing is traceable or predictable from other variables in 
the database rather than being due to the specific variable on which the data 
are missing.  That is, the actual variables where the data are missing are not 
the cause of incomplete data.  Instead, the cause of the missing data is due 
to some other external influence.

NI:  The pattern of data missingness is non-random and it is not predictable 
from other variables in the database.
___________________________________________________________________________
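To make the three Article 1 definitions concrete, here is a small simulation sketch. It is my own illustration under stated assumptions, not taken from any of the articles: a covariate x that is always observed, an outcome y (correlated with x) that can go missing, and a logistic missingness probability that depends on nothing (MCAR), on x (MAR), or on y itself (NI). The function names, the 0.6 correlation, and the logistic coefficients are all hypothetical choices.

```python
import math
import random

def logistic(z):
    """Map a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def simulate(mechanism, n=20000, rho=0.6, seed=1):
    """Return (x, y, missing) tuples; missing=True means y is unobserved.

    Hypothetical mechanisms for illustration:
      "MCAR" -> P(missing) is a constant
      "MAR"  -> P(missing) depends only on the always-observed covariate x
      "NI"   -> P(missing) depends on y, the very value that goes missing
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # y has unit variance and correlation rho with x
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p = 0.3
        elif mechanism == "MAR":
            p = logistic(-1.0 + 1.5 * x)
        elif mechanism == "NI":
            p = logistic(-1.0 + 1.5 * y)
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, y, rng.random() < p))
    return rows

def mean_x_gap(rows):
    """Mean of x among incomplete cases minus mean of x among complete cases."""
    miss = [x for x, _, m in rows if m]
    comp = [x for x, _, m in rows if not m]
    return sum(miss) / len(miss) - sum(comp) / len(comp)

for mech in ("MCAR", "MAR", "NI"):
    print(mech, round(mean_x_gap(simulate(mech)), 3))
```

Under MCAR the gap is close to zero, which is the "indistinguishable" case; under MAR it is clearly nonzero because x drives the missingness. Because x and y are correlated in this sketch, the NI rows also show a gap on x, even though x is not an input to the NI missingness probability.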

My comments:  No problems with MCAR or NI from Article 1, but MAR appears
problematic.  It appears to me that MAR could hold as long as the missing data
mechanism is a function of the independent variables and covariates, whereas
the definition above includes all variables in the database.  This appears to
go beyond what Rubin said in his original definition.  Am I correct?
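For comparison with the Article 1 wording, here is the formal version as it is commonly written following Rubin (1976) and Little and Rubin: R is the matrix of missingness indicators, Y = (Yobs, Ymis) is the data matrix under analysis split into its observed and missing parts, and phi holds the parameters of the missingness mechanism.  This is my transcription of the standard textbook form, not a quotation from any of the articles here.

```latex
\[
\begin{aligned}
\text{MCAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) = P(R \mid \phi) \\
\text{MAR:}\quad  & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) = P(R \mid Y_{\mathrm{obs}}, \phi) \\
\text{NI:}\quad   & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \phi) \text{ depends on } Y_{\mathrm{mis}}
\end{aligned}
\]
```

Read this way, MAR is stated relative to whatever variables are in Y: missingness may depend on anything observed in Y, but not additionally on the values that are missing.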

___________________________________________________________________________
Article 2:

MCAR: missing values are a simple random sample of all data values

MAR: the probability that an observation is missing may depend on Yobs (i.e.,
the observed part of Y) but not on Ymis (i.e., the missing part of Y).
___________________________________________________________________________

My comments:  These are classic definitions that appear to be consistent with
Rubin's original definitions.  I usually have a problem explaining the
dependencies in MAR to the uninitiated (a group that I may very well fall
into!).  The frequent question I get is "How can the missingness dummy code be
related only to Yobs and not to Ymis?"
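One way to answer that question is a toy simulation in which, unlike real data, we still know the values that went missing.  The sketch below is hypothetical throughout: Y1 is a pretest level (0, 1, or 2) that is always observed, Y2 is a posttest score that is sometimes missing, and the chance that Y2 is missing depends only on Y1.  Overall, the cases missing Y2 have higher true Y2 values, so the dummy code is correlated with Ymis; within each level of Y1, however, that difference disappears.  "Related to Yobs but not Ymis" means "unrelated to Ymis once Yobs is held fixed."

```python
import random

def simulate_mar(n=30000, seed=7):
    """Return (y1, y2, missing) rows; y2 is MAR given the always-observed y1."""
    rng = random.Random(seed)
    p_missing = {0: 0.1, 1: 0.3, 2: 0.6}  # depends on y1 only, never on y2
    rows = []
    for _ in range(n):
        y1 = rng.randrange(3)               # observed pretest level 0, 1, or 2
        y2 = y1 + rng.gauss(0.0, 1.0)       # posttest, correlated with y1
        rows.append((y1, y2, rng.random() < p_missing[y1]))
    return rows

def mean(values):
    return sum(values) / len(values)

def y2_gap(rows):
    """Mean true y2 among cases missing y2 minus mean y2 among observed cases."""
    return (mean([y2 for _, y2, m in rows if m])
            - mean([y2 for _, y2, m in rows if not m]))

rows = simulate_mar()
print("overall gap:", round(y2_gap(rows), 3))        # clearly positive
for level in (0, 1, 2):
    stratum = [r for r in rows if r[0] == level]
    print("y1 =", level, "gap:", round(y2_gap(stratum), 3))  # near zero
```

The overall gap arises only because high-Y1 cases are both more likely to be missing and likely to have high Y2; conditioning on Y1 removes it.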

___________________________________________________________________________
Article 3:

MCAR:  whether an observation is missing or not is unrelated to any variable 
in the study

MAR: is less stringent in that missingness can be correlated with some 
variables in the dataset, but not the outcomes of interest.

NI:  means that those observations that are missing have extreme values on an 
outcome of interest.

If a dichotomous variable for missingness is created, then:

MCAR: there is no correlation between some peripheral variable and the 
missingness variable.

MAR:  there is a correlation between some peripheral variable and the 
missingness variable.

NI:  there is a correlation between the outcome variable of interest and the 
missingness variable; the missing cases are the ones that have either 
extremely high or low values.
___________________________________________________________________________

My comments:  MCAR appears correct, but I see some problems with MAR and NI.
As with Article 1, the MAR definition seems too inclusive.  Again, if I am
wrong about excluding other relevant predictor variables, then this definition
would suffice.  The NI definition appears too simplistic, and both the MAR and
NI definitions seem to imply that you can discriminate between the two
mechanisms.  I don't think that this is actually the case.
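The point about discriminating MAR from NI can be checked the same way.  In this hypothetical sketch (my own construction, not from Article 3), I create the dichotomous missingness variable R, compute its correlation with a peripheral variable x that is always observed, and then attempt the same with the outcome y.  Two things fall out: the correlation of R with y cannot be computed from the observed data at all, because y is missing exactly where R = 1; and an NI data set in which x is correlated with y produces a nonzero correlation between R and x, just as a MAR data set does.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # undefined: one variable is constant
    return cov / math.sqrt(var_a * var_b)

def observed_data(mechanism, n=20000, rho=0.6, seed=3):
    """Return (x, y_or_None, r) rows as an analyst would actually see them."""
    if mechanism not in ("MAR", "NI"):
        raise ValueError(f"unknown mechanism: {mechanism}")
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rho * x + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        driver = x if mechanism == "MAR" else y  # NI: missingness driven by y
        p = 1.0 / (1.0 + math.exp(1.0 - 1.5 * driver))  # logistic(-1 + 1.5*driver)
        r = 1 if rng.random() < p else 0
        rows.append((x, None if r else y, r))
    return rows

for mech in ("MAR", "NI"):
    rows = observed_data(mech)
    r = [row[2] for row in rows]
    x = [row[0] for row in rows]
    print(mech, "corr(R, x) =", round(pearson(r, x), 3))
    # corr(R, y) is not estimable: y is None for every case with R = 1,
    # so only R = 0 cases remain and R is constant among them.
    kept = [row for row in rows if row[1] is not None]
    print(mech, "corr(R, y) on observed cases =",
          pearson([row[2] for row in kept], [row[1] for row in kept]))
```

In this sketch, a nonzero correlation between R and a peripheral variable is consistent with either mechanism, and the correlation that would separate them (R with y) is exactly the one the observed data cannot supply.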

If I have kept your attention this far, perhaps you could comment on my 
responses and the definitions above.  Thanks in advance for your thoughts.

Cheers,

Patrick
________________
Patrick E. McKnight, Ph.D.
Evaluation Group for Analysis of Data (EGAD)
University of Arizona
Dept. of Psychology
Tucson, AZ  85721
520-621-5463