There are a couple of ways of looking at this, and it pays to think a bit 
beyond the definitions to the analyses that would be required.  I agree 
that your first scenario is MCAR.  In the second scenario, missingness is 
clustered by month.  An analysis that ignored the clustering (by ignoring 
the dependence of missingness on the month labels) would likely be wrong, 
especially if you were interested in inference beyond the sampled 
months.  However, we sometimes make a distinction between design variables 
(those known before data are collected) and data; under that distinction, 
the month labels are design variables, and the missingness is independent 
of both observed and unobserved data values, so in that sense it could 
still be called MCAR.

In any case the bottom line is that an analysis that didn't take into 
account the fact that data are only collected in some months would most 
likely be incorrect in some way.
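
A small simulation can make this concrete.  The sketch below is only an 
illustration under assumptions of my own: the population model (independent 
normal month effects, person effects and noise) and all function names are 
hypothetical, not taken from your data.  It draws repeated samples under 
both schemes and compares the actual spread of the sample mean with the 
naive standard error that treats the observations as independent.

```python
import random
import statistics

def simulate_population(n_people=100, n_months=36, seed=0):
    """Hypothetical data: value = month effect + person effect + noise."""
    rng = random.Random(seed)
    month_effect = [rng.gauss(0, 1) for _ in range(n_months)]
    person_effect = [rng.gauss(0, 1) for _ in range(n_people)]
    return [[month_effect[m] + person_effect[p] + rng.gauss(0, 1)
             for m in range(n_months)]
            for p in range(n_people)]

def sample_cells(pop, rng, frac=0.2):
    """Scenario 1: uniform random sample of person-month observations."""
    cells = [(p, m) for p in range(len(pop)) for m in range(len(pop[0]))]
    chosen = rng.sample(cells, round(frac * len(cells)))
    return [pop[p][m] for p, m in chosen]

def sample_months(pop, rng, n_sampled=7):
    """Scenario 2: random sample of months, keeping every person in them."""
    months = rng.sample(range(len(pop[0])), n_sampled)
    return [pop[p][m] for m in months for p in range(len(pop))]

def naive_se(values):
    """Standard error of the mean that ignores any clustering."""
    return statistics.stdev(values) / len(values) ** 0.5

def compare(n_reps=300, seed=1):
    """Return {scheme: (actual SD of sample means, average naive SE)}."""
    pop = simulate_population()
    rng = random.Random(seed)
    results = {}
    for name, sampler in [("cells", sample_cells), ("months", sample_months)]:
        means, ses = [], []
        for _ in range(n_reps):
            values = sampler(pop, rng)
            means.append(statistics.fmean(values))
            ses.append(naive_se(values))
        results[name] = (statistics.stdev(means), statistics.fmean(ses))
    return results

results = compare()
for name, (actual_sd, avg_naive_se) in results.items():
    print(f"{name}: actual SD of mean = {actual_sd:.3f}, "
          f"average naive SE = {avg_naive_se:.3f}")
```

Under these assumptions the naive standard error should be roughly in line 
with the actual variability when individual observations are sampled, and 
far too small when whole months are sampled, because the 700 observations 
then carry information about only 7 months.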

> Date: Fri, 25 Feb 2005 15:39:12 -0700
> From: Melissa Roberts <[email protected]>
> Subject: [Impute] MCAR & MAR assessment
> 
> I have monthly event data that covers many individuals over several 
> years.  For discussion purposes say I have 100 people and 36 months of 
> data for them, so I have 3600 observations.  Events are not consistent 
> from month to month, but there is some consistency in events across 
> months at an individual level.
> 
> If I randomly sample from those 3600 observations - for example take 
> 20% using a uniform random number generator - then I am confident in 
> saying the unsampled data can be characterized as MCAR - missing 
> completely at random.  The mechanism for being unsampled has nothing to 
> do with variables in the data.
> 
> NOW, another sampling method is to randomly sample the MONTHS in the 
> dataset.  I take a 20% sample of the months (producing 7 months), and 
> take all the people represented in those months (100 each month), for a 
> total of 700 observations.
> 
> Can I assert that this second sample is also MCAR?  The mechanism for 
> not being sampled is based solely on a random number generator.
> 
> Is the fact that some months are not represented a problem?  Would it 
> be just MAR because the nature of the events in those unsampled months 
> could be different than those sampled months?  Would characterizing it 
> as MAR be a problem also?


