I am interested in what others have to say about this -

I have monthly event data that covers many individuals over several 
years.  For discussion purposes say I have 100 people and 36 months of 
data for them, so I have 3600 observations.  Events are not consistent 
from month to month, but there is some consistency in events across 
months at an individual level.

If I randomly sample from those 3600 observations - for example take 
20% using a uniform random number generator - then I am confident in 
saying the unsampled data can be characterized as MCAR - missing 
completely at random.  The mechanism for being unsampled has nothing to 
do with variables in the data.
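
To make this first scheme concrete, here is a minimal sketch in Python. The names (`observations`, `sample_observations`) and the choice to represent each observation as a (person, month) pair are my own assumptions for illustration, not part of any real dataset.

```python
import random

N_PEOPLE, N_MONTHS = 100, 36

# Every (person, month) pair is one observation: 100 * 36 = 3600 rows.
observations = [(p, m) for p in range(N_PEOPLE) for m in range(N_MONTHS)]

def sample_observations(obs, frac=0.20, seed=0):
    """Keep each observation independently with probability `frac`.

    The keep/drop decision uses only the random number generator,
    never the person, the month, or the event values.
    """
    rng = random.Random(seed)
    return [o for o in obs if rng.random() < frac]

sampled = sample_observations(observations)
```

Because each row is kept independently, the sample size is only approximately 20% of 3600, and almost every month and person will be represented.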

NOW, another sampling method is to randomly sample the MONTHS in the 
dataset.  I take a 20% sample of the months (7 of the 36, after rounding), and 
take all the people represented in those months (100 each month), for a 
total of 700 observations.
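
The month-level scheme can be sketched the same way. Again the names (`sample_months`, `kept_months`) are hypothetical, and truncating 7.2 down to 7 months is an assumption about how the rounding is done.

```python
import random

N_PEOPLE, N_MONTHS = 100, 36

def sample_months(n_people=N_PEOPLE, n_months=N_MONTHS, frac=0.20, seed=0):
    """Randomly pick whole months, then keep every person in those months."""
    rng = random.Random(seed)
    k = int(n_months * frac)  # 20% of 36 is 7.2, truncated to 7 months
    kept_months = sorted(rng.sample(range(n_months), k))
    # All rows in a chosen month are kept together; all rows in an
    # unchosen month are dropped together.
    rows = [(p, m) for m in kept_months for p in range(n_people)]
    return kept_months, rows

kept_months, rows = sample_months()
```

The selection still depends only on the random number generator, but the unit being sampled is now the month rather than the individual observation, so 29 months contribute no rows at all.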

Can I assert that this second sample is also MCAR?  The mechanism for 
not being sampled is based solely on a random number generator.

Is the fact that some months are not represented a problem?  Would it 
be only MAR (missing at random), because the nature of the events in the 
unsampled months could differ from that in the sampled months?  Would 
characterizing it as MAR also be a problem?

Appreciate any input.

