I'm interested in what others have to say about this. I have monthly event data covering many individuals over several years. For discussion purposes, say I have 100 people with 36 months of data each, giving 3600 observations. Events are not consistent from month to month, but at the individual level there is some consistency in events across months.
If I randomly sample from those 3600 observations (for example, taking 20% using a uniform random number generator), I am confident in saying that the unsampled data can be characterized as MCAR (missing completely at random). The mechanism that leaves an observation unsampled has nothing to do with any variable in the data.

Now, another sampling method is to randomly sample the MONTHS in the dataset. I take a 20% sample of the months (7 of the 36) and keep all the people represented in those months (100 per month), for a total of 700 observations.

Can I assert that this second sample is also MCAR? The mechanism for not being sampled is still based solely on a random number generator. Is the fact that some months are not represented at all a problem? Would it only be MAR, because the nature of the events in the unsampled months could differ from that in the sampled months? And would characterizing it as MAR also be a problem? I appreciate any input.
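To make the two sampling schemes concrete, here is a minimal Python sketch. It assumes a hypothetical panel with placeholder event counts; the names make_panel, sample_rows and sample_months are illustrative, not from any library. In both schemes the selection is driven only by a seeded random number generator and never reads the event values. The difference is that the second scheme keeps or drops rows together, one whole month at a time.

```python
import random

# Hypothetical panel mirroring the example: 100 people x 36 months = 3600 rows.
# The "events" values are placeholders; neither sampler below reads them.
N_PEOPLE, N_MONTHS = 100, 36


def make_panel(seed=0):
    rng = random.Random(seed)
    return [
        {"person": p, "month": m, "events": rng.randint(0, 5)}
        for p in range(N_PEOPLE)
        for m in range(N_MONTHS)
    ]


def sample_rows(panel, frac=0.2, seed=1):
    """Scheme 1: simple random sample of individual person-month rows."""
    rng = random.Random(seed)
    k = round(frac * len(panel))  # 20% of 3600 -> 720 rows
    return rng.sample(panel, k)


def sample_months(panel, frac=0.2, seed=1):
    """Scheme 2: randomly pick whole months, then keep every row in them."""
    rng = random.Random(seed)
    months = sorted({row["month"] for row in panel})
    k = round(frac * len(months))  # 20% of 36 -> 7 months
    chosen = set(rng.sample(months, k))
    return [row for row in panel if row["month"] in chosen], chosen


panel = make_panel()
rows_1 = sample_rows(panel)
rows_2, chosen_months = sample_months(panel)
print(len(rows_1))                      # 720
print(len(rows_2), len(chosen_months))  # 700 7
```

In the second scheme, all 100 rows from a given month share a single selection outcome, whereas in the first scheme each row's inclusion is decided separately.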
