I want to apologize for what is more or less a cross-post. My question is not about imputation, but I hope you will not mind too much, as it is a question about missing data. Since I do not know of any mailing list devoted to missing data in general, I think the members of this list are the closest to having expertise in this area.
I determined joint probabilities for X1 and X2. Both dichotomous variables take the values 1 or 2. I then suppose that X1 is always observed but X2 may be missing. The probability that X2 is missing (M2=1) is larger for X2=1 than for X2=2, and this holds in the same way within both levels of X1. There are 3 independent parameters for the probabilities of the joint distribution of (X1,X2) and 2 independent parameters for the probabilities of the conditional distribution of M2 given (X1,X2); in fact this is the conditional probability of M2 given X2 alone, because it is the same for both levels of X1. Since I observe 4 frequencies for the joint classification by (X1,X2) plus 2 frequencies for the classification by X1 alone (when X2 is missing), if I assume a multinomial distribution for these data with the sum of the 6 frequencies fixed, I have 5 degrees of freedom. I think this is the simplest nonignorable (MNAR) response mechanism for 2x2 partially classified data whose parameters are estimable by maximum likelihood.

To gain more insight into these structures, I simulated data from a multinomial joint distribution of (X1,X2,M2) and then tried to fit the same structure by ML. The marginal probabilities of X1 and X2 that I chose are not homogeneous. One of my interests is to compare the result of a test of marginal homogeneity under the correct MNAR structure versus a MAR structure or versus discarding the incomplete records (complete cases, CC). I know that the latter two approaches lead to biases (for the probabilities I chose, both should come out nearly marginally homogeneous), but because the variances of the estimated joint probabilities of (X1,X2) are so much more inflated under the MNAR structure than under MAR or CC, it is usually hard to draw stronger conclusions of marginal heterogeneity with the MNAR approach than with MAR/CC unless the sample size exceeds about n=500. That seems rather large to me, since there are no problems with sampling zeros even at n=100. I ran 10,000 simulations from the fixed multinomial parameters for each sample size and then estimated the power of the likelihood ratio test of marginal homogeneity, trying to get an idea of the sample size at which adopting a MNAR structure actually makes a difference.

But the problem (sorry, I had to introduce all this information first) is with this estimation of the power of the test. This *saturated* MNAR model (0 degrees of freedom) has a likelihood ratio goodness-of-fit statistic bigger than 0.2 for 27% of the simulated data sets with sample sizes of n=50/100 (14% with n=5,000), which is unacceptable. I noticed that almost every time this happens, one of the estimated parameters of the conditional distribution of M2 given X2 is on the boundary of the parameter space, but there are some exceptions. I am not using the Newton-Raphson, Fisher scoring and Louis turbo EM algorithms because they lead to estimated parameters outside the parameter space for many of the simulated data sets. Instead I am using the function nlmP from the geoR package for R, a Newton minimizer similar to the nlm function in base R but which allows constraints on the parameters. This function usually gives estimates close to those of the plain EM algorithm, but it converges much faster than EM.
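To make the setup concrete, here is a rough sketch in R of one replication of what I am doing. It uses optim() with box constraints only as a stand-in for nlmP()/EM, and the probabilities below are merely illustrative, not the values I actually simulated from:

set.seed(1)

## Illustrative true values: pi[i,j] = P(X1=i, X2=j), marginally heterogeneous;
## phi[j] = P(M2=1 | X2=j), the MNAR dropout probability for X2.
pi.true  <- matrix(c(0.35, 0.15,
                     0.30, 0.20), nrow = 2, byrow = TRUE)
phi.true <- c(0.40, 0.15)
n <- 500

## 6 observable cells: 4 complete cells (X1=i, X2=j observed, M2=0)
## and 2 supplemental margins (X1=i, X2 missing, M2=1).
cellprob <- function(theta) {
  pi  <- matrix(c(theta[1:3], 1 - sum(theta[1:3])), 2, 2)  # pi11, pi21, pi12, pi22
  phi <- theta[4:5]
  c(as.vector(sweep(pi, 2, 1 - phi, "*")),  # P(X1=i, X2=j, M2=0) = pi_ij (1 - phi_j)
    as.vector(pi %*% phi))                  # P(X1=i, M2=1) = sum_j pi_ij phi_j
}
theta.true <- c(pi.true[1, 1], pi.true[2, 1], pi.true[1, 2], phi.true)
counts <- as.vector(rmultinom(1, n, cellprob(theta.true)))

## Negative log-likelihood of the saturated MNAR model (5 parameters, 0 df).
negll <- function(theta, counts) {
  p <- cellprob(theta)
  if (any(p <= 0)) return(1e10)
  -sum(counts * log(p))
}

## Box-constrained minimisation (stand-in for nlmP()/EM).
fit <- optim(c(0.25, 0.25, 0.25, 0.3, 0.3), negll, counts = counts,
             method = "L-BFGS-B", lower = rep(1e-6, 5), upper = rep(1 - 1e-6, 5))

## LR goodness of fit against the saturated multinomial on the 6 cells.
p.hat <- cellprob(fit$par)
G2 <- 2 * sum(ifelse(counts > 0, counts * log(counts / (n * p.hat)), 0))

## LR test of marginal homogeneity: P(X1=1) = P(X2=1) iff pi12 = pi21 (1 df).
negll0 <- function(th, counts) negll(c(th[1], th[2], th[2], th[3], th[4]), counts)
fit0 <- optim(c(0.25, 0.25, 0.3, 0.3), negll0, counts = counts,
              method = "L-BFGS-B", lower = rep(1e-6, 4), upper = rep(1 - 1e-6, 4))
LRT <- 2 * (fit0$value - fit$value)

With 0 degrees of freedom, G2 should be essentially zero whenever the MLE is in the interior of the parameter space; the anomalies I described above are the replications where it is not.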
I then chose to use the nlmP estimates as initial values for EM, with a convergence criterion of 1e-8 for the difference between successive estimates. I repeated this exercise with a variety of values for the 2 conditional probabilities of M2 given X2: (i) moving them from the middle towards the boundary of the parameter space and (ii) with larger and smaller differences between P(M2=1 | X2=1) and P(M2=1 | X2=2). The number of "anomalies" observed is almost the same in every case.

Fay (1986) and Molenberghs et al. (1999) note the same problems of estimated parameters near the boundary of the parameter space and of saturated MNAR structures with a likelihood ratio goodness of fit greater than zero, but there is no illuminating discussion of the causes.

Fay, R.E. (1986). Causal models for patterns of nonresponse. Journal of the American Statistical Association 81, 354-365.

Molenberghs, G., Goetghebeur, E.J.T., Lipsitz, S.R. and Kenward, M.G. (1999). Nonrandom missingness in categorical data: strengths and limitations. The American Statistician 53, 110-118.

I believed (or used to believe) that structures with these problems should not be considered further in the analysis, but since I am simulating from this very structure I cannot understand what the problem is. I would appreciate any help, insights, further references or discussion about problems with maximum likelihood estimation of MNAR models. If I get interesting results, I may use them as part of my master's dissertation.

Thanks in advance for your patience with this long message, and pardon my grammar mistakes (it has been a long time since I last practiced writing in English); I hope they have not hindered the understanding of the problem.

--
Frederico Zanqueta Poleto
[email protected]
--
"An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem." J. W. Tukey
