Paul,

If I'm interpreting your question correctly, you have a non-random pattern of missing data that doesn't fulfill the assumptions of either data missing completely at random (MCAR) or missing at random (MAR).

I studied this same problem using the Markov chain Monte Carlo method of multiple imputation in SAS Proc MI, with a larger dataset (n=10,319) and 20 variables in the imputation process. Whether I made a variable's values missing completely at random or by a predefined non-random selection (i.e., missingness dependent on another variable), the imputation results were robust and changed very little. I compared point estimates and 95% CIs for the manipulated variable from a regression analysis after imputation (Proc MIANALYZE), across different percentages (10-90%) of data MCAR and data missing in a non-random pattern. For the same percentage of missing data, the results were almost identical whether the data were MCAR or missing in a non-random pattern.

These findings may depend on having a large sample and enough variables that are predictive of the ones you want to impute.

Hope this helps.
Craig

Craig D. Newgard, MD, MPH
Research Fellow
Department of Emergency Medicine
Harbor-UCLA Medical Center
1000 West Carson Street, Box 21
Torrance, CA 90509
(310)222-3666 (Office)
(310)782-1763 (Fax)
[EMAIL PROTECTED]

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]On Behalf Of Paul von Hippel
Sent: Thursday, May 16, 2002 11:08 AM
To: [EMAIL PROTECTED]
Subject: IMPUTE: Re: Y affects probability of observing X

Actually, a more complete description of the situation would be this:

X is one of the variables that affects Y,
    Y = a0 + Xa1 + ... + error
but Y is one of the variables that affects an indicator Z,
    Pr (Z=1) = F ( b0 + Yb1 + ...)
and if Z=1 then I definitely can't observe X.

I suspect the correct procedure in this situation is the same as in the simplified situation I described originally. But I thought I should be thorough.

Thanks again,
Paul von Hippel

>On Thu, 16 May 2002, Paul von Hippel wrote:
>
> > Here's a missing-data situation that I haven't run into before. X is one of
> > the variables that affects Y,
> >     Y = a0 + X1a1 + ... + error
> > but Y is one of the variables that affects whether I have information on X,
> >     Pr (X missing) = F ( b0 + Yb1 + ...)
> > Here F is a cumulative distribution function -- for example, normal or
> > logistic.
> >
> > I want to make efficient, unbiased estimates of the first equation's
> > regression parameters a_i.
> >
> > Any suggestions most welcome.
> >
> > Many thanks,
> > Paul von Hippel
> > Ohio State University
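
For anyone who wants to experiment with the setup in the original question, here is a minimal Python sketch of the two missingness mechanisms discussed in this thread. It is not the SAS Proc MI analysis described in the reply and does no imputation; it only generates the missing-data patterns and fits a complete-case (listwise deletion) regression as a baseline. Everything specific in it is an assumption made for illustration: a single predictor X, the coefficient values a0=1 and a1=2, a logistic F with b0=-1 and b1=1, a 50% MCAR rate, and the function names.

```python
import math
import random


def simulate(n=20000, a0=1.0, a1=2.0, seed=0):
    """Draw (x, y) pairs from Y = a0 + a1*X + error (assumed standard normal X and error)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = a0 + a1 * x + rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data


def logistic(t):
    """Logistic CDF, used here as the function F from the question."""
    return 1.0 / (1.0 + math.exp(-t))


def drop_x(data, mechanism, rng, rate=0.5, b0=-1.0, b1=1.0):
    """Return a copy of data with X replaced by None wherever it is made missing.

    mechanism="mcar":        Pr(X missing) = rate for every case
    mechanism="y_dependent": Pr(X missing) = F(b0 + b1*Y)
    """
    out = []
    for x, y in data:
        p = rate if mechanism == "mcar" else logistic(b0 + b1 * y)
        out.append((None if rng.random() < p else x, y))
    return out


def complete_case_slope(data):
    """Ordinary least-squares slope of Y on X using only cases where X is observed."""
    pairs = [(x, y) for x, y in data if x is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx


data = simulate()
rng = random.Random(1)
slope_mcar = complete_case_slope(drop_x(data, "mcar", rng))
slope_ydep = complete_case_slope(drop_x(data, "y_dependent", rng))
print("true slope a1:             2.0")
print("complete cases, MCAR:      %.3f" % slope_mcar)
print("complete cases, Y-dependent: %.3f" % slope_ydep)
```

With these assumed settings, roughly half of the X values are missing under either mechanism. Under MCAR the complete-case slope should land close to the true a1, while under the Y-dependent mechanism it tends to be pulled toward zero, because the retained cases are selected on the outcome. The same masked datasets could be passed to an imputation procedure to compare mechanisms at a fixed percentage of missing data.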