Paul,
        If I'm interpreting your question correctly, your data are missing in
a non-random pattern that satisfies neither the missing completely at random
(MCAR) nor the missing at random (MAR) assumption.  I studied this same
problem using the Markov chain Monte Carlo method of multiple imputation in
SAS Proc MI, with a larger dataset (n=10,319) and 20 variables in the
imputation process.  Whether I deleted values of a variable completely at
random or by a predefined non-random selection (i.e. missingness dependent on
another variable), the imputation process was robust and the results barely
changed.  I compared point estimates and 95% CIs for the manipulated variable
from a regression analysis after imputation (Proc MIANALYZE), with different
percentages (10-90%) of data MCAR and of data missing in a non-random
pattern.  For the same percentage of missing data, the MCAR and non-random
results were almost identical.  These findings may depend on having a large
sample and enough variables that are predictive of the ones you want to
impute.  Hope this helps.
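
For anyone who wants to try a comparison of this kind on a small scale, here
is a rough Python sketch. It is not the Proc MI analysis described above: it
assumes a toy two-variable dataset, replaces the MCMC method with simple
stochastic regression imputation (which ignores uncertainty in the imputation
model's parameters), and pools with Rubin's rules and a normal-approximation
interval. All names (fit, impute_and_pool, by_y) are made up for illustration.

```python
import math
import random

def fit(xs, ys):
    """Least squares of ys on xs: (intercept, slope, residual sd, slope variance)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    icpt = my - slope * mx
    s2 = sum((y - icpt - slope * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return icpt, slope, math.sqrt(s2), s2 / sxx

def impute_and_pool(xs, ys, missing, m, rng):
    """Impute missing xs from a regression of x on y, m times; pool the slope of y on x."""
    obs = [i for i in range(len(xs)) if not missing[i]]
    c, d, sd, _ = fit([ys[i] for i in obs], [xs[i] for i in obs])
    ests, variances = [], []
    for _ in range(m):
        # Observed x is kept; missing x is predicted from y plus random noise.
        filled = [c + d * ys[i] + rng.gauss(0, sd) if missing[i] else xs[i]
                  for i in range(len(xs))]
        _, slope, _, var = fit(filled, ys)
        ests.append(slope)
        variances.append(var)
    qbar = sum(ests) / m
    within = sum(variances) / m
    between = sum((q - qbar) ** 2 for q in ests) / (m - 1)
    total = within + (1 + 1 / m) * between  # Rubin's rules for the total variance
    return qbar, math.sqrt(total)

rng = random.Random(1)
n, n_miss = 5000, 1500                        # 30% of x deleted in both scenarios
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [1.0 * x + rng.gauss(0, 1) for x in xs]  # true slope of y on x is 1.0

# MCAR: delete x for a simple random sample of records.
mcar_idx = set(rng.sample(range(n), n_miss))
mcar = [i in mcar_idx for i in range(n)]
# Non-random pattern: delete x for the records with the largest y.
cut = sorted(ys)[n - n_miss]
by_y = [y >= cut for y in ys]

results = {name: impute_and_pool(xs, ys, miss, 5, rng)
           for name, miss in (("MCAR", mcar), ("depends on y", by_y))}
for name, (est, se) in results.items():
    print(f"{name}: slope {est:.3f}, "
          f"95% CI {est - 1.96 * se:.3f} to {est + 1.96 * se:.3f}")
```

The two scenarios delete the same number of values, so any difference in the
pooled slopes reflects the missingness pattern rather than the amount missing.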

Craig

Craig D. Newgard, MD, MPH
Research Fellow
Department of Emergency Medicine
Harbor-UCLA Medical Center
1000 West Carson Street, Box 21
Torrance, CA 90509
(310)222-3666 (Office)
(310)782-1763 (Fax)
[EMAIL PROTECTED]


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Paul von Hippel
Sent: Thursday, May 16, 2002 11:08 AM
To: [EMAIL PROTECTED]
Subject: IMPUTE: Re: Y affects probability of observing X


Actually, a more complete description of the situation would be this:
  X is one of  the variables that affects Y,
         Y = a0 + Xa1 + ... + error
  but Y is one of the variables that affects an indicator Z,
         Pr (Z=1) = F ( b0 + Yb1 + ...)
  and if Z=1 then I definitely can't observe X.
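
A small simulation can make this setup concrete. The Python sketch below is
only an illustration under assumed values: X and the error are standard
normal, F is the logistic CDF, and the coefficients (a0=0, a1=1, b0=0, b1=2)
are arbitrary choices, not taken from any real data. It generates records
where X is unobservable whenever Z=1, then fits the first equation to the
complete cases only, which one would expect to be biased here because
selection depends on Y.

```python
import math
import random

def simulate(n, a0=0.0, a1=1.0, b0=0.0, b1=2.0, seed=0):
    """Draw (x, y, z) records; x is set to None whenever z == 1."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = a0 + a1 * x + rng.gauss(0.0, 1.0)       # Y = a0 + X*a1 + error
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * y)))  # Pr(Z=1) = F(b0 + Y*b1)
        z = 1 if rng.random() < p else 0
        rows.append((None if z == 1 else x, y, z))  # Z=1 hides X
    return rows

def ols_slope(pairs):
    """Slope of the least-squares line of y on x."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    return sxy / sxx

rows = simulate(20000)
complete = [(x, y) for x, y, z in rows if x is not None]
cc_slope = ols_slope(complete)
print("fraction with X missing:", 1 - len(complete) / len(rows))
print("complete-case slope (true a1 = 1.0):", cc_slope)
```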

I suspect the correct procedure in this situation is the same as in the
simplified situation I described originally.
But I thought I should be thorough.

Thanks again,
Paul von Hippel

>On Thu, 16 May 2002, Paul von Hippel wrote:
>
> > Here's a missing-data situation that I haven't run into before. X is one of
> > the variables that affects Y,
> >       Y = a0 + X1a1 + ... + error
> > but Y is one of the variables that affects whether I have information on X,
> >       Pr (X missing) = F ( b0 + Yb1 + ...)
> > Here F is a cumulative distribution function -- for example, normal or
> > logistic.
> >
> > I want to make efficient, unbiased estimates of the first equation's
> > regression parameters a_i.
> >
> > Any suggestions most welcome.
> >
> > Many thanks,
> > Paul von Hippel
> > Ohio State University



