Tim Hesterberg, a statistician at Insightful Corporation (makers of S-Plus), 
has given me some excellent notes explaining why Y must be used in imputing Z. 
I have put these in the Multiple Imputation section of the following web page:

http://hesweb1.med.virginia.edu/biostat/rms

His notes include some S code to demonstrate what he's talking about.
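The point can also be checked by simulation. Below is a minimal Python sketch, an independent illustration and not Hesterberg's S code from the web page. It assumes one always-observed covariate X, a normally distributed Z that depends on X, MAR missingness of Z driven only by X, and single stochastic regression imputation (prediction plus a random residual) standing in for proper multiple imputation. It compares the estimated coefficient of Z in the regression of Y on X and Z when Z is imputed from X alone versus from X and Y. The sample size, coefficients, missingness rates, and the helper names `ols`, `residual_sd`, and `simulate` are all hypothetical choices for illustration.

```python
import random


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via (X'X) b = X'y.

    An intercept column is added here. Solved by Gauss-Jordan elimination,
    which is adequate for the tiny systems in this simulation only.
    """
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(xi[j] * xi[k] for xi in X) for k in range(p)]
         + [sum(xi[j] * yi for xi, yi in zip(X, y))] for j in range(p)]
    for col in range(p):
        # Partial pivoting for numerical stability.
        piv = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[j][p] / A[j][j] for j in range(p)]


def residual_sd(rows, y, coef):
    """Residual standard deviation of a fitted linear model."""
    res = [yi - (coef[0] + sum(c * v for c, v in zip(coef[1:], r)))
           for r, yi in zip(rows, y)]
    return (sum(e * e for e in res) / (len(y) - len(coef))) ** 0.5


def simulate(beta_z, n=5000, seed=1):
    """Return the estimated Z coefficient under two imputation models."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [1.0 + xi + beta_z * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    # MAR: the chance that Z is missing depends only on the observed X.
    miss = [rng.random() < (0.6 if xi > 0 else 0.2) for xi in x]
    obs = [i for i in range(n) if not miss[i]]

    estimates = {}
    for label, use_y in (("X only", False), ("X and Y", True)):
        preds = [[x[i], y[i]] if use_y else [x[i]] for i in range(n)]
        # Fit the imputation model for Z on the complete cases.
        coef = ols([preds[i] for i in obs], [z[i] for i in obs])
        sd = residual_sd([preds[i] for i in obs], [z[i] for i in obs], coef)
        # Stochastic regression imputation: prediction plus a random residual.
        z_filled = [
            coef[0] + sum(c * v for c, v in zip(coef[1:], preds[i]))
            + rng.gauss(0, sd)
            if miss[i] else z[i]
            for i in range(n)
        ]
        # Analysis model on the completed data: Y ~ X + Z; keep the Z slope.
        estimates[label] = ols([[x[i], z_filled[i]] for i in range(n)], y)[2]
    return estimates


if __name__ == "__main__":
    for beta in (1.0, 0.0):
        est = simulate(beta)
        print(f"true beta_z = {beta}: "
              + ", ".join(f"{k}: {v:.2f}" for k, v in est.items()))
```

Under these assumptions, with a true Z coefficient of 1 the X-only imputation should give an estimate well below 1 (roughly the observed fraction of Z, about 0.6 here), because the imputed values carry no information about Y beyond X and so dilute the Z effect toward zero. Imputing from X and Y should land near 1. With a true coefficient of 0, both should stay near 0: using Y in the imputation does not manufacture an effect that is not there, which speaks to the "stacking the deck" worry.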

Frank Harrell


On Wed, 22 May 2002 16:17:57 -0400
Constantine Daskalakis <[email protected]> wrote:

> Hi.
> 
> I have a regression of Y on a bunch of Xs (always observed) and on Z 
> (sometimes missing).
> 
> The X's will be used to impute Z. But should Y also be used in imputing Z?
> 
> My reading of the literature suggests that's not a problem and can often be 
> a good thing in terms of gaining precision. A colleague argues that using 
> the outcome to impute the predictor will bias the estimated effect of that 
> predictor in the main regression model. She argues that, by using Y, 
> "you're stacking the deck, so to speak", i.e., the imputation determines what 
> you'll find out in the main regression model.
> 
> Is there a heuristic response to that concern?
> (Or, if I'm wrong, please someone correct me!)
> 
> Thanks,
> cd
> 
> PS  Always assuming MAR of Z (i.e., missingness of Z does not depend on the 
> unobserved Z itself).
> 
> 
> 
> ________________________________________________________________
> 
> Constantine Daskalakis, ScD
> Assistant Professor,
> Biostatistics Section, Thomas Jefferson University,
> 125 S. 9th St. #402, Philadelphia, PA 19107
>     Tel: 215-955-5695
>     Fax: 215-503-3804
>     Email: [email protected]
>     Webpage: http://www.kcc.tju.edu/Science/SharedFacilities/Biostatistics
> 
> 


-- 
Frank E Harrell Jr              Prof. of Biostatistics & Statistics
Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat
