Dear Adriaan

 

I don't know the exact workings of ice or mi impute mvn, but I guess
they use predictive mean matching as the default procedure for numeric
variables (I know mice in R/S-PLUS does). PMM does not work nicely when
you chop off part of the distribution: imputed values are drawn from
observed donors, so there is no way to get those low values back. When
I tried this in R (using mice) with regression imputation, I got
unbiased results. You might want to try that in Stata as well.
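The donor-pool problem can be seen in a minimal sketch (Python/NumPy here as a stand-in for mice; the k-nearest-donor matching below is a simplified, illustrative PMM, not mice's actual algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 5000, 0.7

# Bivariate standard normal with correlation .7, as in the simulation.
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

# Case 1: X is missing whenever Y < 0.5, so every observed X comes from
# the "top slice" -- the donor pool contains no low X values at all.
miss = y < 0.5
x_obs, y_obs = x[~miss], y[~miss]

# Regression imputation: fit X ~ Y on the observed pairs and predict;
# the fitted line extrapolates freely below the cutoff.
b1, b0 = np.polyfit(y_obs, x_obs, 1)
x_reg = b0 + b1 * y[miss]

# Simplified PMM: draw from the 30 donors with the closest predicted
# value. Every imputed X is an *observed* X, so the imputations can
# never drop back into the chopped-off part of the distribution.
pred_obs = b0 + b1 * y_obs
pred_mis = b0 + b1 * y[miss]
order = np.argsort(np.abs(pred_obs[None, :] - pred_mis[:, None]), axis=1)
donors = order[:, :30]
pick = donors[np.arange(len(pred_mis)), rng.integers(0, 30, len(pred_mis))]
x_pmm = x_obs[pick]

print(x[miss].mean(), x_reg.mean(), x_pmm.mean())
```

The regression imputations track the true (negative) mean of the missing X's, while the PMM imputations are stuck at or above the smallest observed donor value.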

 

All best,

Rogier Donders

 

 

_______________________________________________________

Rogier Donders, PhD, Biostatistician

Department of Epidemiology, Biostatistics and HTA, EBH 133

Radboud University Nijmegen Medical Centre

P.O. Box 9101

6500 HB Nijmegen

The Netherlands

Tel.: +31 (0)243617794

 

To visit please follow route 138!

_______________________________________________________

 

Department of Epidemiology, Biostatistics and HTA

research with impact

 

 

 

From: Impute -- Imputations in Data Analysis
[mailto:[email protected]] On behalf of Hoogendoorn, Adriaan
Sent: Wednesday, 18 November 2009 11:56
To: [email protected]
Subject: Multiple Imputation in a very simple situation: just two
variables

 

Dear Listserv,

 

I would like to know in which situations the multiple imputation method
works well when I have just two variables.

 

I did the following simulation study: I generated (X,Y) being 100 draws
from the bivariate normal distribution with standard normal margins and
a correlation coefficient of .7. 

Next I created missings in four different ways, creating

 

1. missing X's, depending on the value of Y (MAR)

2. missing Y's, depending on the value of X (MAR)

3. missing Y's, depending on the value of Y (MNAR)

4. missing X's, depending on the value of X (MNAR)

 

Here, I was motivated by the (in my view very nice) blood pressure
example that Schafer and Graham (2002) use to illustrate the differences
between MCAR, MAR and MNAR. 

As far as I understood, the first two missing data mechanisms are MAR
and the latter two are MNAR. 

As Schafer and Graham did, I used a rather drastic method of creating
missing values: chopping off a part of the bivariate normal
distribution. 

In more detail: I created missing values whenever X (or Y) had a value
below 0.5. This resulted in about 70% missing values, which is what one
would expect from the standard normal distribution. 
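The setup above can be sketched as follows (a Python/NumPy stand-in; the original study used Stata and R):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r, cut = 100, 0.7, 0.5

# 100 draws from a bivariate normal with standard normal margins
# and correlation .7, as in the email.
x = rng.standard_normal(n)
y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)

# The four mechanisms: (variable made missing, missingness indicator).
mechanisms = {
    1: ("X", y < cut),   # X missing depending on Y (MAR)
    2: ("Y", x < cut),   # Y missing depending on X (MAR)
    3: ("Y", y < cut),   # Y missing depending on Y (MNAR)
    4: ("X", x < cut),   # X missing depending on X (MNAR)
}

# P(Z < 0.5) is about .69 for a standard normal, hence ~70% missingness.
for case, (var, miss) in mechanisms.items():
    print(case, var, round(miss.mean(), 2))
```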

 

Note that the scatter diagrams of the COMPLETE CASES under mechanisms 1
and 3 are identical and show the top slice of the bivariate normal
distribution. 

Likewise, the complete-case scatter diagrams under mechanisms 2 and 4
are identical and show the right-hand slice of the bivariate normal
distribution. 

The scatter diagrams suggest that regressing y on x using complete case
analysis will fail in cases 1 and 3: the top slice of the bivariate
normal tilts the regression line and results in a biased regression
coefficient estimate. The scatter diagrams also suggest that regressing
y on x using complete case analysis may work well in cases 2 and 4,
where missingness depends on X. 

These suggestions were confirmed by the simulation study: 

In cases 1 and 3, i.e. when missingness depends on Y, the mean
regression coefficient (over 2000 simulations) came out at .29, a
serious bias from the true value of .7. 

Case 1 illustrates Allison's claim that "... if the data are not MCAR,
but only MAR, listwise deletion can yield biased estimates" (Allison,
2001, p. 6).

When missingness depends on X, the mean regression coefficient came out
at .70 and is unbiased. Again this confirms one of Allison's claims:
"... if the probability of missing data on any of the independent
variables does not depend on the values of the dependent variable, then
regression estimates using listwise deletion will be unbiased ..."
(Allison, 2001, pp. 6-7).
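Those complete-case results can be reproduced with a small sketch (Python/NumPy standing in for the original simulation code):

```python
import numpy as np

rng = np.random.default_rng(2)
r, n, reps = 0.7, 100, 500

def cc_slope(select_on_y):
    """Mean complete-case slope of Y ~ X over many replications."""
    slopes = []
    for _ in range(reps):
        x = rng.standard_normal(n)
        y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)
        # listwise deletion: keep only rows above the 0.5 cutoff
        keep = (y >= 0.5) if select_on_y else (x >= 0.5)
        slopes.append(np.polyfit(x[keep], y[keep], 1)[0])
    return float(np.mean(slopes))

print(cc_slope(True))   # missingness depends on Y (cases 1, 3): biased toward 0
print(cc_slope(False))  # missingness depends on X (cases 2, 4): close to .70
```

Selecting on Y truncates the dependent variable and tilts the fitted line; selecting on X only restricts the regressor's range, which leaves the slope unbiased.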

 

 

Now comes the interesting part, where I use multiple imputation.
(Detail: I used Stata's "ice" procedure and was able to replicate the
results using Stata's "mi impute mvn".) 

I found the following results

 

1. b = .59 (averaged over 2000 simulations)

2. b = .70

3. b = .29

4. b = .89

 

My point is: Case 1 shows a bias!

Although substantially smaller than under complete case analysis (where
b = .29), I still obtain a bias of .11. 

Since case 1 is a case of MAR, I would have expected Multiple
Imputation to provide an unbiased estimate. 
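As a rough check of that expectation, a stochastic regression imputation sketch (Python/NumPy, not ice or mi impute mvn, and a single imputation per dataset rather than full MI; n is larger than the email's 100 to keep the illustration stable) does give an approximately unbiased slope in case 1:

```python
import numpy as np

rng = np.random.default_rng(3)
r, n, reps = 0.7, 500, 300
slopes = []
for _ in range(reps):
    x = rng.standard_normal(n)
    y = r * x + np.sqrt(1 - r**2) * rng.standard_normal(n)
    miss = y < 0.5                     # case 1: X missing depending on Y (MAR)
    # fit the imputation model X ~ Y on the complete cases
    b1, b0 = np.polyfit(y[~miss], x[~miss], 1)
    resid = x[~miss] - (b0 + b1 * y[~miss])
    # draw imputations from the fitted conditional distribution
    # (stochastic regression imputation), not from observed donors
    x_imp = x.copy()
    x_imp[miss] = b0 + b1 * y[miss] + resid.std() * rng.standard_normal(miss.sum())
    slopes.append(np.polyfit(x_imp, y, 1)[0])
print(round(float(np.mean(slopes)), 2))   # close to the true .70
```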

 

Do you have any clues why this happens? 

 

I modified the simulation study by replacing the cut-off in the missing
data mechanism with a stochastic selection mechanism depending on X (or
Y), but found similar results. 

 

Kind regards,

Adriaan Hoogendoorn

GGZ inGeest

Amsterdam

 

 

Reference: 

Schafer, J.L. & Graham, J.W. (2002). Missing Data: Our View of the
State of the Art. Psychological Methods, 7(2), 147-177.

Allison, P.D. (2001). Missing Data. Thousand Oaks, CA: Sage Publications.




