Dear Thomas,

Thanks for your reply.

Yes, you are quite right (per your example): complete case does not require MCAR. However, as well as being somewhat less robust than ML, it throws away data.

Molenberghs and Kenward's Missing Data in Clinical Studies
has a nice section in chapter 3 or 4 where they rubbish complete case and last observation carried forward.

Ah well, I don't have time to do anything clever so I'm just going to go along with the complete case logistic regression.

regards
Desmond


Thomas Lumley wrote:
On Mon, 5 Apr 2010, Desmond D Campbell wrote:

Dear Emmanuel,

Thank you.

Yes I broadly agree with what you say.
I think ML is a better strategy than complete case, because I think its
estimates will be more robust. For unbiased estimates, I think:
 ML requires that the data are MAR;
 complete case requires that the data are MCAR.

Anyway, I would have thought ML could be done without resorting to multiple
imputation, but I'm at the edge of my knowledge here.

This is an illustration of why Rubin's hierarchy, while useful, doesn't displace actual thinking about the problem.

The maximum-likelihood problem for which the MAR assumption is sufficient involves specifying the joint likelihood for the outcome and all predictor variables, which is basically the same problem as multiple imputation. Multiple imputation averages the estimate over the distribution of the unknown values; maximum likelihood integrates out the unknown values, but for reasonably large sample sizes the estimates will be equivalent (by asymptotic linearity of the estimator). Standard error calculation is probably easier with multiple imputation.


Also, it is certainly not true that a complete-case regression analysis requires MCAR. For example, if the missingness is independent of Y given X, the complete-case distribution will have the same mean of Y given X as the population and so will have the same best-fitting regression. This is a stronger assumption than you need for multiple imputation, but not a lot stronger.
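Thomas's point can be checked with a small simulation sketch (toy numbers assumed, not from the thread): a covariate is made missing with probability depending only on X, not on Y, and the complete-case logistic fit still recovers the population coefficients.

```r
## Sketch: complete-case logistic regression without MCAR.
## Missingness here depends only on x, so P(Y | X) is unchanged
## among complete cases and the regression stays consistent.
set.seed(42)
n <- 50000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.0 * x))    # true model: logit P(Y=1) = -0.5 + x

## Missingness depends on x only (roughly a third of rows lost)
miss  <- rbinom(n, 1, plogis(-1 + x)) == 1
x_obs <- ifelse(miss, NA, x)

## glm() drops rows with NA by default, i.e. a complete-case analysis
fit <- glm(y ~ x_obs, family = binomial)
coef(fit)   # close to (-0.5, 1.0) despite the dropped rows
```

The retained sample over-represents some values of x, but because selection is independent of Y given X, the conditional distribution of Y given X, and hence the best-fitting logistic regression, is the same as in the full population.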

        -thomas


Thanks once again,

regards
Desmond


From: Emmanuel Charpentier <charpent <at> bacbuc.dyndns.org>
Subject: Re: logistic regression in an incomplete dataset
Newsgroups: gmane.comp.lang.r.general
Date: 2010-04-05 19:58:20 GMT

Dear Desmond,

a somewhat analogous question has been posed recently (about 2 weeks
ago) on the sig-mixed-model list, and I tried (in two posts) to give
some elements of information (and some bibliographic pointers). To
summarize tersely:

- a model of "information missingness" (i.e. *why* are some data
missing?) is necessary to choose the right measures to take. Two
special cases (Missing At Random and Missing Completely At Random) allow
for (semi-)automated compensation. See literature for further details.

- complete-case analysis may seriously weaken power and give *biased*
results. Pairwise-complete-case analysis is usually *worse*.

- simple imputation leads to underestimated variances and might also
give biased results.

- multiple imputation is currently thought of as a good way to alleviate
missing-data problems, if you have a missingness model (or can honestly
bet on MCAR or MAR), and if you properly combine the results of your
imputations.

- A few missing-data packages exist in R to handle this case. My personal
selection at this point would be mice, mi, Amelia, and possibly mitools,
but none of them is fully satisfying (in particular, accounting for a
random effect needs special handling in all packages...).

- An interesting alternative is to write a full probability model (in
BUGS for example) and use Bayesian estimation; in this framework,
missing data are "naturally" handled in the model used for analysis.
However, this might entail a *large* amount of work, be difficult, and
not always succeed (numerical difficulties). Furthermore, the results of
a Bayesian analysis might not be what you seek...
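As a concrete sketch of the multiple-imputation route Emmanuel describes (toy data assumed, not from the thread), the mice package imputes several completed datasets, fits the logistic regression to each, and pools the results by Rubin's rules:

```r
## Multiple imputation sketch with mice: impute, fit, pool.
library(mice)

## Toy data: binary y, covariates x1/x2, with ~20% of x1 missing
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, 1, plogis(x1 + 0.5 * x2))
x1[sample(n, 100)] <- NA
dat <- data.frame(y, x1, x2)

imp  <- mice(dat, m = 5, printFlag = FALSE)            # 5 imputed datasets
fits <- with(imp, glm(y ~ x1 + x2, family = binomial)) # fit each one
summary(pool(fits))                                    # Rubin's-rules estimates
```

The pooled standard errors incorporate between-imputation variance, which is what simple (single) imputation gets wrong.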

HTH,

                    Emmanuel Charpentier

Le lundi 05 avril 2010 à 11:34 +0100, Desmond Campbell a écrit :
Dear all,

I want to do a logistic regression.
So far I've only found out how to do that in R, in a dataset of complete
cases.
I'd like to do logistic regression via maximum likelihood, using all the
study cases (complete and incomplete). Can you help?

I'm using glm() with family=binomial(logit).
If any covariate in a study case is missing then the study case is
dropped, i.e. it is doing a complete-case analysis.
As a lot of study cases are being dropped, I'd rather it did maximum
likelihood using all the study cases.
I tried setting glm()'s na.action to NULL, but then it complained about
NAs present in the study cases.
I've about 1000 unmatched study cases and fewer than 10 covariates, so I
could use unconditional ML estimation (as opposed to conditional ML
estimation).
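The behaviour Desmond describes can be seen directly (toy data, not his study): glm()'s default na.action = na.omit silently drops incomplete rows, and complete.cases() counts how many.

```r
## Sketch: glm() drops incomplete rows by default (na.omit)
set.seed(7)
dat <- data.frame(
  y  = rbinom(200, 1, 0.4),
  x1 = rnorm(200),
  x2 = rnorm(200)
)
dat$x1[sample(200, 50)] <- NA                  # 50 incomplete cases

fit <- glm(y ~ x1 + x2, data = dat, family = binomial(logit))
nobs(fit)                                      # 150: only complete cases used
sum(!complete.cases(dat))                      # 50 rows were dropped
```

Setting na.action = NULL just passes the NAs through to the fitting code, which is why glm() complains; recovering the incomplete cases requires an imputation or joint-likelihood approach, as discussed above.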

regards
Desmond


--
Desmond Campbell
UCL Genetics Institute
d.campb...@ucl.ac.uk
Tel. ext. 020 31084006, int. 54006



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Thomas Lumley            Assoc. Professor, Biostatistics
tlum...@u.washington.edu    University of Washington, Seattle



