John Maindonald wrote:
I would say rather that for binary data (binomial data with n=1) it
is not possible to detect overdispersion from examination of the
Pearson chi-square or the deviance. Overdispersion may be, and
often is, nevertheless present. I am arguing that overdispersion is
properly regarded as a function of the variance-covariance structure,
not as a function of the sample data.
The variance of a two-point distribution is a known function of the
mean, providing that independence and identity of distribution can be
assumed, or providing that the correlation structure is otherwise
known and the mean is constant. That proviso is crucial!
I don't really disagree, of course. I was mainly being provocative.
However, these models play tricks on our intuition. When people speak of
overdispersion, they usually imply just what you said: independent data
with the correct mean, but somehow a different variance - a mathematical
impossibility for binary data.
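
To make the "mathematical impossibility" concrete, here is a minimal sketch in R. The variable names and the Beta distribution for the individual probabilities are my own choices for illustration. For any vector of 0/1 values, y^2 equals y, so the empirical variance (with divisor n) is exactly p_hat * (1 - p_hat), however heterogeneous the underlying probabilities are.

```r
set.seed(7)

# Illustrative assumption: each observation has its own success probability,
# drawn from a strongly U-shaped Beta distribution.
p_i <- rbeta(1000, 0.5, 0.5)
y   <- rbinom(1000, 1, p_i)        # one binary outcome per individual

p_hat   <- mean(y)
emp_var <- mean((y - p_hat)^2)     # empirical variance with divisor n

# Since y^2 == y for 0/1 data, the variance is pinned down by the mean.
all.equal(emp_var, p_hat * (1 - p_hat))
```

The identity holds for the sample at hand, which is why the marginal variance of the responses alone cannot reveal anything beyond the mean.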
One particular thing to notice is that if the individual means are
heterogeneous but sampled independently from the same underlying
distribution, you still end up with a marginal binomial distribution. If
they are not sampled independently, then you get departures from the
binomial, but it may well be in the direction of underdispersion. For an
extreme case, take a sample of 50 men and 50 women and count the number
of people with breasts.
(If you do the same thing with a random sample of 100 _people_, you get
the binomial distribution again. Unless you're counting the number of
breasts...)
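
A small simulation sketch of the two sampling schemes, assuming for illustration that the trait is present with probability 1 in one sex and 0 in the other, and that the sexes are equally common in the population (both are idealizations, and all names are made up here):

```r
set.seed(1)
n_sims <- 10000

# (a) Random sample of 100 people: sex is drawn independently for each
#     person, and the trait is then determined by sex.
count_random <- replicate(n_sims, {
  p <- rbinom(100, 1, 0.5)         # individual probability is 0 or 1
  sum(rbinom(100, 1, p))
})

# (b) Fixed design: exactly 50 with probability 0 and 50 with probability 1.
p_fixed <- rep(c(0, 1), each = 50)
count_fixed <- replicate(n_sims, sum(rbinom(100, 1, p_fixed)))

binomial_var <- 100 * 0.5 * 0.5    # variance of Binomial(100, 0.5)

var(count_random)   # should be close to binomial_var
var(count_fixed)    # no variation at all: extreme underdispersion
```

Under (a) the count behaves like a Binomial(100, 0.5) variable; under (b) it is always 50.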
If there is some sort of grouping, it may be appropriate to aggregate
data over the groups, yielding data that have a binomial form with
n > 1. Over-dispersion can now be detected from the Pearson chi-square
or from the deviance. Note that the quasi models assume that the
multiplier for the binomial or other variance is constant with p;
that may or may not be realistic. Generalized linear mixed models
make their own different assumptions about how the variance changes
as a function of p; again these may or may not be realistic.
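
As a rough sketch of the grouped case, the following simulates binomial counts with n > 1 per group and a group-level effect on the logit scale, then computes the usual dispersion estimates. The simulation settings and names are assumptions chosen only for illustration.

```r
set.seed(42)
n_groups <- 200
m <- 20                                      # trials per group (n > 1)
x <- rnorm(n_groups)

# Assumed data-generating model: a normal group effect on the logit
# scale induces extra-binomial variation between groups.
p <- plogis(-0.5 + 0.8 * x + rnorm(n_groups, sd = 1))
y <- rbinom(n_groups, m, p)

fit <- glm(cbind(y, m - y) ~ x, family = binomial)

# Pearson chi-square and deviance, each divided by the residual df
pearson_phi  <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
deviance_phi <- deviance(fit) / df.residual(fit)

# The quasibinomial fit reports the Pearson-based estimate as its
# (constant) variance multiplier.
fit_q <- glm(cbind(y, m - y) ~ x, family = quasibinomial)
summary(fit_q)$dispersion
```

Values of pearson_phi well above 1 point to overdispersion relative to the binomial. Whether a single constant multiplier is adequate across the range of p is a separate question that this check does not answer.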
It is then the error structure that is crucial. To the extent that it
distracts from careful thinking about that structure, the term
overdispersion is unsatisfactory.
There's no obvious way that I can see to supply glm() with an
estimate of the dispersion that has been derived independently of the
current analysis. Especially in the binary case, this would
sometimes be useful.
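
One partial workaround, offered as a sketch rather than a full answer: glm() itself does not take a dispersion value, but summary.glm() has a `dispersion` argument, so an externally derived estimate can be applied when the standard errors are computed. The value `phi_external` below is hypothetical and stands in for an estimate obtained from some other analysis.

```r
set.seed(3)
x <- rnorm(300)
y <- rbinom(300, 1, plogis(-0.3 + 0.7 * x))   # simulated binary data

fit <- glm(y ~ x, family = binomial)

phi_external <- 2.5   # hypothetical dispersion estimate from elsewhere

s_default <- summary(fit)                            # dispersion fixed at 1
s_scaled  <- summary(fit, dispersion = phi_external)

se_default <- coef(s_default)[, "Std. Error"]
se_scaled  <- coef(s_scaled)[, "Std. Error"]

# Standard errors are inflated by sqrt(phi_external); the coefficient
# estimates themselves are unchanged.
se_scaled / se_default
```

This only rescales the reported uncertainty after fitting; it does not change the fitted model.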
John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.
On 12 Jan 2007, at 10:00 PM, [EMAIL PROTECTED] wrote:
From: Peter Dalgaard [EMAIL PROTECTED]
Date: 12 January 2007 5:04:26 AM
To: evaiannario [EMAIL PROTECTED]
Cc: r-help@stat.math.ethz.ch r-help@stat.math.ethz.ch
Subject: Re: [R] overdispersion
evaiannario wrote:
How can I eliminate the overdispersion for binary data, apart from the
use of the quasibinomial?
There is no such thing as overdispersion for binary data. (The
variance of a two-point distribution is a known function of the
mean.) If what you want to do is include random effects of some
sort of grouping then you might look into generalized linear mixed
models via lmer() from the lme4 package or glmmPQL from MASS.
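
For the grouping case, a minimal sketch using glmmPQL from MASS (which relies on nlme) might look like the following. The simulated data, the grouping factor `g` and the effect sizes are all assumptions made up for illustration.

```r
library(MASS)   # glmmPQL
library(nlme)   # fixef(), used on the fitted object

set.seed(11)
n_groups  <- 40
per_group <- 25
g <- factor(rep(seq_len(n_groups), each = per_group))
x <- rnorm(n_groups * per_group)

# Assumed model: a normal random intercept per group on the logit scale
u <- rnorm(n_groups, sd = 1)
y <- rbinom(n_groups * per_group, 1,
            plogis(-0.5 + 0.8 * x + u[as.integer(g)]))
dat <- data.frame(y = y, x = x, g = g)

# Binary response with a random intercept for each group
fit <- glmmPQL(y ~ x, random = ~ 1 | g, family = binomial,
               data = dat, verbose = FALSE)

fixef(fit)      # fixed-effect estimates on the logit scale
```

Penalized quasi-likelihood is known to be somewhat biased for binary responses, so the estimates are best read as approximate.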
__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.