[R] Overdispersion in a GLM binomial model

2007-02-25 Thread Serguei Kaniovski


Hello,

The share of concurring votes (i.e. yes-yes and no-no) in total votes
between a pair of voters is a function of their ideological distance (index
continuous on [1,2]).

I show by other means that the votes typically are highly positively
correlated (with an average c=0.6). This is because voters sit together and
discuss the issue before taking a vote, but also because they share common
ideologies.

The coefficient is significant, its sign is correct, and the fit is good:
R-sq.(adj) = 0.866.
But there seems to be massive overdispersion:
deviance explained = 39.3%,
residual deviance: 3874.0 on 102 degrees of freedom.
In addition, the residuals-vs-fitted plot shows heteroscedasticity.

The overdispersion cannot be remedied by regressing on log(index), or by
using the quasibinomial family with a scale parameter for the variance. The
estimated dispersion parameter for the quasibinomial family is large: 37.34917.
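To my understanding, the dispersion that R's summary() reports for a quasibinomial fit is the Pearson chi-square divided by the residual degrees of freedom; the residual deviance over its degrees of freedom (here 3874.0/102, about 38) is a common rough alternative. The Python sketch below is only an illustration with hypothetical function names, not R's code: it fits logit(share) = b0 + b1*idist by Newton-Raphson (which equals IRLS for the logit link) and computes both ratios. Values far above 1 mean more variation than the binomial model allows. Only the first four rows of the posted data are typed in.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def xlogy(a, b):
    # a * log(a / b) with the convention 0 * log(0) = 0
    return 0.0 if a == 0 else a * math.log(a / b)

def fit_logistic(x, y, n, iters=100, tol=1e-12):
    """Fit logit(p_i) = b0 + b1*x_i to y_i successes out of n_i by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, y, n):
            p = expit(b0 + b1 * xi)
            w = ni * p * (1.0 - p)          # binomial variance = IRLS weight
            r = yi - ni * p                 # raw residual
            g0 += r
            g1 += r * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # solve the 2x2 system (X'WX) d = X'(y - n p)
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def dispersion_summaries(x, y, n, b0, b1, n_params=2):
    """Return (Pearson X^2 / df, residual deviance / df) for the fitted binomial GLM."""
    pearson = deviance = 0.0
    for xi, yi, ni in zip(x, y, n):
        mu = ni * expit(b0 + b1 * xi)
        pearson += (yi - mu) ** 2 / (mu * (1.0 - mu / ni))
        deviance += 2.0 * (xlogy(yi, mu) + xlogy(ni - yi, ni - mu))
    df = len(x) - n_params
    return pearson / df, deviance / df

# First four rows of the posted data: v1 (concurring), v2 (dissenting), idist
v1 = [376, 328, 367, 372]
v2 = [40, 88, 49, 44]
idist = [1.125, 1.375, 1.145, 1.125]
trials = [a + b for a, b in zip(v1, v2)]
b0, b1 = fit_logistic(idist, v1, trials)
phi_pearson, phi_deviance = dispersion_summaries(idist, v1, trials, b0, b1)
print(b0, b1, phi_pearson, phi_deviance)
```

Note that the dispersion only rescales uncertainty; the coefficient estimates are the same with or without it.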

QUESTION: Is there overdispersion? Can overdispersion be due to
correlation between the votes? What can be done? The data are attached
below: v1 = concurring votes, v2 = dissenting votes, idist = the index.
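One way to connect correlation with the dispersion estimate, under an assumption that is mine and not in the post: suppose the n agreement indicators behind one pair's count each have mean p and share a common (exchangeable) pairwise correlation rho, for example because the pair's propensity to agree differs from the model mean. The count then has variance n p(1-p)[1 + (n-1) rho] instead of n p(1-p). This rho is a correlation between one pair's agreements across votes, which is not the same quantity as the correlation c = 0.6 between the two voters' choices on a single vote. A Python sketch with illustrative names:

```python
def correlated_count_variance(n, p, rho):
    """Variance of the number of agreements among n binary indicators with
    mean p and common pairwise correlation rho (exchangeable assumption)."""
    return n * p * (1.0 - p) * (1.0 + (n - 1) * rho)

def implied_rho(phi, n):
    """Exchangeable correlation that inflates the binomial variance by phi."""
    return (phi - 1.0) / (n - 1)

# Totals per pair in the posted data are 416, 683 or 1110 votes.
for total in (416, 683, 1110):
    print(total, round(implied_rho(37.34917, total), 4))
```

A small rho suffices: for n = 416 it is (37.35 - 1)/415, about 0.088. Because the inflation factor grows with n, a constant rho would imply different dispersion for the 416-, 683- and 1110-vote pairs, whereas the quasibinomial model assumes a single multiplier for all of them.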

Thanks,
Serguei

DATA:
v1,v2,idist
376,40,1.125
328,88,1.375
367,49,1.145
372,44,1.125
273,143,1
325,91,1.125
375,41,1.125
357,59,1.375
751,359,1.885
816,294,1
752,358,1.885
829,281,1.3
857,253,1.05
759,351,1.07
848,262,1.135
803,307,1.385
555,555,1.885
346,70,1.5
381,35,1.27
398,18,1.25
289,127,1.125
1003,107,1
580,530,1.585
628,482,1.835
502,608,1.955
745,365,1.75
343,73,1.25
407,9,1.25
373,43,1.5
587,96,1.205
507,176,1.11
528,155,1.06
473,210,1.43
436,247,1.475
585,98,1.145
541,142,1.225
425,258,1.315
540,570,1.885
975,135,1.3
959,151,1.05
973,137,1.07
772,338,1.135
879,231,1.385
327,89,1.23
332,84,1.25
331,85,1.375
339,77,1.25
345,71,1.25
353,63,1
373,43,1.02
266,150,1.145
318,98,1.02
384,32,1.02
346,70,1.23
519,164,1.315
512,171,1.265
481,202,1.635
446,237,1.68
613,70,1.35
553,130,1.43
435,248,1.52
291,125,1.125
345,71,1
397,19,1
357,59,1.25
338,78,1.125
286,130,1.125
326,90,1.375
588,95,1.05
597,86,1.32
564,119,1.365
537,146,1.035
445,238,1.115
559,124,1.205
565,545,1.585
613,497,1.835
485,625,1.955
736,374,1.75
583,527,1.5
954,156,1.25
972,138,1.37
557,126,1.415
540,143,1.085
811,299,1.165
560,123,1.255
846,264,1.085
928,182,1.12
819,291,1.085
872,238,1.335
602,81,1.045
497,186,1.285
745,365,1.205
599,84,1.115
834,276,1.455
468,215,1.33
360,323,1.25
640,43,1.16
541,142,1.08
461,222,1.17
355,328,1.09
729,381,1.25
338,78,1
354,62,1.25
366,50,1.25

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] overdispersion

2007-01-12 Thread John Maindonald

I would say rather that for binary data (binomial data with n=1) it  
is not possible to detect overdispersion from examination of the  
Pearson chi-square or the deviance.   Overdispersion may be, and  
often is, nevertheless present.  I am arguing that overdispersion is  
properly regarded as a function of the variance-covariance structure,  
not as a function of the sample data.

The variance of a two-point distribution is a known function of the  
mean, providing that independence and identity of distribution can be  
assumed, or providing that the correlation structure is otherwise  
known and the mean is constant. That proviso is crucial!
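The reason the data alone cannot reveal this for n = 1 can be checked directly: since v*v = v for a 0/1 value, the sample variance of binary data is always exactly mean * (1 - mean). A minimal Python illustration (names are illustrative); it shows only that the data cannot signal excess variance, not that correlation is absent:

```python
def binary_mean_and_variance(values):
    """Mean and (divisor-n) variance of a sample of 0/1 outcomes."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance

sample = [1, 0, 0, 1, 1, 1, 0, 1]
m, v = binary_mean_and_variance(sample)
print(m, v, m * (1 - m))   # prints 0.625 0.234375 0.234375
```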

If there is some sort of grouping, it may be appropriate to aggregate  
data over the groups, yielding data that have a binomial form with  
n > 1.  Over-dispersion can now be detected from the Pearson chi-square  
or from the deviance.  Note that the quasi models assume that the  
multiplier for the binomial or other variance is constant with p;  
that may or may not be realistic.  Generalized linear mixed models  
make their own different assumptions about how the variance changes  
as a function of p; again these may or may not be realistic.
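A minimal sketch of the aggregation step, assuming each binary outcome carries a group label and, for simplicity, that a single common p is fitted with no covariates. The dispersion estimate is the Pearson chi-square over G - 1 degrees of freedom; with covariates one would use each group's fitted p and subtract the number of parameters instead. The helper names are hypothetical:

```python
from collections import defaultdict

def aggregate(groups, outcomes):
    """Collapse binary outcomes into per-group (successes, trials)."""
    counts = defaultdict(lambda: [0, 0])
    for g, y in zip(groups, outcomes):
        counts[g][0] += y
        counts[g][1] += 1
    return {g: tuple(c) for g, c in counts.items()}

def pearson_dispersion_common_p(table):
    """Pearson X^2 / (G - 1) for grouped binomial data under one common p.
    Raises ZeroDivisionError if the pooled p is exactly 0 or 1."""
    y_total = sum(y for y, n in table.values())
    n_total = sum(n for y, n in table.values())
    p = y_total / n_total
    chi2 = sum((y - n * p) ** 2 / (n * p * (1 - p)) for y, n in table.values())
    return chi2 / (len(table) - 1)

groups = ["a"] * 4 + ["b"] * 4
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
table = aggregate(groups, outcomes)          # {'a': (4, 4), 'b': (0, 4)}
print(pearson_dispersion_common_p(table))    # prints 8.0
```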

It is then the error structure that is crucial. To the extent that it  
distracts from careful thinking about that structure, the term  
overdispersion is unsatisfactory.

There's no obvious way that I can see to supply glm() with an  
estimate of the dispersion that has been derived independently of the  
current analysis.  Especially in the binary case, this would  
sometimes be useful.
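For what it is worth, summary() for glm objects in R accepts a `dispersion` argument, so an externally derived value can at least be used when computing standard errors and tests for an existing fit; the coefficient estimates do not depend on it. The arithmetic amounts to multiplying the binomial-model standard errors by sqrt(phi), sketched here in Python with hypothetical names. With phi = 37.35 that is a factor of roughly 6.1.

```python
import math

def rescale_standard_errors(binomial_se, phi):
    """Standard errors under a variance multiplier phi: binomial SEs times sqrt(phi)."""
    return [se * math.sqrt(phi) for se in binomial_se]

def wald_z(estimates, binomial_se, phi=1.0):
    """Wald statistics estimate / SE, with SEs inflated by an externally supplied phi."""
    return [b / se for b, se in zip(estimates, rescale_standard_errors(binomial_se, phi))]

# Illustrative numbers only: two coefficients with their binomial-model SEs
print(wald_z([0.8, -1.2], [0.1, 0.3], phi=37.34917))
```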

John Maindonald email: [EMAIL PROTECTED]
phone : +61 2 (6125)3473    fax : +61 2 (6125)5549
Centre for Mathematics & Its Applications, Room 1194,
John Dedman Mathematical Sciences Building (Building 27)
Australian National University, Canberra ACT 0200.


On 12 Jan 2007, at 10:00 PM, [EMAIL PROTECTED] wrote:

 From: Peter Dalgaard [EMAIL PROTECTED]
 Date: 12 January 2007 5:04:26 AM
 To: evaiannario [EMAIL PROTECTED]
 Cc: r-help@stat.math.ethz.ch r-help@stat.math.ethz.ch
 Subject: Re: [R] overdispersion


 evaiannario wrote:
 How can I eliminate the overdispersion for binary data apart from the  
 use of the quasibinomial?
 There is no such thing as overdispersion for binary data. (The  
 variance of a two-point distribution is a known function of the  
 mean.) If what you want to do is include random effects of some  
 sort of grouping then you might look into generalized linear mixed  
 models via lmer() from the lme4 package or glmmPQL from MASS.


Re: [R] overdispersion

2007-01-12 Thread Peter Dalgaard

John Maindonald wrote:
 I would say rather that for binary data (binomial data with n=1) it  
 is not possible to detect overdispersion from examination of the  
 Pearson chi-square or the deviance.   Overdispersion may be, and  
 often is, nevertheless present.  I am arguing that overdispersion is  
 properly regarded as a function of the variance-covariance structure,  
 not as a function of the sample data.

 The variance of a two-point distribution is a known function of the  
 mean, providing that independence and identity of distribution can be  
 assumed, or providing that the correlation structure is otherwise  
 known and the mean is constant. That proviso is crucial!
   
I don't really disagree, of course. I was mainly being provocative.

However, these models play tricks on our intuition. When people speak of 
overdispersion, they usually imply just what you said: independent data 
with the correct mean, but somehow a different variance - a mathematical 
impossibility for binary data.

One particular thing to notice is that if the individual means are 
heterogeneous but sampled independently from the same underlying 
distribution, you still end up with a marginal binomial distribution. If 
they are not sampled independently, then you get departures from the 
binomial, but it may well be in the direction of underdispersion. For an 
extreme case, take a sample of 50 men and 50 women and count the number 
of people with breasts.

(If you do the same thing with a random sample of 100 _people_, you get 
the binomial distribution again. Unless you're counting the number of 
breasts...)
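Peter's example can be checked with a standard fact: for independent trials with fixed success probabilities p_i, the count has variance sum p_i(1 - p_i), which never exceeds the binomial value n * pbar * (1 - pbar). A small Python sketch (function names are illustrative):

```python
def count_variance(probs):
    """Variance of the number of successes among independent Bernoulli(p_i) trials."""
    return sum(p * (1.0 - p) for p in probs)

def binomial_variance(probs):
    """Binomial variance n * pbar * (1 - pbar) using the average probability."""
    n = len(probs)
    pbar = sum(probs) / n
    return n * pbar * (1.0 - pbar)

# Stratified sample: 50 men (p = 0) and 50 women (p = 1); the count is always 50.
stratified = [0.0] * 50 + [1.0] * 50
# Random sample of 100 people: before sampling, each person is Bernoulli(0.5).
random_people = [0.5] * 100

print(count_variance(stratified), binomial_variance(stratified))        # 0.0 25.0
print(count_variance(random_people), binomial_variance(random_people))  # 25.0 25.0
```

Fixing the p_i by design removes variance (underdispersion); drawing people at random makes each outcome marginally Bernoulli(0.5) and independent, which gives the binomial variance back.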

[R] overdispersion

2007-01-11 Thread evaiannario

How can I eliminate the overdispersion for binary data apart from the use of the 
quasibinomial? 
Please help me.
Eva Iannario




Re: [R] overdispersion

2007-01-11 Thread Peter Dalgaard

evaiannario wrote:
 How can I eliminate the overdispersion for binary data apart from the use
 of the quasibinomial?
There is no such thing as overdispersion for binary data. (The variance 
of a two-point distribution is a known function of the mean.) If what 
you want to do is include random effects of some sort of grouping then 
you might look into generalized linear mixed models via lmer() from the 
lme4 package or glmmPQL from MASS.
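As a rough illustration of what a random-intercept logistic model (the kind of model lmer() or glmmPQL fits; in current lme4 releases binomial models are fitted with glmer()) implies for the variance of a grouped count, here is a Python sketch under the assumption logit(p_b) = eta + sigma*Z with Z standard normal and Y | b ~ Binomial(n, p_b). Then Var(Y) = n mu(1 - mu) + n(n - 1) Var(p_b), so the multiplier relative to the binomial changes with eta, i.e. with the mean, unlike the constant multiplier of the quasi models. The moments use a plain trapezoidal rule; the names are illustrative, and this is not how lme4 or MASS compute anything.

```python
import math

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def logit_normal_moments(eta, sigma, steps=4000, width=8.0):
    """Mean and variance of p = expit(eta + sigma * Z), Z ~ N(0, 1),
    by the trapezoidal rule on [-width, width]."""
    h = 2.0 * width / steps
    m1 = m2 = 0.0
    for k in range(steps + 1):
        z = -width + k * h
        weight = h * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        if k == 0 or k == steps:
            weight *= 0.5                   # trapezoid end points
        p = expit(eta + sigma * z)
        m1 += weight * p
        m2 += weight * p * p
    return m1, m2 - m1 * m1

def random_intercept_dispersion(eta, sigma, n):
    """Var(Y) / (n * mu * (1 - mu)) when Y | b ~ Binomial(n, p_b), logit(p_b) = eta + sigma * Z."""
    mu, var_p = logit_normal_moments(eta, sigma)
    return 1.0 + (n - 1) * var_p / (mu * (1.0 - mu))

for eta in (0.0, 1.5, 3.0):
    print(eta, round(random_intercept_dispersion(eta, 1.0, 416), 1))
```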
