Thanks for the clarification. I actually had MASS open to that page while I was composing my reply but forgot to mention it (trying to do too many things at once) ...
Ben Bolker Prof Brian Ripley wrote: > On Tue, 17 Feb 2009, Ben Bolker wrote: > >> Jessica L Hite/hitejl/O/VCU <hitejl <at> vcu.edu> writes: >> >>> I am attempting to run a glm with a binomial model to analyze proportion >>> data. >>> I have been following Crawley's book closely and am wondering if there is >>> an accepted standard for how much is too much overdispersion? (e.g. change >>> in AIC has an accepted standard of 2). > >> In principle, in the null case (i.e. data are really binomial) >> the deviance is chi-squared distributed with the df equal >> to the residual df. > > *Approximately*, provided the expected counts are not near or below > one. See MASS ยง7.5 for an analysis of the size of the approximation > errors (which can be large and in both directions). > > Given that I once had a consulting job where the over-dispersion was > causing something close ot panic and was entirely illusory, the lack > of the 'approximately' can have quite serious consequences. > >> For example: >> >> example(glm) >> deviance(glm.D93) ## 5.13 >> summary(glm.D93)$dispersion ## 1 (by definition) >> dfr <- df.residual(glm.D93) >> deviance(glm.D93)/dfr ## 1.28 >> d2 <- sum(residuals(glm.D93,"pearson")^2) ## 5.17 >> (disp2 <- d2/dfr) ## 1.293 >> >> gg2 <- update(glm.D93,family=quasipoisson) >> summary(gg2)$dispersion ## 1.293, same as above >> >> pchisq(d2,df=dfr,lower.tail=FALSE) >> >> all.equal(coef(glm.D93),coef(gg2)) ## TRUE >> >> se1 <- coef(summary(glm.D93))[,"Std. Error"] >> se2 <- coef(summary(gg2))[,"Std. Error"] >> se2/se1 >> >> # (Intercept) outcome2 outcome3 treatment2 treatment3 >> # 1.137234 1.137234 1.137234 1.137234 1.137234 >> >> sqrt(disp2) >> # [1] 1.137234 >> >>> My code and output are below, given the example in the book, these data are >>> WAY overdispersed .....do I mention this and go on or does this signal the >>> need to try a different model? If so, any suggestions on the type of >>> distribution (gamma or negative binomial ?)? >> Way overdispersed may indicate model lack of fit. Have >> you examined residuals/data for outliers etc.? >> >> quasibinomial should be fine, or you can try beta-binomial >> (see the aod package) ... >> >> >>> attach(Clutch2) >>> y<-cbind(Total,Size-Total) >>> glm1<-glm(y~Pred,"binomial") >>> summary(glm1) >>> >>> Call: >>> glm(formula = y ~ Pred, family = "binomial") >>> >>> Deviance Residuals: >>> Min 1Q Median 3Q Max >>> -9.1022 -2.7899 -0.4781 2.6058 8.4852 >>> >>> Coefficients: >>> Estimate Std. Error z value Pr(>|z|) >>> (Intercept) 1.35095 0.06612 20.433 < 2e-16 *** >>> PredF -0.34811 0.11719 -2.970 0.00297 ** >>> PredSN -3.29156 0.10691 -30.788 < 2e-16 *** >>> PredW -1.46451 0.12787 -11.453 < 2e-16 *** >>> PredWF -0.56412 0.13178 -4.281 1.86e-05 *** >>> --- >>> #### the output for residual deviance and df does not change even when I >>> use quasibinomial, is this ok? ##### >> That's as expected. >> >>> library(MASS) >> you don't really need MASS for quasibinomial. >> >>>> glm2<-glm(y~Pred,"quasibinomial") >>>> summary(glm2) >>> Call: >>> glm(formula = y ~ Pred, family = "quasibinomial") >>> >>> Deviance Residuals: >>> Min 1Q Median 3Q Max >>> -9.1022 -2.7899 -0.4781 2.6058 8.4852 >>> >>> Coefficients: >>> Estimate Std. Error t value Pr(>|t|) >>> (Intercept) 1.3510 0.2398 5.633 1.52e-07 *** >>> PredF -0.3481 0.4251 -0.819 0.41471 >>> PredSN -3.2916 0.3878 -8.488 1.56e-13 *** >>> PredW -1.4645 0.4638 -3.157 0.00208 ** >>> PredWF -0.5641 0.4780 -1.180 0.24063 >>> --- >>> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >>> >>> (Dispersion parameter for quasibinomial family taken to be 13.15786) >>> >>> Null deviance: 2815.5 on 108 degrees of freedom >>> Residual deviance: 1323.5 on 104 degrees of freedom >>> (3 observations deleted due to missingness) >>> AIC: NA >>> >>> Number of Fisher Scoring iterations: 5 >>> >>>> anova(glm2,test="F") >>> Analysis of Deviance Table >>> >>> Model: quasibinomial, link: logit >>> >>> Response: y >>> >>> Terms added sequentially (first to last) >>> >>> Df Deviance Resid. Df Resid. Dev F Pr(>F) >>> NULL 108 2815.5 >>> Pred 4 1492.0 104 1323.5 28.349 6.28e-16 *** >>> --- >>> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >>>> model1<-update(glm2,~.-Pred) >>>> anova(glm2,model1,test="F") >>> Analysis of Deviance Table >>> >>> Model 1: y ~ Pred >>> Model 2: y ~ 1 >>> Resid. Df Resid. Dev Df Deviance F Pr(>F) >>> 1 104 1323.5 >>> 2 108 2815.5 -4 -1492.0 28.349 6.28e-16 *** >>> --- >>> Signif. codes: 0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1 >>>> coef(glm2) >>> (Intercept) PredF PredSN PredW PredWF >>> 1.3509550 -0.3481096 -3.2915601 -1.4645097 -0.5641223 >>> >>> Thanks >>> Jessica >>> hitejl <at> vcu.edu > > -- Ben Bolker Associate professor, Biology Dep't, Univ. of Florida bol...@ufl.edu / www.zoology.ufl.edu/bolker GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.