Re: [R-sig-eco] low predicted vales in GAMs (Anna Renwick)
Dear All,

I wanted to thank everyone for their helpful comments. With your help, and that of Simon Wood, I now realise that the reason I have low predicted values is that I have so many zeros in my data. Because the model structure I have constructed specifies that the mean must always be positive, the model over-predicts the zero counts and, in order not to predict more counts than there actually are, it under-estimates the non-zero counts (this underestimation can be quite large due to the high number of zeros).

So one thing I am thinking of is to try a zero-inflated model. I have looked at the COZIGAM package, but you do not seem to be able to use an offset with it. I was wondering if anybody knows of a package where weighted zero-inflated GAM models with an offset can be run.

Many thanks,
Anna

Dr Anna R. Renwick
Research Ecologist
British Trust for Ornithology, The Nunnery, Thetford, Norfolk, IP24 2PU, UK
Tel: +44 (0)1842 750050; Fax: +44 (0)1842 750030

-----Original Message-----
From: r-sig-ecology-boun...@r-project.org [mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Highland Statistics Ltd.
Sent: 12 December 2009 11:28
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] low predicted vales in GAMs (Anna Renwick)

--
Message: 1
Date: Fri, 11 Dec 2009 11:43:40 -
From: Anna Renwick anna.renw...@bto.org
Subject: [R-sig-eco] low predicted vales in GAMs
To: r-sig-ecology@r-project.org
Message-ID: bfd6df2c5ca142c58c272652fa017...@btodomain.bto.org
Content-Type: text/plain

Dear All

I have come across a problem with the GAM models I am running. Basically the predicted values are consistently only about 0.4 of the actual values.
A bit more detail:

MODEL:

  m4 <- gam(count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
              cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain +
              offset(log(fit.vec)),
            weights = wt, data = spat6, family = quasipoisson,
            start = rep(0, 26))

MODEL SUMMARY:

  Family: quasipoisson
  Link function: log

  Formula:
  count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
      cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain + offset(log(fit.vec))

  Parametric coefficients:
                    Estimate Std. Error   t value Pr(>|t|)
  (Intercept)     -5.296e+00  1.846e+00    -2.869 0.004166 **
  ezM              1.651e+00  2.102e+00     0.785 0.432397
  ezP              7.358e+00  2.047e+00     3.595 0.000332 ***
  ezU             -1.061e+02  1.064e+07 -9.97e-06 0.92
  cv01             7.405e-02  5.437e-03    13.620 <2e-16 ***
  cv03             2.258e-02  5.145e-03     4.389 1.20e-05 ***
  cv04             2.878e-02  4.839e-03     5.949 3.18e-09 ***
  cv05             3.634e-02  5.326e-03     6.823 1.17e-11 ***
  cv07             2.370e-02  5.712e-03     4.149 3.48e-05 ***
  mtemp           -1.838e-01  1.750e-01    -1.050 0.293900
  mtotalrain       1.872e-02  5.072e-03     3.692 0.000229 ***
  ezM:mtemp        6.181e-02  2.204e-01     0.280 0.779197
  ezP:mtemp       -7.028e-01  2.050e-01    -3.429 0.000619 ***
  ezU:mtemp        8.697e-01  1.371e+06  6.34e-07 0.99
  ezM:mtotalrain  -3.393e-02  5.799e-03    -5.851 5.68e-09 ***
  ezP:mtotalrain  -1.901e-02  5.379e-03    -3.535 0.000417 ***
  ezU:mtotalrain   3.510e-02  4.074e+04  8.62e-07 0.99
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Approximate significance of smooth terms:
                    edf Ref.df     F p-value
  s(east,north)   8.736  8.736 28.88  <2e-16 ***
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  R-sq.(adj) = 0.324   Deviance explained = -5.12e+03%
  GCV score = 39.556   Scale est. = 39.056   n = 2038

Count = bird counts/square

Is this really an integer?

ez = environmental zone
cv = habitat types
mtemp = mean annual temperature
mtotalrain = mean total rain/year

Sample size is approximately 2000. The offset fit.vec is bird detectability and the weighting is based on the number of squares in each area surveyed. I believe that the strange deviance explained is due to the weighting we have added into the model.

Why would you use a weighting factor in a Poisson/quasi-Poisson GLM/GAM? See also the weights text in the help file for glm. Not sure what it would be doing.

I would have assumed that the predicted values divided by the real counts should be around 1; however, they are much lower, and hence the model is consistently predicting lower counts than were observed. I was wondering if there is anything obvious which I am missing when carrying out these models.

You seem to have a very large overdispersion. But that is another problem. I think your number of squares should actually be used in the offset (the log, obviously).

Alain

Many thanks,
Anna

Dr Anna R. Renwick
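
One way to check whether excess zeros are behind the low predictions discussed in this thread is to compare the observed proportion of zeros with the proportion implied by the fitted Poisson means. The sketch below uses simulated data, not the survey data. The variables `x` and `effort` (a stand-in for an offset such as detectability) and the 60% share of structural zeros are assumptions made purely for illustration.

```r
library(mgcv)

set.seed(1)
n <- 500
dat <- data.frame(
  x      = runif(n),
  effort = runif(n, 0.5, 2)   # hypothetical offset variable (e.g. detectability)
)

# Simulated counts: Poisson where the species is present, zero otherwise
mu        <- dat$effort * exp(0.5 + sin(2 * pi * dat$x))
present   <- rbinom(n, 1, 0.4)            # about 60% structural zeros (assumed)
dat$count <- present * rpois(n, mu)

# Ordinary Poisson GAM with a log offset; the model has no zero-inflation part
m <- gam(count ~ s(x) + offset(log(effort)), family = poisson, data = dat)

obs_zero <- mean(dat$count == 0)          # observed proportion of zeros
exp_zero <- mean(dpois(0, fitted(m)))     # proportion implied by the fitted means

print(c(observed = obs_zero, expected = exp_zero))
```

If the observed proportion is well above the expected one, a zero-inflated model may be worth exploring. The check itself says nothing about which package supports weights or offsets.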
Re: [R-sig-eco] low predicted vales in GAMs (Anna Renwick)
--
Message: 1
Date: Fri, 11 Dec 2009 11:43:40 -
From: Anna Renwick anna.renw...@bto.org
Subject: [R-sig-eco] low predicted vales in GAMs
To: r-sig-ecology@r-project.org
Message-ID: bfd6df2c5ca142c58c272652fa017...@btodomain.bto.org
Content-Type: text/plain

Dear All

I have come across a problem with the GAM models I am running. Basically the predicted values are consistently only about 0.4 of the actual values.

A bit more detail:

MODEL:

  m4 <- gam(count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
              cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain +
              offset(log(fit.vec)),
            weights = wt, data = spat6, family = quasipoisson,
            start = rep(0, 26))

MODEL SUMMARY:

  Family: quasipoisson
  Link function: log

  Formula:
  count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
      cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain + offset(log(fit.vec))

  Parametric coefficients:
                    Estimate Std. Error   t value Pr(>|t|)
  (Intercept)     -5.296e+00  1.846e+00    -2.869 0.004166 **
  ezM              1.651e+00  2.102e+00     0.785 0.432397
  ezP              7.358e+00  2.047e+00     3.595 0.000332 ***
  ezU             -1.061e+02  1.064e+07 -9.97e-06 0.92
  cv01             7.405e-02  5.437e-03    13.620 <2e-16 ***
  cv03             2.258e-02  5.145e-03     4.389 1.20e-05 ***
  cv04             2.878e-02  4.839e-03     5.949 3.18e-09 ***
  cv05             3.634e-02  5.326e-03     6.823 1.17e-11 ***
  cv07             2.370e-02  5.712e-03     4.149 3.48e-05 ***
  mtemp           -1.838e-01  1.750e-01    -1.050 0.293900
  mtotalrain       1.872e-02  5.072e-03     3.692 0.000229 ***
  ezM:mtemp        6.181e-02  2.204e-01     0.280 0.779197
  ezP:mtemp       -7.028e-01  2.050e-01    -3.429 0.000619 ***
  ezU:mtemp        8.697e-01  1.371e+06  6.34e-07 0.99
  ezM:mtotalrain  -3.393e-02  5.799e-03    -5.851 5.68e-09 ***
  ezP:mtotalrain  -1.901e-02  5.379e-03    -3.535 0.000417 ***
  ezU:mtotalrain   3.510e-02  4.074e+04  8.62e-07 0.99
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  Approximate significance of smooth terms:
                    edf Ref.df     F p-value
  s(east,north)   8.736  8.736 28.88  <2e-16 ***
  ---
  Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

  R-sq.(adj) = 0.324   Deviance explained = -5.12e+03%
  GCV score = 39.556   Scale est. = 39.056   n = 2038

Count = bird counts/square

Is this really an integer?

ez = environmental zone
cv = habitat types
mtemp = mean annual temperature
mtotalrain = mean total rain/year

Sample size is approximately 2000. The offset fit.vec is bird detectability and the weighting is based on the number of squares in each area surveyed. I believe that the strange deviance explained is due to the weighting we have added into the model.

Why would you use a weighting factor in a Poisson/quasi-Poisson GLM/GAM? See also the weights text in the help file for glm. Not sure what it would be doing.

I would have assumed that the predicted values divided by the real counts should be around 1; however, they are much lower, and hence the model is consistently predicting lower counts than were observed. I was wondering if there is anything obvious which I am missing when carrying out these models.

You seem to have a very large overdispersion. But that is another problem. I think your number of squares should actually be used in the offset (the log, obviously).

Alain

Many thanks,
Anna

Dr Anna R. Renwick
Research Ecologist
British Trust for Ornithology, The Nunnery, Thetford, Norfolk, IP24 2PU, UK
Tel: +44 (0)1842 750050; Fax: +44 (0)1842 750030

R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology

End of R-sig-ecology Digest, Vol 21, Issue 12

--
Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007). Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
   URL: www.springer.com/0-387-45967-7
2. Mixed effects models and extensions in ecology with R. (2009). Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
   http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9
3. A Beginner's Guide to R (2009). Zuur, AF, Ieno, EN, Meesters, EHWG. Springer.
   http://www.springer.com/statistics/computational/book/978-0-387-93836-3

Other books: http://www.highstat.com/books.htm

Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highs...@highstat.com
URL: www.highstat.com
URL: www.brodgar.com
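
Alain's suggestion to move the number of squares from the weights into the offset can be illustrated with a small simulation. This is a sketch under stated assumptions, not a re-analysis: the data are simulated, and `x` and `squares` are hypothetical names. With a log link and an unpenalized intercept, the unweighted fit should reproduce the total observed count, whereas with prior weights only the weighted totals are matched, so the ratio of fitted to observed totals need not be 1.

```r
library(mgcv)

set.seed(2)
n <- 400
dat <- data.frame(
  x       = runif(n),
  squares = sample(1:10, n, replace = TRUE)  # hypothetical squares surveyed per area
)

# Simulated counts that scale with the number of squares surveyed
dat$count <- rpois(n, dat$squares * exp(0.3 + 0.8 * dat$x))

# (a) number of squares supplied as prior weights
m_wt <- gam(count ~ s(x), weights = squares,
            family = quasipoisson, data = dat)

# (b) log(number of squares) supplied as an offset
m_off <- gam(count ~ s(x) + offset(log(squares)),
             family = quasipoisson, data = dat)

# Ratio of total fitted to total observed counts under each specification
ratio_wt  <- sum(fitted(m_wt))  / sum(dat$count)
ratio_off <- sum(fitted(m_off)) / sum(dat$count)

# Rough overdispersion estimate: Pearson chi-square over residual df
disp_off <- sum(residuals(m_off, type = "pearson")^2) / df.residual(m_off)

print(c(ratio_weights = ratio_wt, ratio_offset = ratio_off,
        dispersion_offset = disp_off))
```

A ratio far from 1 in the weighted fit does not by itself indicate a bug; it reflects what the weights are asking the model to match. A dispersion estimate well above 1 would point to overdispersion, which is a separate issue.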