Re: [R-sig-eco] low predicted values in GAMs (Anna Renwick)

2009-12-16 Thread Anna Renwick
Dear All

I wanted to thank everyone for their helpful comments. With your help, and
that of Simon Wood, I now realise that the reason I have low predicted
values is that I have so many zeros in my data. As the model structure I
have constructed specifies that the mean must always be positive, the model
over-predicts the zero counts, and, in order not to predict more counts
than there actually are, it under-estimates the non-zero counts (this
underestimation can be quite large because of the high number of zeros).


So one thing I am thinking of is to try a zero-inflated model. I have looked
at the COZIGAM package, but you do not seem to be able to use an offset with
it. I was wondering if anybody knows of a package with which weighted
zero-inflated GAMs with an offset can be fitted.
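The following is a hedged illustration of what such a model involves. It is not an answer on which package to use. A zero-inflated Poisson (ZIP) likelihood with an offset and prior weights can be written directly in base R and maximised with `optim()`.

This minimal sketch makes several assumptions:
- It uses parametric terms only, with no smooth such as s(east, north).
- The zero-inflation probability is constant.
- The data are simulated.
- The weights simply multiply each observation's log-likelihood contribution.

`zip_nll` and `fit_zip` are hypothetical helper names.

```r
# Minimal zero-inflated Poisson (ZIP) sketch with an offset and prior weights.
# Assumptions: parametric terms only, constant zero-inflation probability p.
#   P(Y = 0) = p + (1 - p) * exp(-mu)
#   P(Y = y) = (1 - p) * dpois(y, mu),  y > 0
#   log(mu)  = X %*% beta + offset

zip_nll <- function(par, y, X, offset, w) {
  k    <- ncol(X)
  beta <- par[seq_len(k)]
  p    <- plogis(par[k + 1])                  # zero-inflation probability
  mu   <- exp(drop(X %*% beta) + offset)      # Poisson mean, offset included
  ll   <- ifelse(y == 0,
                 log(p + (1 - p) * exp(-mu)),
                 log(1 - p) + dpois(y, mu, log = TRUE))
  -sum(w * ll)                                # weighted negative log-likelihood
}

fit_zip <- function(y, X, offset = rep(0, length(y)), w = rep(1, length(y))) {
  start <- rep(0, ncol(X) + 1)
  opt <- optim(start, zip_nll, y = y, X = X, offset = offset, w = w,
               method = "BFGS", control = list(maxit = 500))
  list(beta = opt$par[seq_len(ncol(X))],
       p = plogis(opt$par[ncol(X) + 1]),
       convergence = opt$convergence)
}

# Simulated example (all values hypothetical)
set.seed(1)
n      <- 2000
x      <- runif(n)
detect <- runif(n, 0.3, 1)                    # plays the role of fit.vec
mu     <- detect * exp(0.5 + 1.2 * x)
y      <- ifelse(runif(n) < 0.6, 0, rpois(n, mu))   # about 60% structural zeros

fit <- fit_zip(y, cbind(1, x), offset = log(detect))
fit$beta   # should be near c(0.5, 1.2)
fit$p      # should be near 0.6
```

Adding a spatial smooth would need a penalized fit, or a package built for it. This sketch does not attempt that.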

Many thanks,

Anna

Dr Anna R. Renwick
Research Ecologist
British Trust for Ornithology, 
The Nunnery, 
Thetford, 
Norfolk, 
IP24 2PU, 
UK
Tel: +44 (0)1842 750050; Fax: +44 (0)1842 750030 

-----Original Message-----
From: r-sig-ecology-boun...@r-project.org
[mailto:r-sig-ecology-boun...@r-project.org] On Behalf Of Highland
Statistics Ltd.
Sent: 12 December 2009 11:28
To: r-sig-ecology@r-project.org
Subject: Re: [R-sig-eco] low predicted values in GAMs (Anna Renwick)


 --

 Message: 1
 Date: Fri, 11 Dec 2009 11:43:40 -
 From: Anna Renwick anna.renw...@bto.org
 Subject: [R-sig-eco] low predicted values in GAMs
 To: r-sig-ecology@r-project.org
 Message-ID: bfd6df2c5ca142c58c272652fa017...@btodomain.bto.org
 Content-Type: text/plain

 Dear All

 I have come across a problem with the GAM models I am running. Basically,
 the predicted values are consistently only about 0.4 of the actual values.

  

 A bit more detail:

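 MODEL:

 library(mgcv)   # provides gam()

 m4 <- gam(count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
             cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain +
             offset(log(fit.vec)),   # log(detectability) as offset
           weights = wt,             # based on number of squares per area
           data = spat6,
           family = quasipoisson,
           start = rep(0, 26))       # 17 parametric + 9 smooth coefficients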

 MODEL SUMMARY:

 Family: quasipoisson
 Link function: log

 Formula:
 count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
     cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain +
     offset(log(fit.vec))

 Parametric coefficients:
                  Estimate Std. Error   t value Pr(>|t|)
 (Intercept)    -5.296e+00  1.846e+00    -2.869 0.004166 **
 ezM             1.651e+00  2.102e+00     0.785 0.432397
 ezP             7.358e+00  2.047e+00     3.595 0.000332 ***
 ezU            -1.061e+02  1.064e+07 -9.97e-06 0.92
 cv01            7.405e-02  5.437e-03    13.620  < 2e-16 ***
 cv03            2.258e-02  5.145e-03     4.389 1.20e-05 ***
 cv04            2.878e-02  4.839e-03     5.949 3.18e-09 ***
 cv05            3.634e-02  5.326e-03     6.823 1.17e-11 ***
 cv07            2.370e-02  5.712e-03     4.149 3.48e-05 ***
 mtemp          -1.838e-01  1.750e-01    -1.050 0.293900
 mtotalrain      1.872e-02  5.072e-03     3.692 0.000229 ***
 ezM:mtemp       6.181e-02  2.204e-01     0.280 0.779197
 ezP:mtemp      -7.028e-01  2.050e-01    -3.429 0.000619 ***
 ezU:mtemp       8.697e-01  1.371e+06  6.34e-07 0.99
 ezM:mtotalrain -3.393e-02  5.799e-03    -5.851 5.68e-09 ***
 ezP:mtotalrain -1.901e-02  5.379e-03    -3.535 0.000417 ***
 ezU:mtotalrain  3.510e-02  4.074e+04  8.62e-07 0.99
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 Approximate significance of smooth terms:
                 edf Ref.df     F  p-value
 s(east,north) 8.736  8.736 28.88  < 2e-16 ***
 ---
 Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

 R-sq.(adj) =  0.324   Deviance explained = -5.12e+03%
 GCV score = 39.556   Scale est. = 39.056   n = 2038
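The "Scale est." of 39.056 in the summary above is the estimated dispersion parameter. If the counts were Poisson, it would be close to 1.

The sketch below computes a Pearson-type dispersion estimate by hand on simulated overdispersed counts. It uses base R's `glm()` as a stand-in for `gam()`. mgcv's reported scale estimate may be computed somewhat differently depending on version and fitting method, so the numbers need not match gam output exactly. `pearson_dispersion` is a hypothetical helper name.

```r
# Pearson-type dispersion estimate for a quasi-Poisson fit.
# glm() is used as a stand-in for gam(); data are simulated.
set.seed(3)
n <- 1000
x <- runif(n)
y <- rnbinom(n, mu = exp(1 + x), size = 1)    # variance = mu + mu^2 > mu

fit <- glm(y ~ x, family = quasipoisson)

# Sum of squared Pearson residuals divided by residual degrees of freedom
pearson_dispersion <- function(fit) {
  sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
}

pearson_dispersion(fit)      # well above 1, i.e. overdispersed
summary(fit)$dispersion      # summary.glm() reports the same estimate
```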

  

  

 count = bird counts/square

Is this really an integer?

 ez = environmental zone
 cv = habitat types
 mtemp = mean annual temperature
 mtotalrain = mean total rain/year

 Sample size is approximately 2000.
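A quick way to answer the integer question is to test whether the response is whole-numbered before treating it as a count. This matters here because `quasipoisson()` has no likelihood and so does not flag non-integer responses. A `poisson()` fit in `glm()`, by contrast, warns via `dpois()` when it computes the AIC. `is_whole` is a hypothetical helper, and the tolerance is an assumption.

```r
# Check whether a response is whole-numbered before treating it as counts.
# quasipoisson() does not complain about non-integer responses, whereas a
# poisson() fit in glm() warns (via dpois()) when computing the AIC.
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  all(abs(x - round(x)) < tol, na.rm = TRUE)
}

is_whole(c(0, 3, 12))        # TRUE
is_whole(c(0, 2.5, 4 / 3))   # FALSE
# e.g. is_whole(spat6$count) for the data in the original post
```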

  

 The offset fit.vec is bird detectability, and the weighting is based on the
 number of squares in each area surveyed. I believe that the strange deviance
 explained is due to the weighting we have added to the model.

   
Why would you use a weighting factor in a Poisson/quasi-Poisson GLM/GAM?
See also the description of the weights argument in the help file for glm.
I am not sure what it would be doing.
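One way to see what prior weights do here is to write out the estimating equations. Take a (quasi-)Poisson model with log link, offset $o_i$ and prior weights $w_i$. The coefficients then solve the score equations below.

The intercept column ($x_{i1} = 1$) gives the last line. So the fit reproduces weighted totals of the counts, not raw totals, and the raw ratio of total fitted to total observed counts is guaranteed to equal 1 only when all weights are equal. For a penalized GAM, the same intercept equation should hold, because the intercept is not penalized.

This is offered as a sketch of the mechanics, not as a diagnosis of the model in the post.

```latex
\begin{aligned}
\log \mu_i &= o_i + \mathbf{x}_i^{\top}\boldsymbol{\beta}, \\
\sum_{i=1}^{n} w_i\,(y_i - \hat{\mu}_i)\,x_{ij} &= 0, \qquad j = 1,\dots,p, \\
x_{i1} = 1 \;\Longrightarrow\; \sum_{i=1}^{n} w_i\,\hat{\mu}_i &= \sum_{i=1}^{n} w_i\,y_i .
\end{aligned}
```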

  

 I would have assumed that the predicted values divided by the real counts
 should be around 1; however, they are much lower, and hence the model is
 consistently predicting lower counts than were observed. I was wondering
 if there is anything obvious that I am missing when fitting these models.
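The ratio check described above can be illustrated on simulated data. This is a minimal sketch using base `glm()` in place of `gam()`, where `sq` is a hypothetical "number of squares" column.

Without weights, a Poisson log-link fit with an intercept reproduces the total count, so the ratio is 1. With prior weights, it reproduces only the weighted total. In this particular simulation, the counts are constructed to fall as `sq` rises, so the raw ratio drops below 1. That direction is a property of the simulated data, not a general result.

```r
# Ratio of total fitted to total observed counts, with and without prior
# weights. Simulated data; `sq` is a hypothetical number of squares per area.
set.seed(42)
n  <- 500
x  <- runif(n)
sq <- sample(1:10, n, replace = TRUE)
y  <- rpois(n, exp(1 + 1.5 * x - 0.15 * sq))   # by construction, counts fall as sq rises

total_ratio <- function(fit, y) sum(fitted(fit)) / sum(y)

fit_unw <- glm(y ~ x, family = quasipoisson)                # no weights
fit_w   <- glm(y ~ x, family = quasipoisson, weights = sq)  # prior weights

total_ratio(fit_unw, y)                  # 1, up to convergence tolerance
total_ratio(fit_w, y)                    # below 1 for this simulation
sum(sq * fitted(fit_w)) / sum(sq * y)    # the weighted totals do match
```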

   

You seem to have very large overdispersion, but that is another problem.
I think your number of squares should actually be used in the offset
(on the log scale, obviously).
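The suggestion above can be sketched as follows: use log(number of squares) as an exposure term in the offset, alongside log(detectability), instead of passing the number of squares as weights.

This is a minimal illustration under these assumptions:
- The response is the total count over all surveyed squares in an area, not an average count per square.
- The data are simulated.
- `glm()` stands in for `gam()`.
- `nsq` and `detect` are hypothetical names.

```r
# Sketch: number of squares as exposure in the offset rather than as weights.
# Simulated data; `nsq` and `detect` are hypothetical. Assumes the response
# is the total count over all surveyed squares in an area.
set.seed(7)
n      <- 2000
x      <- runif(n)
detect <- runif(n, 0.3, 1)                   # detectability (cf. fit.vec)
nsq    <- sample(1:10, n, replace = TRUE)    # number of squares surveyed
y      <- rpois(n, nsq * detect * exp(-1 + 1.2 * x))

# log(E[y]) = log(detect) + log(nsq) + linear predictor
fit_off <- glm(y ~ x + offset(log(detect) + log(nsq)),
               family = quasipoisson)

coef(fit_off)                  # close to the simulated values c(-1, 1.2)
sum(fitted(fit_off)) / sum(y)  # 1, up to convergence tolerance
```

In the gam() call from the original post, this would roughly correspond to dropping `weights = wt` and using `offset(log(fit.vec) + log(nsq))` in the formula, with `nsq` a hypothetical column holding the number of squares.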

Alain

  

 Many thanks,

 Anna

  

 Dr Anna R. Renwick

Re: [R-sig-eco] low predicted values in GAMs (Anna Renwick)

2009-12-12 Thread Highland Statistics Ltd.



--


Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7


2. Mixed effects models and extensions in ecology with R. (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9


3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
http://www.springer.com/statistics/computational/book/978-0-387-93836-3


Other books: http://www.highstat.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highs...@highstat.com
URL: www.highstat.com
URL: www.brodgar.com