Re: [R-sig-eco] low predicted vales in GAMs (Anna Renwick)

2009-12-16 Thread Anna Renwick
Dear All

I wanted to thank everyone for their helpful comments. With your help, and
that of Simon Wood, I now realise that the reason I have low predicted
values is that I have so many zeros in my data. Because the model structure I
have specified requires the mean to be positive everywhere, the model
over-predicts the zero counts and, in order not to predict more counts
than there actually are, it underestimates the non-zero counts (this
underestimation can be quite large because of the high number of zeros).
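
A small simulated sketch may make this concrete. The data below are invented (70% structural zeros, Poisson mean of 5 otherwise) and are not the bird counts discussed in this thread; the point is only that a single Poisson mean fitted to such data sits well below the mean of the non-zero counts and implies far fewer zeros than are observed.

```r
set.seed(1)
n <- 2000

# Hypothetical data: about 70% structural zeros, the rest Poisson with mean 5
is_zero <- rbinom(n, 1, 0.7) == 1
y <- ifelse(is_zero, 0L, rpois(n, 5))

# Intercept-only Poisson model: one fitted mean, which is always > 0
fit <- glm(y ~ 1, family = poisson)
mu  <- unname(exp(coef(fit)))

mean_all     <- mean(y)          # overall mean; the fitted mean reproduces it
mean_nonzero <- mean(y[y > 0])   # mean of the non-zero counts, much higher
obs_zero     <- mean(y == 0)     # observed proportion of zeros
pois_zero    <- dpois(0, mu)     # proportion of zeros implied by Poisson(mu)

c(fitted = mu, mean_all = mean_all, mean_nonzero = mean_nonzero,
  obs_zero = obs_zero, pois_zero = pois_zero)
```

Comparing `mu` with `mean_nonzero` shows the size of the shortfall on the non-zero counts in this made-up setting.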


So one thing I am thinking of trying is a zero-inflated model. I have looked
at the COZIGAM package, but it does not seem to be possible to use an offset
with it. Does anybody know of a package in which weighted zero-inflated
GAMs with an offset can be fitted?

Many thanks,

Anna

Dr Anna R. Renwick
Research Ecologist
British Trust for Ornithology, 
The Nunnery, 
Thetford, 
Norfolk, 
IP24 2PU, 
UK
Tel: +44 (0)1842 750050; Fax: +44 (0)1842 750030 


Re: [R-sig-eco] low predicted vales in GAMs (Anna Renwick)

2009-12-12 Thread Highland Statistics Ltd.



--

Message: 1
Date: Fri, 11 Dec 2009 11:43:40 -
From: "Anna Renwick" 
Subject: [R-sig-eco] low predicted vales in GAMs

Dear All

 


I have come across a problem with the GAM models I am running. Basically the
predicted values are consistently only about 0.4 of the actual values. 

 


A bit more detail:

MODEL:

m4 <- gam(count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
            cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain +
            offset(log(fit.vec)),
          weights = wt,
          data = spat6,
          family = quasipoisson,
          start = rep(0, 26))

MODEL SUMMARY:

Family: quasipoisson
Link function: log

Formula:
count ~ s(east, north, k = 10) + ez + cv01 + cv03 + cv04 + cv05 +
    cv07 + mtemp + mtotalrain + ez:mtemp + ez:mtotalrain +
    offset(log(fit.vec))

Parametric coefficients:
                 Estimate Std. Error   t value Pr(>|t|)
(Intercept)    -5.296e+00  1.846e+00    -2.869 0.004166 **
ezM             1.651e+00  2.102e+00     0.785 0.432397
ezP             7.358e+00  2.047e+00     3.595 0.000332 ***
ezU            -1.061e+02  1.064e+07 -9.97e-06     0.92
cv01            7.405e-02  5.437e-03    13.620  < 2e-16 ***
cv03            2.258e-02  5.145e-03     4.389 1.20e-05 ***
cv04            2.878e-02  4.839e-03     5.949 3.18e-09 ***
cv05            3.634e-02  5.326e-03     6.823 1.17e-11 ***
cv07            2.370e-02  5.712e-03     4.149 3.48e-05 ***
mtemp          -1.838e-01  1.750e-01    -1.050 0.293900
mtotalrain      1.872e-02  5.072e-03     3.692 0.000229 ***
ezM:mtemp       6.181e-02  2.204e-01     0.280 0.779197
ezP:mtemp      -7.028e-01  2.050e-01    -3.429 0.000619 ***
ezU:mtemp       8.697e-01  1.371e+06  6.34e-07     0.99
ezM:mtotalrain -3.393e-02  5.799e-03    -5.851 5.68e-09 ***
ezP:mtotalrain -1.901e-02  5.379e-03    -3.535 0.000417 ***
ezU:mtotalrain  3.510e-02  4.074e+04  8.62e-07     0.99
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                edf Ref.df     F p-value
s(east,north) 8.736  8.736 28.88  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.324   Deviance explained = -5.12e+03%
GCV score = 39.556  Scale est. = 39.056    n = 2038
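
For readers unfamiliar with the "Scale est." line: in a quasi-Poisson fit the scale (dispersion) is the factor by which the variance exceeds the mean, and a value near 1 is what a plain Poisson model assumes. The sketch below uses base R's `glm()` on invented negative-binomial data to show how such an estimate can be computed from Pearson residuals. It assumes the estimate reported by `gam()` is analogous; it does not reproduce the model above.

```r
set.seed(2)
n <- 500
x <- runif(n)

# Hypothetical overdispersed counts: negative binomial with mean exp(1 + x)
y <- rnbinom(n, mu = exp(1 + x), size = 0.5)

fit <- glm(y ~ x, family = quasipoisson)

# Pearson chi-square statistic divided by the residual degrees of freedom
pearson_disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)

# summary.glm reports the same quantity (up to convergence tolerance)
glm_disp <- summary(fit)$dispersion

c(pearson = pearson_disp, summary = glm_disp)
```

A value far above 1, as in the summary above, means the standard errors are being inflated considerably relative to a Poisson fit.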

 

 


Count = bird counts/square

  


Is this really an integer?



ez=environmental zone

cv = habitat types

mtemp = mean annual temperature

mtotalrain= mean total rain/year

 


Sample size is approximately 2000.

 


The offset fit.vec is bird detectability and the weighting is based on the
number of squares in each area surveyed. I believe that the strange deviance
explained is due to the weighting we have added to the model.

  
Why would you use a weighting factor in a Poisson/quasi-Poisson GLM/GAM?
See also the text on weights in the help file for glm. I am not sure what
it would be doing.
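
As a rough illustration of what prior weights do in a Poisson fit with base R's `glm()`: an integer weight of w gives the same coefficient estimates as repeating that row w times. The sketch below uses simulated data and hypothetical weights. Whether `gam()` and the survey weighting in this thread behave the same way is an assumption, not something shown here.

```r
set.seed(3)
n <- 200
d <- data.frame(x = runif(n))
d$y <- rpois(n, exp(0.5 + d$x))
d$w <- sample(1:3, n, replace = TRUE)   # hypothetical integer weights

# Weighted fit
fit_w <- glm(y ~ x, family = poisson, data = d, weights = w)

# Unweighted fit on data in which row i is repeated w[i] times
idx     <- rep(seq_len(n), times = d$w)
fit_rep <- glm(y ~ x, family = poisson, data = d[idx, ])

cbind(weighted = coef(fit_w), replicated = coef(fit_rep))
```

So a weight tells the model that an observation counts for more (or less) information; it does not rescale the expected count the way an offset does.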


 


I would have assumed that the predicted values divided by the real counts
should be around 1; however, they are much lower, so the model is
consistently predicting lower counts than were observed. I was wondering if
there is anything obvious which I am missing when fitting these models.
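
One possible mechanism, offered as a sketch rather than a diagnosis of the model above: in a log-link Poisson fit with an intercept, the estimating equations force the fitted and observed totals to agree, but once prior weights are supplied it is the weighted totals that agree. If sites with high counts carry small weights, the unweighted ratio of predicted to observed can fall well below 1. The groups, means and weights below are invented for illustration.

```r
set.seed(4)
n   <- 1000
grp <- rbinom(n, 1, 0.5)                  # hypothetical unmodelled grouping
y   <- rpois(n, ifelse(grp == 1, 10, 2))  # group 1 has the high counts
w   <- ifelse(grp == 1, 1, 5)             # high-count sites get the small weight

fit_u <- glm(y ~ 1, family = poisson)               # unweighted
fit_w <- glm(y ~ 1, family = poisson, weights = w)  # weighted

ratio_unweighted_fit <- sum(fitted(fit_u)) / sum(y)          # approximately 1
ratio_weighted_fit   <- sum(fitted(fit_w)) / sum(y)          # below 1 here
ratio_weighted_sums  <- sum(w * fitted(fit_w)) / sum(w * y)  # approximately 1

c(ratio_unweighted_fit, ratio_weighted_fit, ratio_weighted_sums)
```

Checking the weighted as well as the unweighted totals of a fitted model is a quick way to see whether the weights are driving the discrepancy.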

  


You seem to have very large overdispersion, but that is another
problem. I think your number of squares should actually be used in the
offset (log-transformed, obviously).
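
A minimal sketch of that suggestion, using base R's `glm()` and simulated data: when the expected count is proportional to survey effort, the log of the effort enters as an offset, so the coefficients describe the rate per square. The variable `n_squares` and all values are hypothetical and not taken from the data in this thread.

```r
set.seed(5)
n <- 500
d <- data.frame(x = runif(n),
                n_squares = sample(1:10, n, replace = TRUE))  # hypothetical effort

# Expected count is proportional to effort: rate per square = exp(0.3 + x)
d$y <- rpois(n, d$n_squares * exp(0.3 + d$x))

fit <- glm(y ~ x + offset(log(n_squares)), family = poisson, data = d)
coef(fit)   # per-square log-rate parameters; simulated with 0.3 and 1

# Predicted count for a site with 4 squares at x = 0.5
pred <- predict(fit, newdata = data.frame(x = 0.5, n_squares = 4),
                type = "response")
pred
```

An existing offset can be kept by adding the logs, for example `offset(log(fit.vec) + log(n_squares))`, assuming a column holding the number of squares is available.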


Alain

 


Many thanks,

Anna

 



 






--

___
R-sig-ecology mailing list
R-sig-ecology@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-ecology


End of R-sig-ecology Digest, Vol 21, Issue 12

  



--


Dr. Alain F. Zuur
First author of:

1. Analysing Ecological Data (2007).
Zuur, AF, Ieno, EN and Smith, GM. Springer. 680 p.
URL: www.springer.com/0-387-45967-7


2. Mixed effects models and extensions in ecology with R. (2009).
Zuur, AF, Ieno, EN, Walker, N, Saveliev, AA, and Smith, GM. Springer.
http://www.springer.com/life+sci/ecology/book/978-0-387-87457-9


3. A Beginner's Guide to R (2009).
Zuur, AF, Ieno, EN, Meesters, EHWG. Springer
http://www.springer.com/statistics/computational/book/978-0-387-93836-3


Other books: http://www.highstat.com/books.htm


Statistical consultancy, courses, data analysis and software
Highland Statistics Ltd.
6 Laverock road
UK - AB41 6FN Newburgh
Tel: 0044 1358 788177
Email: highs...@highstat.com
URL: www.highstat.com
URL: www.brodgar.com
