I have a large dataset of 118,225 observations on 16 variables, so I’ve been 
using bam, rather than gam, for my analyses.

The response variable is count data, but it’s overdispersed, so I thought I’d 
use a negative binomial model. I have 5 explanatory variables, which are 
biologically important: two are numerical and 3 are categorical. I’ve only 
applied a smoother to the first numerical explanatory variable (SE_score) 
because, in some prior analyses, TL had an edf of 1.01 and was therefore 
effectively linear. I have also included two categorical random effects in the 
model.

# First fit: theta is estimated as part of the model fit
m3 <- bam(deg ~ s(SE_score) + TL + species + sex + season + year +
            s(code, bs = 're') + s(monthyear, bs = 're'),
          family = nb(), data = node_dat, method = "REML")

# Extract the estimated theta (TRUE returns theta itself, not its log)
th <- m3$family$getTheta(TRUE)

# Refit with theta fixed at the estimated value
m3 <- bam(deg ~ s(SE_score) + TL + species + sex + season + year +
            s(code, bs = 're') + s(monthyear, bs = 're'),
          family = nb(th), data = node_dat, method = "REML")

summary(m3)

However, I’m getting these warnings and I can’t work out what they mean:

There were 32 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In pmax(1, y)/mu :
  longer object length is not a multiple of shorter object length
2: In y * log(pmax(1, y)/mu) :
  longer object length is not a multiple of shorter object length
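
As far as I can tell, the message itself is R's generic recycling warning. A 
minimal sketch with toy vectors (not my data; the lengths 5 and 3 are 
arbitrary) reproduces it, which makes me think y and mu differ in length 
somewhere inside the fit, although that is only my reading and I don't know 
why that would happen:

```r
# Toy vectors (not my data): y has 5 elements, mu has 3
y  <- c(0, 0, 3, 1, 0)
mu <- c(0.5, 1.2, 0.8)

# Capture the warning text instead of letting it print
msg <- tryCatch({
  pmax(1, y) / mu          # 5 is not a multiple of 3, so R warns
  NA_character_
}, warning = function(w) conditionMessage(w))

msg
```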

Is this an issue? The model converges, and I’ve checked overdispersion again 
and get this value:

> E3 <- resid(m3, type = "pearson")
> sum(E3^2)/m3$df.res
[1] 0.7436045
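
To be explicit about what that number is: it is the sum of squared Pearson 
residuals divided by the residual degrees of freedom. Below is a minimal 
sketch of the same calculation on simulated Poisson data fitted with glm (not 
my data or model; pearson_dispersion is just a helper name I made up), where 
the ratio should come out close to 1:

```r
# Simulated Poisson counts, so no over- or underdispersion is built in
set.seed(1)
n <- 5000
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + x))
fit <- glm(y ~ x, family = poisson)

# Hypothetical helper: Pearson chi-square divided by residual df
pearson_dispersion <- function(model) {
  r <- residuals(model, type = "pearson")
  sum(r^2) / df.residual(model)
}

disp <- pearson_dispersion(fit)
disp
```

Values well above 1 would point to overdispersion and values well below 1 to 
underdispersion, relative to the variance the fitted family assumes.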

So does this suggest there is now some underdispersion? Also, the model 
summary gives:

> summary(m3)

Family: Negative Binomial(0.055)
Link function: log

I’ve read that the 0.055 is also a measure of dispersion, so which one is 
correct?
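
My understanding is that the 0.055 is theta in the negative binomial variance 
function, var(y) = mu + mu^2/theta, so it is on a different scale from the 
Pearson ratio above. A minimal sketch with simulated draws (the mean of 2 is 
an arbitrary value I picked for illustration, not something from my model):

```r
set.seed(42)
theta <- 0.055   # value printed in my model summary
mu    <- 2       # arbitrary mean, chosen only for illustration

# rnbinom's 'size' argument plays the role of theta
y <- rnbinom(2e5, size = theta, mu = mu)

implied_var <- mu + mu^2 / theta
c(empirical = var(y), implied = implied_var)
```

A small theta therefore means a variance far larger than the mean, whereas 
the Pearson ratio measures what is left over after that variance function has 
been accounted for.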

I was confused about all this, and I have a lot of zeros in my data (about 
96%), so I thought I’d also try a zero-inflated Poisson model; however, it 
does not converge.

m4 <- bam(deg ~ s(SE_score) + TL + species + sex + season + year +
                  s(code, bs = 're') + s(monthyear, bs = 're'),
                family=ziP(), data=node_dat, method = "REML")

Warning message:
In bgam.fit(G, mf, chunk.size, gp, scale, gamma, method = method,  :
  algorithm did not converge

Is there any reason why it does not converge? Maybe a zero-inflated negative 
binomial would be better, but I’m not sure how to fit one.
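
One check I could run before going down that route is to compare the observed 
proportion of zeros with the proportion the negative binomial model already 
implies. Here is a minimal sketch, assuming the standard result that a 
negative binomial with mean mu and theta gives P(y = 0) = (theta / (theta + 
mu))^theta; nb_zero_prob is a made-up helper name, mu = 2 is arbitrary, and I 
have not yet run the commented lines on my data:

```r
# Hypothetical helper: probability of a zero under a negative binomial
nb_zero_prob <- function(mu, theta) (theta / (theta + mu))^theta

# Check the formula against R's own negative binomial density
p0_formula <- nb_zero_prob(mu = 2, theta = 0.055)
p0_dnbinom <- dnbinom(0, size = 0.055, mu = 2)
c(formula = p0_formula, dnbinom = p0_dnbinom)

# With my fitted model this would be something like (not run here):
# mean(node_dat$deg == 0)                              # observed zeros
# mean(nb_zero_prob(fitted(m3), th))                   # zeros implied by m3
```

If the two proportions turned out to be close, the zeros might not need a 
separate zero-inflation component at all.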

I know there’s a lot here, but any help would be appreciated.

Many thanks,

Mike



Michael Williamson
London NERC DTP Candidate

Email: michael.william...@kcl.ac.uk
Phone: +447764836592
Skype: mikejwilliamson
Twitter: @mjw_marine
Website: www.thenetlab.uk

Most recent paper:
Williamson, M. J. et al. (2021). Analysing detection gaps in acoustic telemetry 
data to infer differential movement patterns in fish. Ecology and Evolution, 
11, 2717-2730. https://doi.org/10.1002/ece3.7226



