Dear All,

I had a quick look at the internal functions that pscl::hurdle uses for the numerical optimization with optim. They clearly correspond to the hurdle model defined in the paper/vignette, where the zero component is based on a right-censored random variable that is 0 if the original count is 0 and 1 otherwise. The likelihood function for the zero model corresponds to a censored Poisson model. The count part is based on a left-truncated Poisson.

This is conditional inference thinking: the zero model tells us what determines whether an observation is 0 or >0, and, given that the observation is >0, the count model tells us what determines the exact count. If the estimates from the two models are identical, it means that the 0s can arise from the same Poisson distribution as the counts. So this is not really a mixture, as is the case with the zeroinfl() model. The resulting log-likelihood is still valid, and Table 1 clearly states that the hurdle model is based on ML (maximum likelihood).
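To make the two likelihood pieces concrete, here is a minimal base-R sketch. It is not the pscl internals, only an intercept-only illustration under my own assumptions: the function names (loglik_zero, loglik_count) and the simulated data are made up for the example, and the zero part is taken to be the censored Poisson described above.

```r
# Sketch of a Poisson hurdle log-likelihood (intercept only, no covariates).
# All names here are illustrative; this is not the code used inside pscl.

# Zero part: right-censored Poisson. We only use whether y == 0 or y > 0.
# P(y = 0) = exp(-mu) and P(y > 0) = 1 - exp(-mu), with mu = exp(gamma).
loglik_zero <- function(gamma, y) {
  mu <- exp(gamma)
  sum(ifelse(y == 0, -mu, log1p(-exp(-mu))))
}

# Count part: Poisson left-truncated at zero, using the positive counts only.
# P(y = k | y > 0) = dpois(k, lambda) / (1 - exp(-lambda)), lambda = exp(beta).
loglik_count <- function(beta, y) {
  lambda <- exp(beta)
  ypos <- y[y > 0]
  sum(dpois(ypos, lambda, log = TRUE) - log1p(-exp(-lambda)))
}

set.seed(1)
y <- rpois(2000, lambda = 1.5)  # simulated counts with no excess zeros

# The two parts share no parameters, so each can be maximized on its own.
fit_zero  <- optim(0, function(g) -loglik_zero(g, y),  method = "BFGS")
fit_count <- optim(0, function(b) -loglik_count(b, y), method = "BFGS")

# Back-transform to the mean scale; the full log-likelihood is the sum.
est <- c(zero = exp(fit_zero$par), count = exp(fit_count$par))
loglik_total <- -(fit_zero$value + fit_count$value)
print(est)
```

Because the data are simulated from a single Poisson, the two estimates should land close to each other, which is the "0s can arise from the same Poisson distribution" case. With real data, I believe the corresponding fit would be hurdle(y ~ 1, dist = "poisson", zero.dist = "poisson"), but please check ?hurdle for the exact arguments.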
The hurdle fit is not the same estimating procedure as for the quasipoisson, where a likelihood-like function is used to get estimates (note that the parameter estimates are the same as for Poisson, but the SEs and the dispersion parameter are different). To handle overdispersion other than zero inflation, one can choose NB instead of Poisson in hurdle. The quasipoisson family is not allowed there.

Cheers,

Peter

Péter Sólymos
Alberta Biodiversity Monitoring Institute
and Boreal Avian Modelling project
Department of Biological Sciences
CW 405, Biological Sciences Bldg
University of Alberta
Edmonton, Alberta, T6G 2E9, Canada
Phone: 780.492.8534
Fax: 780.492.7635
email <- paste("solymos", "ualberta.ca", sep = "@")
http://www.abmi.ca
http://sites.google.com/site/psolymos

On Thu, Aug 19, 2010 at 7:22 AM, Jari Oksanen <jari.oksa...@oulu.fi> wrote:
> On Thu, 2010-08-19 at 14:54 +0300, Gavin Simpson wrote:
>> On Thu, 2010-08-19 at 13:20 +0200, Yingjie Zhang wrote:
>
>> They fit several models and compare them:
>>
>> I. Poisson
>> II. Negative Binomial
>> III. Quasi-likelihood
>> IV. Hurdle model
>> V. zero-inflated model
>>
>> III should be a quasi-Poisson model, i.e. you fit the Poisson GLM
>> using quasi-likelihood and model the dispersion parameter \phi
>> alongside the usual Poisson GLM parameters.
>>
>> Section 2.3 of their paper on the hurdle model doesn't even mention
>> "quasi", though they do mention it in Table 2.
>>
>> Reading this, I think they cooked this model up themselves - you can
>> fit a binomial model yourself for the presence/absence and then fit a
>> count model for the samples predicted to be present from the binomial
>> part. To make things simple I suspect they fitted the count part as
>> quasi-Poisson, but nowhere does it say exactly what they did.
>
> I know that at least Jane Elith has an email address (I have used it
> years ago), so you could ask her.
> However, it may be that their hurdle
> model uses just Poisson, and there is a minor mistake in their Table 2.
>
> You can use quasipoisson() or poisson() in glm() in a very natural way:
> the fitting happens via iteratively reweighted least squares, and all
> you need to define is the relationship between fitted values and
> variance. If you look at the poisson() and quasipoisson() functions in R
> (these provide the backbone of glm(..., family=)), you see that the
> differences are that quasipoisson()$aic() always returns NA, and
> quasipoisson() lacks the item simulate(). Otherwise they work in a
> similar way, except that in poisson() you take the scale (\phi) to be 1,
> and in quasipoisson() you estimate the scale from the fitted model. Then
> you just multiply the standard errors by the square root of the scale,
> use F tests instead of Chisq in anova(), etc.
>
> I am not sure (or actually, I don't think) that this fitting parallelism
> extends to the *truncated* Poisson that is used in pscl::hurdle().
> Although you can do the fitting by stages, and fit a quasipoisson() glm
> for the above-zero values, I don't think this is the correct thing to do
> when you are not allowed to have new zeros. However, the truncated
> Poisson likelihood model is a huge improvement over hand-fitting a glm
> with iteratively reweighted least squares and assuming a constant
> variance/fit relationship.
>
> If you are worried about the overdispersion of the above-zero count
> data, use the truncated negative binomial model offered by
> pscl::hurdle(). It is designed for the purpose (and has a more exciting
> narrative for ecologists).
>
> Cheers, Jari Oksanen
>
> _______________________________________________
> R-sig-ecology mailing list
> R-sig-ecology@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-ecology