Re: [R] Zero inflated: is there a limit to the level of inflation

Stephanie L. Simek Tue, 26 Jun 2012 21:07:47 -0700

Thank you both for your quick response and input. I will consider all of
your points and see what we are able to derive from there.


Thank you again for your time and expertise.

-Stephanie

-------------------------------------------------------
Stephanie L. Simek
Carnivore Ecology Lab
Forest and Wildlife Research Center
Mississippi State University
Box 9690
Mississippi State, MS 39762
Cell: (850) 591-1430
Email: ssi...@cfr.msstate.edu


-----Original Message-----
From: Achim Zeileis [mailto:achim.zeil...@uibk.ac.at] 
Sent: Tuesday, June 26, 2012 4:46 PM
To: Marc Schwartz
Cc: Stephanie L. Simek; r-help@r-project.org
Subject: Re: [R] Zero inflated: is there a limit to the level of inflation

On Tue, 26 Jun 2012, Marc Schwartz wrote:

> On Jun 26, 2012, at 2:10 PM, SSimek wrote:
>
>> Hello,
>>
>> I have count data that illustrate the presence or absence of
>> individuals in my study population. I created a grid of cells across
>> the study area and calculated a count value for each individual per
>> season per year for each grid cell. The count value is the number of
>> times an individual was present in each grid cell. For illustration,
>> my data columns look something like this and are repeated for each
>> individual:
>>
>> Cell_ID  Param1      Param2  Param3   Param4   COUNT  Name  Year  Season  Cov
>> 1        160.565994  729.08  1503     7930.3   0      AA    2010  AUT     Open
>> 1        160.565994  729.08  1503     7930.3   22     AA    2011  SPR     Open
>> 1        160.565994  729.08  1503     7930.3   12     AA    2009  SUM     Open
>> 1        160.565994  729.08  1503     7930.3   0      AA    2010  SUM     Open
>> 2        169.427001  491.87  1503.31  5101.09  0      AA    2010  AUT     oldHard
>> 2        169.427001  491.87  1503.31  5101.09  16     AA    2011  SPR     oldHard
>> 2        169.427001  491.87  1503.31  5101.09  0      AA    2009  SUM     oldHard
>> 2        169.427001  491.87  1503.31  5101.09  0      AA    2010  SUM     oldHard
>> ...
>> 563      86.777099   612.69  977      4474.6   62     AA    2010  AUT     Water
>> 563      86.777099   612.69  977      4474.6   12     AA    2011  SPR     Water
>> 563      86.777099   612.69  977      4474.6   55     AA    2009  SUM     Water
>>
>> 1        160.565994  729.08  1503     7930.3   0      BB    2010  SUM     Open
>> 2        169.427001  491.87  1503.31  5101.09  72     BB    2010  SUM     oldHard
>> 5        160.75      614.95  1503.31  2878.98  16     BB    2010  SUM     medHard
>> 6        170.404998  510.58  1489.44  743.14   0      BB    2010  SUM     Water
>> ...
>> 563      86.777099   612.69  977      4474.6   0      BB    2010  SUM     Water
>>
>> 1        160.565994  729.08  1503     7930.3   14     C     2005  AUT     Open
>> 1        160.565994  729.08  1503     7930.3   0      C     2006  AUT     Open
>> 1        160.565994  729.08  1503     7930.3   0      C     2006  SPR     Open
>> 1        160.565994  729.08  1503     7930.3   56     C     2007  SPR     Open
>> 1        160.565994  729.08  1503     7930.3   0      C     2006  SUM     Open
>> 2        169.427001  491.87  1503.31  5101.09  124    C     2005  AUT     oldHard
>> 2        169.427001  491.87  1503.31  5101.09  231    C     2006  AUT     oldHard
>> 2        169.427001  491.87  1503.31  5101.09  889    C     2006  SPR     oldHard
>> 2        169.427001  491.87  1503.31  5101.09  0      C     2007  SPR     oldHard
>> ...
>> 563      86.777099   612.69  977      4474.6   0      C     2005  AUT     Water
>> 563      86.777099   612.69  977      4474.6   231    C     2006  AUT     Water
>> 563      86.777099   612.69  977      4474.6   185    C     2006  SPR     Water
>> 563      86.777099   612.69  977      4474.6   123    C     2007  SPR     Water
>> 563      86.777099   612.69  977      4474.6   52     C     2006  SUM     Water
>>
>> I have 563 grid cells across my study area, and each individual has
>> 1-563 cells associated with each year and season in which the
>> individual was monitored. Therefore my grid cells are repeated. I end
>> up with 71,000 records, of which 925 have a Count value > 0; that
>> means 70,075 records have a Count value = 0.
>>
>> I wanted to run a zero-inflated Poisson mixed model to estimate the
>> effects of the parameters, with individual as the random effect. But
>> I have been advised two things:
>>
>> 1. I cannot run a zero-inflated Poisson model because my data are too
>> "extremely" inflated (i.e. 70,075 vs 925) and
>>
>> 2. I cannot run the model with each cell repeated for each
>> individual. I am told the model doesn't recognize that Cell_ID #1 for
>> individual "A" is the same Cell_ID #1 for individual "B".
>>
>> Does anyone know if either or both of these points are true? I would 
>> appreciate any thoughts, advice, or suggestions.
>>
>> Thanks!
>>
>> -Stephanie
>
>
> Hi Stephanie,
>
> Some comments:
>
> 1. You should think about, or at least be open to, a zero-inflated
> negative binomial distribution rather than a zero-inflated Poisson.
>
> 2. You should at least review the vignette for the pscl CRAN package,
> which provides standard fixed-effects models and related functions for
> count data and, importantly, some good conceptual content:
>
>  http://cran.r-project.org/web/packages/pscl/vignettes/countreg.pdf
>
> 3. Given the repeated-measures framework and the correlation issues you
> likely have, you should subscribe to and re-post your query to the
> R-sig-mixed-models list:
>
>  https://stat.ethz.ch/mailman/listinfo/r-sig-mixed-models
>
> which will avail you of experts in the field.
>
> 4. There is also a draft FAQ for mixed models here:
>
>  http://glmm.wikidot.com/faq
>
> which I believe is maintained by Ben Bolker, who actively participates
> in the above list. Based upon the content there, I suspect that you
> will be pointed to the glmmADMB package, which is on R-Forge
> (http://glmmadmb.r-forge.r-project.org/) and can handle zero-inflated
> mixed-effects models of at least some types.
>
> 5. If all else fails, just to plant a seed, you might want to consider
> a mixed-effects logistic regression model with a binary response,
> since you appear to have a relatively small "event" incidence in your
> data. The above list will also be helpful in that setting, and you
> would likely be pointed to the glmer() function in the lme4 package
> for that application, which provides for GLMs in a mixed-effects
> framework.

Thanks, Marc, all very useful points! Just one addition:

I would recommend starting with the last point - a binary response
regression (for y > 0). This could be considered as the zero-hurdle of a
hurdle regression.

Hurdle regressions are an alternative to zero-inflated models, but have
the nice property that you can separately estimate both parts of the
hurdle: (1) a binary regression for y = 0 vs. y > 0, and (2) a truncated
count model for y, estimated only from the observations with y > 0. The
"pscl" package contains a hurdle() function which estimates both parts
in one go (and the "countreg" vignette gives more details and
references), but in this case it would probably be useful to estimate
them separately.
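
To make the "separately estimable" property concrete, here is a minimal
sketch of an intercept-only hurdle Poisson log-likelihood. It is written
in plain Python purely for illustration and is not the code used by
hurdle(); the function name, the parameters p_nonzero and lam, and the
absence of covariates and random effects are all assumptions of the
sketch. The counts are the first eight COUNT values from the example
table.

```python
import math


def hurdle_poisson_loglik(counts, p_nonzero, lam):
    """Return (binary_part, count_part) of a hurdle Poisson log-likelihood.

    Hypothetical helper for illustration only.
    binary_part uses every observation (y == 0 vs. y > 0).
    count_part uses only the positive observations (zero-truncated Poisson).
    """
    binary_part = 0.0
    count_part = 0.0
    # log P(Y > 0) for an untruncated Poisson(lam); used to renormalise
    log_prob_positive = math.log1p(-math.exp(-lam))
    for y in counts:
        if y == 0:
            binary_part += math.log1p(-p_nonzero)
        else:
            binary_part += math.log(p_nonzero)
            # zero-truncated Poisson log-pmf
            count_part += (y * math.log(lam) - lam
                           - math.lgamma(y + 1) - log_prob_positive)
    return binary_part, count_part


# First eight COUNT values of individual AA in the example table
counts = [0, 22, 12, 0, 0, 16, 0, 0]
binary_part, count_part = hurdle_poisson_loglik(counts, p_nonzero=3 / 8, lam=16.0)
total_loglik = binary_part + count_part
```

Because the total is a sum of a term that involves only p_nonzero and a
term that involves only lam, each part can be maximised on its own,
which is what allows the binary and the truncated count regressions to
be fitted as two separate models.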

In any case, both parts will need care because the binary response
probably contains a lot of (quasi-)complete separations because
non-zeros are so rare. Conversely, the truncated count model may be hard
to estimate because there are no observations for a lot of parameter
combinations. But estimating the models separately will give you more
flexibility in addressing these issues.
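
One rough way to see where both problems could arise is to tabulate,
for each level of a categorical covariate, how many records there are
and how many of them are non-zero. The sketch below does this in plain
Python; the helper name zero_only_levels and the dictionary layout of
the rows are assumptions made for illustration, and the rows are the
individual "BB" records shown in the example table. A level that
contains only zeros is a candidate for (quasi-)complete separation in
the binary part and contributes no observations to the truncated count
part.

```python
from collections import defaultdict


def zero_only_levels(rows, factor, count_key="COUNT"):
    """Return the levels of `factor` in which every count is zero (sorted).

    Hypothetical helper for illustration only.
    """
    n_total = defaultdict(int)
    n_positive = defaultdict(int)
    for row in rows:
        level = row[factor]
        n_total[level] += 1
        if row[count_key] > 0:
            n_positive[level] += 1
    return sorted(level for level in n_total if n_positive[level] == 0)


# Records for individual BB as listed in the example table
rows = [
    {"Cell_ID": 1, "COUNT": 0, "Cov": "Open"},
    {"Cell_ID": 2, "COUNT": 72, "Cov": "oldHard"},
    {"Cell_ID": 5, "COUNT": 16, "Cov": "medHard"},
    {"Cell_ID": 6, "COUNT": 0, "Cov": "Water"},
    {"Cell_ID": 563, "COUNT": 0, "Cov": "Water"},
]

problem_levels = zero_only_levels(rows, "Cov")
print(problem_levels)  # ['Open', 'Water']
```

On the full data the same tabulation would be run per covariate (and
per combination of covariates), since with only 925 non-zero records
many combinations are likely to be empty.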

To estimate the zero-truncated count distributions, you may consider the
"countreg" package from R-Forge which uses the same code as (one part
of) the hurdle() function.
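
For intuition only, the simplest zero-truncated count model (a Poisson
with a single rate and no covariates) can be fitted by hand: the
maximum-likelihood rate solves lam / (1 - exp(-lam)) = mean of the
positive counts. The plain-Python sketch below solves that equation by
bisection. It is not the countreg implementation; the function name,
the tolerance and the toy counts are assumptions made for illustration.

```python
import math


def fit_zero_truncated_poisson(positive_counts, tol=1e-10):
    """Estimate the rate of an intercept-only zero-truncated Poisson.

    Hypothetical helper for illustration only. Solves
    lam / (1 - exp(-lam)) = mean(positive_counts) by bisection.
    """
    if not positive_counts or any(y < 1 for y in positive_counts):
        raise ValueError("all counts must be >= 1")
    mean = sum(positive_counts) / len(positive_counts)
    if mean <= 1.0:
        raise ValueError("the mean must exceed 1 for a finite positive rate")
    # The truncated mean is increasing in lam, tends to 1 as lam -> 0,
    # and exceeds lam, so the solution lies between 0 and the sample mean.
    lo, hi = 1e-12, mean
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        truncated_mean = mid / -math.expm1(-mid)  # mid / (1 - exp(-mid))
        if truncated_mean < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy positive counts (made up for illustration, not taken from the data)
toy_counts = [1, 1, 2, 1, 3]
lam_hat = fit_zero_truncated_poisson(toy_counts)
```

The estimated rate is always below the raw mean of the positive counts,
because truncating the zeros pushes the observed mean upwards.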

hth,
Z

> Regards,
>
> Marc Schwartz
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

