On Thu, 11 Feb 2010, Luciano La Sala wrote:

Dear R crew:

I am sorry this question has been posted before, but I can't seem to solve
this problem yet.

Just for the others reader who might not recall that you asked virtually the same question last week. This was my reply:
  https://stat.ethz.ch/pipermail/r-help/2010-February/227040.html

I have a simple dataset consisting of two variables: cestode intensity and
chick size (defined as CAPI).
Intensity is a count and clearly overdispersed, with way too many zeroes.
I'm interested in looking at the association between these two variables,
i.e. how well does chick size predict tape intensity?

Since I have a small sample size, I fit a zero inflated negat. Binomial (not
Poisson) model using the "pscl" package.

I built tried two models and got the outputs below.

model <- zeroinfl(Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)

Call:
zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)

Count model coefficients (negbin with log link):
(Intercept)         CAPI
  -2.99182      0.06817
Theta = 0.4528

Zero-inflation model coefficients (binomial with logit link):
(Intercept)         CAPI
   12.1364      -0.1572

summary(model)

Call:
zeroinfl(formula = Int_Cesto ~ CAPI, dist = "negbin", EM = TRUE)

Pearson residuals:
    Min       1Q   Median       3Q      Max
-0.62751 -0.38842 -0.21303 -0.06899  7.29566

Count model coefficients (negbin with log link):
           Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.99182    3.39555  -0.881   0.3783
CAPI         0.06817    0.04098   1.664   0.0962 .
Log(theta)  -0.79222    0.45031  -1.759   0.0785 .

Zero-inflation model coefficients (binomial with logit link):
           Estimate Std. Error z value Pr(>|z|)
(Intercept) 12.13636    3.71918   3.263  0.00110 **
CAPI        -0.15720    0.04989  -3.151  0.00163 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta = 0.4528
Number of iterations in BFGS optimization: 1
Log-likelihood: -140.2 on 5 Df

QUESTIONS

1. Is my model adequately specified?

See my reply linked above. I guess it's hard to say more than that.

2. CAPI is included in blocks 1 of output containing negative binomial
regression coefficients for CAPI, and is also included also in block 2
corresponding to the inflation model. Does this make sense?

Dito.

If I specify my model slightly differently, I get what I believe is more
reasonable results:

model12 <- zeroinfl(Int_Cesto ~ 1|CAPI, dist = "negbin", EM = TRUE)
model12

Call:
zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)

Count model coefficients (negbin with log link):
(Intercept)
     2.692
Theta = 0.4346

Zero-inflation model coefficients (binomial with logit link):
(Intercept)         CAPI
   13.2476      -0.1708

summary(model12)

Call:
zeroinfl(formula = Int_Cesto ~ 1 | CAPI, dist = "negbin", EM = TRUE)

Pearson residuals:
    Min       1Q   Median       3Q      Max
-0.61616 -0.36902 -0.19466 -0.06666  4.85481

Count model coefficients (negbin with log link):
           Estimate Std. Error z value Pr(>|z|)
(Intercept)   2.6924     0.3031   8.883   <2e-16 ***
Log(theta)   -0.8334     0.4082  -2.042   0.0412 *

Zero-inflation model coefficients (binomial with logit link):
           Estimate Std. Error z value Pr(>|z|)
(Intercept) 13.24757    3.64531   3.634 0.000279 ***
CAPI        -0.17078    0.04921  -3.471 0.000519 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Theta = 0.4346
Number of iterations in BFGS optimization: 1
Log-likelihood: -141.9 on 4 Df

QUESTION:

1.    Is this model specification and output more reasonable?
2.    CAPI appears only in the second block that corresponds to the
inflation model.

You can apply standard model selection techniques. Hands-on examples for that are in the vignette that I pointed you to last week:
  vignette("countreg", package = "pscl")

Different model selection strategies may however yield different models. AIC will prefer the model with CAPI in the count equation. In contrast, a Wald test at 5% level would drop it. You could also look at the BIC and at the LR test. My guess is though that the answer will be: CAPI has some weak but practically not very relevant influence on the mean in the count component.

But I strongly recommend that you try to get more familiar with the zero-inflated model and the general model selection strategies. Or you could try to get help from a local statistician to obtain an appropriate model and interpret its results.

hth,
Z


Thanks in advance!
Luciano









        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to