Hello everybody

I want to compare the proportions of germinated seeds (in batches of 10 seeds) among three plant types (1, 2, 3) using a binomial GLM, following the method in Crawley, Statistics: An Introduction Using R, p. 247. The problem seems to be that in two of the plant types (2 and 3), every batch has a germination proportion of 0.
Here are my data and the model I am running:

  success failure type
 [1,]   0   10    3
 [2,]   0   10    2
 [3,]   0   10    2
 [4,]   0   10    2
 [5,]   0   10    2
 [6,]   0   10    2
 [7,]   0   10    2
 [8,]   4    6    1
 [9,]   4    6    1
[10,]   3    7    1
[11,]   5    5    1
[12,]   7    3    1
[13,]   4    6    1
[14,]   0   10    3
[15,]   0   10    3
[16,]   0   10    3
[17,]   0   10    3
[18,]   0   10    3
[19,]   0   10    3
[20,]   0   10    2
[21,]   0   10    2
[22,]   0   10    2
[23,]   9    1    1
[24,]   6    4    1
[25,]   4    6    1
[26,]   0   10    3
[27,]   0   10    3
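In the spirit of the posting guide (reproducible code), here is a sketch that enters the table above directly in R and fits the same model. One assumption: the plant types are coded 1, 2, 3 as in the printed matrix. The real factor labels (those behind typeFxC and typeFxD) are not shown, so the coefficients here are named type2 and type3 instead.

```r
# Germinated (success) and non-germinated (failure) seeds in each batch of 10,
# transcribed row by row from the matrix above.
success <- c(0, 0, 0, 0, 0, 0, 0, 4, 4, 3, 5, 7, 4, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 9, 6, 4, 0, 0)
failure <- 10 - success
# Plant type of each batch, coded 1-3 as printed (assumption: the real
# factor labels behind typeFxC/typeFxD are not given in the post).
type <- factor(c(3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 3,
                 3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 3, 3))

y <- cbind(success, failure)
fit <- glm(y ~ type, family = binomial)
summary(fit)  # intercept about 0.0445; type2/type3 with huge standard errors
```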

 y <- cbind(success, failure)
 summary(glm(y ~ type, family = binomial))

 Call:
glm(formula = y ~ type, family = binomial)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.3521849  -0.0000427  -0.0000427  -0.0000427   2.6477556

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.04445    0.21087   0.211    0.833
typeFxC      -23.16283 6696.13233  -0.003    0.997
typeFxD      -23.16283 6696.13233  -0.003    0.997

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 134.395  on 26  degrees of freedom
Residual deviance:  12.622  on 24  degrees of freedom
AIC: 42.437

Number of Fisher Scoring iterations: 20
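A possible reading of the 20 iterations and the estimate of about -23 is that types 2 and 3 have zero germinations in all 90 seeds. This situation is usually called (quasi-)complete separation. The log-likelihood contribution of such a group is 90 * log(1 - p), which keeps increasing as p goes to 0, i.e. as its logit goes to minus infinity. There is then no finite maximum: glm() stops wherever its convergence criterion happens to be met, and the Wald standard error at that point is not meaningful. The sketch below only evaluates that log-likelihood. The function name loglik_all_failures is hypothetical.

```r
# Hypothetical helper: log-likelihood of a group in which all n seeds failed,
# as a function of the group's logit b (constant binomial terms omitted).
# log(1 - plogis(b)) is computed stably via lower.tail = FALSE, log.p = TRUE.
loglik_all_failures <- function(b, n = 90) {
  n * plogis(b, lower.tail = FALSE, log.p = TRUE)
}

b <- c(-2, -5, -10, -20, -23)
ll <- loglik_all_failures(b)
print(data.frame(logit = b, loglik = ll))
# The log-likelihood keeps rising towards 0 as b decreases, so it has no
# finite maximiser: the fitted logit for such a group drifts towards -Inf.
```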


The standard errors are huge, and the model finds no significant difference between plant types 1 and 2 or between types 1 and 3. If I add 1 to every success count so that no zeros remain, the standard errors become much smaller and the differences between the plant types are highly significant.
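One way to see where the huge standard errors come from: type is the only predictor, so the fitted binomial GLM reproduces the pooled germination proportion of each type. Summing over the nine batches per type in the table gives S_1 = 46 germinated and F_1 = 44 non-germinated seeds for type 1, and S_j = 0, F_j = 90 for types 2 and 3. In the display below, \hat\eta_j is the estimated logit of type j, and \hat\beta_j is the coefficient for type j against type 1 (the typeFxC/typeFxD rows). The standard errors are the usual Wald ones from the inverse Fisher information.

```latex
\begin{aligned}
\hat\eta_j &= \log\frac{S_j}{F_j}, &
\widehat{\mathrm{SE}}(\hat\eta_j) &= \sqrt{\frac{1}{S_j} + \frac{1}{F_j}},\\
\hat\beta_j &= \hat\eta_j - \hat\eta_1, &
\widehat{\mathrm{SE}}(\hat\beta_j) &= \sqrt{\frac{1}{S_1} + \frac{1}{F_1} + \frac{1}{S_j} + \frac{1}{F_j}},\\[4pt]
\hat\eta_1 &= \log\frac{46}{44} \approx 0.0445, &
\widehat{\mathrm{SE}}(\hat\eta_1) &= \sqrt{\tfrac{1}{46} + \tfrac{1}{44}} \approx 0.2109,\\
\hat\eta_2 = \hat\eta_3 &= \log\frac{0}{90} = -\infty. & &
\end{aligned}
```

For type 1 this gives the intercept and standard error shown in the first output. For types 2 and 3, S_j = 0, so the logit is minus infinity and 1/S_j is unbounded. No finite maximum-likelihood estimate or standard error exists. The -23.16 and 6696 reported by glm() are simply where the iterations stopped. Adding a constant to every count makes both quantities finite, but the resulting estimate then depends on which constant was chosen.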

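suc <- success + 1   # add 1 to every success count
fail <- 11 - suc     # equals the original failures; each batch now has 11 seeds
Y <- cbind(suc, fail)
summary(glm(Y ~ type, family = binomial))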

Call:
glm(formula = Y ~ type, family = binomial)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.279e+00  -4.712e-08  -4.712e-08   0.000e+00   2.584e+00

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.2231     0.2023   1.103     0.27
typeFxC      -2.5257     0.4039  -6.253 4.02e-10 ***
typeFxD      -2.5257     0.4039  -6.253 4.02e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 86.391  on 26  degrees of freedom
Residual deviance: 11.793  on 24  degrees of freedom
AIC: 76.77

Number of Fisher Scoring iterations: 4
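Both outputs can be checked by hand, as a sketch. With type as the only predictor, the estimates and Wald standard errors follow from the pooled counts per type as log(S/F) and sqrt(1/S + 1/F). The counts are summed from the table: 46 germinated and 44 not for type 1, and 0 and 90 for types 2 and 3. Adding 1 to each of the nine batches per type adds 9 germinated seeds per type. The helper group_logit_fit is hypothetical and not part of any package.

```r
# Hypothetical helper: Wald estimates of a single-factor binomial GLM from
# pooled counts (first element = reference type 1).
group_logit_fit <- function(germ, nongerm) {
  eta <- log(germ / nongerm)          # per-type logit; -Inf when germ == 0
  var_eta <- 1 / germ + 1 / nongerm   # Inf when germ == 0
  list(
    intercept    = eta[1],
    intercept_se = sqrt(var_eta[1]),
    contrast     = eta[-1] - eta[1],            # types 2, 3 vs type 1
    contrast_se  = sqrt(var_eta[-1] + var_eta[1])
  )
}

# Original data, pooled over 9 batches of 10 seeds per type.
orig <- group_logit_fit(germ = c(46, 0, 0), nongerm = c(44, 90, 90))
# After adding 1 to every batch: 9 extra germinated seeds per type.
plus1 <- group_logit_fit(germ = c(55, 9, 9), nongerm = c(44, 90, 90))

str(orig)   # contrasts and their standard errors are infinite
str(plus1)  # about 0.2231 (SE 0.2023) and -2.5257 (SE 0.4039)
```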


So I think the zeros in every batch of types 2 and 3 are the problem. Do you agree? I don't understand why this is a problem, nor how I could justify to a reviewer why a data transformation (adding 1) is necessary for such a dataset.
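One commonly suggested alternative to both the +1 and the Wald z-tests is a likelihood-ratio (deviance) test. It compares deviances instead of dividing an estimate by its standard error, and the deviance stays finite under separation. The sketch below uses base R only, with the data re-entered from the table and types coded 1 to 3. It tests the overall effect of type with anova() and then compares type 1 with type 2 on the subset of those two types. Penalized-likelihood (Firth) logistic regression, available in add-on packages such as logistf or brglm2, is another option often recommended for separated data. It is not shown here.

```r
# Data re-entered from the table in the post (types coded 1-3).
success <- c(0, 0, 0, 0, 0, 0, 0, 4, 4, 3, 5, 7, 4, 0,
             0, 0, 0, 0, 0, 0, 0, 0, 9, 6, 4, 0, 0)
d <- data.frame(
  success = success,
  failure = 10 - success,
  type = factor(c(3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 3,
                  3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 3, 3))
)

# Overall likelihood-ratio (deviance) test for plant type.
fit <- glm(cbind(success, failure) ~ type, family = binomial, data = d)
overall <- anova(fit, test = "Chisq")
print(overall)

# Type 1 vs type 2: refit on those two types only and test again.
d12 <- droplevels(subset(d, type %in% c("1", "2")))
fit12 <- glm(cbind(success, failure) ~ type, family = binomial, data = d12)
lrt12 <- anova(fit12, test = "Chisq")
print(lrt12)
```

The overall statistic is the drop in deviance, 134.395 - 12.622, about 121.8 on 2 df, as in the first output. It essentially does not depend on where glm() stopped for types 2 and 3, because those groups contribute almost nothing to the residual deviance.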

I would greatly appreciate any comments.
Juerg
______________________________________

Jürg Schulze
Department of Environmental Sciences
Section of Conservation Biology
University of Basel
St. Johanns-Vorstadt 10
4056 Basel, Switzerland
Tel.: ++41/61/267 08 47

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
