[R] problem with glm(family=binomial) when some levels have only 0 proportion values

Jürg Schulze Wed, 02 Mar 2011 02:16:43 -0800

Hello everybody

I want to compare the proportions of germinated seeds (seed batches of size 10) of three plant types (1, 2, 3) using a glm with binomial data (following the method in Crawley: Statistics, an introduction using R, p. 247). The problem seems to be that in two plant types (2, 3) all batches have a proportion of 0.

I give you my data and the model I'm running:


      success failure type
 [1,]       0      10    3
 [2,]       0      10    2
 [3,]       0      10    2
 [4,]       0      10    2
 [5,]       0      10    2
 [6,]       0      10    2
 [7,]       0      10    2
 [8,]       4       6    1
 [9,]       4       6    1
[10,]       3       7    1
[11,]       5       5    1
[12,]       7       3    1
[13,]       4       6    1
[14,]       0      10    3
[15,]       0      10    3
[16,]       0      10    3
[17,]       0      10    3
[18,]       0      10    3
[19,]       0      10    3
[20,]       0      10    2
[21,]       0      10    2
[22,]       0      10    2
[23,]       9       1    1
[24,]       6       4    1
[25,]       4       6    1
[26,]       0      10    3
[27,]       0      10    3

y <- cbind(success, failure)

Call:
glm(formula = y ~ type, family = binomial)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.3521849  -0.0000427  -0.0000427  -0.0000427   2.6477556

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.04445    0.21087   0.211    0.833
typeFxC      -23.16283 6696.13233  -0.003    0.997
typeFxD      -23.16283 6696.13233  -0.003    0.997

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 134.395  on 26  degrees of freedom
Residual deviance:  12.622  on 24  degrees of freedom
AIC: 42.437

Number of Fisher Scoring iterations: 20
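For anyone who wants to reproduce this fit, here is a minimal sketch that re-enters the data from the table above. It assumes that type is a factor with the levels labelled 1, 2 and 3 as in the table (the output above shows the labels FxC and FxD, so the coefficient names will differ), and the object name fit is only illustrative.

```r
# Data re-entered from the table above (27 batches of 10 seeds each)
success <- c(0, 0, 0, 0, 0, 0, 0, 4, 4, 3, 5, 7, 4,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 6, 4, 0, 0)
failure <- 10 - success

# Assumption: type is treated as a factor with levels "1", "2", "3"
type <- factor(c(3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1,
                 3, 3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 3, 3))

# Two-column response: successes and failures per batch
y <- cbind(success, failure)

fit <- glm(y ~ type, family = binomial)
summary(fit)
```

The intercept is the log odds of germination in type 1, log(46/44), which agrees with the 0.04445 reported above.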

The calculated standard errors are huge, and the z tests show no significant difference between plant types 1 and 2 or between plant types 1 and 3. If I add 1 to all successes, so that all the 0 values disappear, the standard errors become much smaller and I find highly significant differences between the plant types.


suc <- success + 1
fail <- 11 - suc
Y <- cbind(suc, fail)

Call:
glm(formula = Y ~ type, family = binomial)

Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.279e+00  -4.712e-08  -4.712e-08   0.000e+00   2.584e+00

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.2231     0.2023   1.103     0.27
typeFxC      -2.5257     0.4039  -6.253 4.02e-10 ***
typeFxD      -2.5257     0.4039  -6.253 4.02e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 86.391  on 26  degrees of freedom
Residual deviance: 11.793  on 24  degrees of freedom
AIC: 76.77

Number of Fisher Scoring iterations: 4
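The coefficients of this second fit can be traced back to the pooled counts per type. A small sketch, using totals worked out from the data table (46 germinated seeds in 9 batches for type 1, none in 9 batches each for types 2 and 3); the variable names are only illustrative:

```r
n_batches <- 9                      # batches per plant type

# Type 1 after adding 1 success to each batch (batch size becomes 11)
suc1  <- 46 + n_batches             # 55 successes
fail1 <- 11 * n_batches - suc1      # 44 failures

# Types 2 and 3: the only successes are the added ones
suc2  <- 0 + n_batches              # 9 successes
fail2 <- 11 * n_batches - suc2      # 90 failures

intercept   <- log(suc1 / fail1)              # log odds for type 1
type_effect <- log(suc2 / fail2) - intercept  # log odds ratio vs. type 1

round(c(intercept, type_effect), 4)
# 0.2231 -2.5257
```

These values match the estimates in the table above, so the size of the estimated difference for types 2 and 3 is set by the 9 added successes per type.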

So I think the 0 values of all plants in groups 2 and 3 are the problem. Do you agree? I don't know why this is a problem, or how I can explain to a reviewer why a data transformation (+ 1) is necessary with such a dataset.
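As a hedged illustration of why the all-zero groups cause trouble: when a group has no successes at all, its observed proportion is 0, and the logit of 0 is minus infinity. The maximum likelihood estimate for that group is therefore not finite. glm() stops at some large negative value, and the standard error there is huge, which makes the z test uninformative. One possible check that does not rely on the standard errors is a likelihood ratio comparison with the intercept-only model. The sketch below assumes the data from the table above with type as a factor labelled 1, 2, 3; the object names (fit, null, p_lrt and so on) are illustrative, and this is not necessarily the analysis a reviewer would ask for.

```r
# Data re-entered from the table above (27 batches of 10 seeds each)
success <- c(0, 0, 0, 0, 0, 0, 0, 4, 4, 3, 5, 7, 4,
             0, 0, 0, 0, 0, 0, 0, 0, 0, 9, 6, 4, 0, 0)
failure <- 10 - success
type <- factor(c(3, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1,
                 3, 3, 3, 3, 3, 3, 2, 2, 2, 1, 1, 1, 3, 3))
y <- cbind(success, failure)

fit  <- glm(y ~ type, family = binomial)  # model with plant type
null <- glm(y ~ 1, family = binomial)     # intercept-only model

# Fitted germination probability for type 2: effectively zero
p_type2 <- plogis(sum(coef(fit)[c("(Intercept)", "type2")]))

# Likelihood ratio test: drop in deviance compared with chi-squared
dev_diff <- null$deviance - fit$deviance
df_diff  <- null$df.residual - fit$df.residual
p_lrt    <- pchisq(dev_diff, df = df_diff, lower.tail = FALSE)

# The same comparison via anova()
anova(null, fit, test = "Chisq")
```

The deviance drops by about 121.8 on 2 degrees of freedom (134.395 minus 12.622 in the first output above), so this comparison indicates a clear effect of plant type on the untransformed data.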


I would greatly appreciate any comments.
Juerg
______________________________________

Jürg Schulze
Department of Environmental Sciences
Section of Conservation Biology
University of Basel
St. Johanns-Vorstadt 10
4056 Basel, Switzerland
Tel.: ++41/61/267 08 47

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
