Re: [R] problem with glm(family=binomial) when some levels have only 0 proportion values

Robert A LaBudde Wed, 02 Mar 2011 07:11:35 -0800

The algorithm is not converging. Your iterations are at the maximum.

It won't do any good to add a fractional numberto all data, as the result will depend on thenumber added (try 1.0, 0.5 and 0.1 to see this).

The root problem is that your data aredegenerate. Firstly, your types '2' and '3' areindistinguishable in your data. Secondly,consider the case without 'type'. If you have allzero data for 10 trials, you cannot discriminateamong mu = 0, 0.00001, 0.0001, 0.001 or 0.01.This leads to numerical instability. Thirdly, thevariance estimate in the IRLS will start at 0.0, which gives a singularity.

Fundamentally, the algorithm is failing becauseyou are at the boundary of possibilities for aparameter, so special techniques are needed to domaximum likelihood estimation.

The simple solution is to deal with the data foryour types separately. Another is to do morebatches for '2' and '3' to get an observed failure.




At 05:01 AM 3/2/2011, Jürg Schulze wrote:

Hello everybody

I want to compare the proportions of germinated seeds (seed batches of
size 10) of three plant types (1,2,3) with a glm with binomial data
(following the method in Crawley: Statistics,an introduction using R,
p.247).
The problem seems to be that in two plant types (2,3) all plants have
proportions = 0.
I give you my data and the model I'm running:

  success failure type
 [1,]   0   10    3
 [2,]   0   10    2
 [3,]   0   10    2
 [4,]   0   10    2
 [5,]   0   10    2
 [6,]   0   10    2
 [7,]   0   10    2
 [8,]   4    6    1
 [9,]   4    6    1
[10,]   3    7    1
[11,]   5    5    1
[12,]   7    3    1
[13,]   4    6    1
[14,]   0   10    3
[15,]   0   10    3
[16,]   0   10    3
[17,]   0   10    3
[18,]   0   10    3
[19,]   0   10    3
[20,]   0   10    2
[21,]   0   10    2
[22,]   0   10    2
[23,]   9    1    1
[24,]   6    4    1
[25,]   4    6    1
[26,]   0   10    3
[27,]   0   10    3

 y<- cbind(success, failure)

 Call:
glm(formula = y ~ type, family = binomial)

Deviance Residuals:
       Min          1Q      Median          3Q
-1.3521849  -0.0000427  -0.0000427  -0.0000427
       Max
 2.6477556

Coefficients:
              Estimate Std. Error z value Pr(>|z|)
(Intercept)    0.04445    0.21087   0.211    0.833
typeFxC      -23.16283 6696.13233  -0.003    0.997
typeFxD      -23.16283 6696.13233  -0.003    0.997

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 134.395  on 26  degrees of freedom
Residual deviance:  12.622  on 24  degrees of freedom
AIC: 42.437

Number of Fisher Scoring iterations: 20


Huge standard errors are calculated and there is no difference between
plant type 1 and 2 or between plant type 1 and 3.
If I add 1 to all successes, so that all the 0 values disappear, the
standard error becomes lower and I find highly significant differences
between the plant types.

suc<- success + 1
fail<- 11 - suc
Y<- cbind(suc,fail)

Call:
glm(formula = Y ~ type, family = binomial)

Deviance Residuals:
       Min          1Q      Median          3Q
-1.279e+00  -4.712e-08  -4.712e-08   0.000e+00
       Max
 2.584e+00

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.2231     0.2023   1.103     0.27
typeFxC      -2.5257     0.4039  -6.253 4.02e-10 ***
typeFxD      -2.5257     0.4039  -6.253 4.02e-10 ***
---
Signif. codes:  0 ?***? 0.001 ?**? 0.01 ?*? 0.05 ?.? 0.1 ? ? 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 86.391  on 26  degrees of freedom
Residual deviance: 11.793  on 24  degrees of freedom
AIC: 76.77

Number of Fisher Scoring iterations: 4


So I think the 0 values of all plants of group 2 and 3 are the
problem, do you agree?
I don't know why this is a problem, or how I can explain to a reviewer
why a data transformation (+ 1) is necessary with such a dataset.

I would greatly appreciate any comments.
Juerg
______________________________________

Jürg Schulze
Department of Environmental Sciences
Section of Conservation Biology
University of Basel
St. Johanns-Vorstadt 10
4056 Basel, Switzerland
Tel.: ++41/61/267 08 47

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


================================================================
Robert A. LaBudde, PhD, PAS, Dpl. ACAFS  e-mail: r...@lcfltd.com
Least Cost Formulations, Ltd.            URL: http://lcfltd.com/
824 Timberlake Drive                     Tel: 757-467-0954
Virginia Beach, VA 23464-3239            Fax: 757-467-2947

"Vere scire est per causas scire"
================================================================

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] problem with glm(family=binomial) when some levels have only 0 proportion values

Reply via email to