Re: [R] logistic regression

Bill.Venables Tue, 24 Jun 2008 17:19:14 -0700

It looks like A*B*C*D is a complete, totally saturated model, (the
residual deviance is effectively zero, and the residual degrees of
freedom is exactly zero - this is a clue).  So when you try to put even
more parameters into the model and even higher way interactions,
something has to give.


I find 3-factor interactions are about as much as I can think about
without getting a bit giddy.  Do you really need 4- and 5-factor
interactions?  If so, your only option is to get more data.


Bill Venables
CSIRO Laboratories
PO Box 120, Cleveland, 4163
AUSTRALIA
Office Phone (email preferred): +61 7 3826 7251
Fax (if absolutely necessary):  +61 7 3826 7304
Mobile:                         +61 4 8819 4402
Home Phone:                     +61 7 3286 7700
mailto:[EMAIL PROTECTED]
http://www.cmis.csiro.au/bill.venables/ 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of Mikhail Spivakov
Sent: Wednesday, 25 June 2008 9:31 AM
To: r-help@r-project.org
Subject: [R] logistic regression


Hi everyone,

I'm sorry if this turns out to be more a statistical question than one
specifically about R - but would greatly appreciate your advice anyway.

I've been using a logistic regression model to look at the relationship
between a binary outcome (say, the odds of picking n white balls from a
bag
containing m balls in total) and a variety of other binary parameters:

_________________________________________________________________

> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))
> summary(a.fit)

glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D family =
binomial(link = "logit"), data = a)

Deviance Residuals: 
 [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0

Coefficients:
        Estimate        Std.    Error   z value Pr(>|z|)
(Intercept)     -0.69751        0.02697 -25.861 <2.00E-16       ***
A       -0.02911        0.05451 -0.534  0.593335        
B       0.39842 0.06871 5.798   6.70E-09        ***
C       0.829   0.06745 12.29   <2.00E-16       ***
D       0.05928 0.11133 0.532   0.594401        
A:B     -0.44053        0.13807 -3.191  0.001419        **
A:C     -0.49596        0.13664 -3.63   0.000284        ***
B:C     -0.62194        0.14164 -4.391  1.13E-05        ***
A:D     -0.4031 0.2279  -1.769  0.076938        .
B:D     -0.60238        0.25978 -2.319  0.020407        *
C:D     -0.58467        0.27195 -2.15   0.031558        *
A:B:C   0.5006  0.27364 1.829   0.067335        .
A:B:D   0.51868 0.4683  1.108   0.268049        
A:C:D   0.32882 0.51226 0.642   0.520943        
B:C:D   0.56301 0.49903 1.128   0.259231        
A:B:C:D -0.32115        0.87969 -0.365  0.715059        

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2.2185e+02  on 15  degrees of freedom
Residual deviance: 1.0385e-12  on  0  degrees of freedom
AIC: 124.50

Number of Fisher Scoring iterations: 3

_________________________________________________________________

This seems to produce sensible results given the actual data.
However, there are actually three types of balls in the experiment and I
need to model the relationship between the odds of picking each of the
type
and the parameters A,B,C,D. So what I do now is split the initial data
table
and just run glm three times:

>all

[fictional data]

TYPE WHITE ALL A B C D 
a       100     400     1       0       0       0
b       200     600     1       0       0       0
c       10      300     1       0       0       0
....
a       30      100     1       1       1       1
b       50      200     1       1       1       1
c       20      120     1       1       1       1

> a<-all[all$type=="a",]
> b<-all[all$type=="b",]
> c<-all[all$type=="c",]
> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))
> b.fit <- glm (data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))
> c.fit <- glm (data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
> family=binomial(link="logit"))

But it seems to me that I should be able to incorporate TYPE into the
model. 

Something like:

>summary(glm(data=example2,family=binomial(link="logit"),formula=cbind(W
HITE,ALL-WHITE)~TYPE*A*B*C*D))

[please see the output below]

However, when I do this, it does not seem to give an expected result.
Is this not the right way to do it? 
Or this is actually less powerful than running the three models
separately?  

Will greatly appreciate your advice!

Many thanks
Mikhail

-----

        Estimate        Std.    Error   z value Pr(>|z|)
(Intercept)     -8.90E-01       1.91E-02        -46.553 <2.00E-16
***
TYPE1   1.93E-01        2.47E-02        7.822   5.18E-15        ***
TYPE2   1.19E+00        2.42E-02        49.108  <2.00E-16       ***
A       1.89E-01        3.34E-02        5.665   1.47E-08        ***
B       1.60E-01        4.41E-02        3.627   0.000286        ***
C       2.24E-02        4.91E-02        0.455   0.64906 
D       1.96E-01        6.58E-02        2.982   0.002868        **
TYPE1:A -2.19E-01       4.59E-02        -4.759  1.95E-06        ***
TYPE2:A -9.08E-01       4.50E-02        -20.178 <2.00E-16       ***
TYPE1:C 2.39E-01        5.93E-02        4.022   5.77E-05        ***
TYPE2:B -1.82E+00       6.46E-02        -28.178 <2.00E-16       ***
A:B     -2.26E-01       8.52E-02        -2.649  0.008066        **
TYPE1:C 8.07E-01        6.27E-02        12.87   <2.00E-16       ***
TYPE2:C -2.51E+00       7.83E-02        -32.039 <2.00E-16       ***
A:C     -1.70E-01       9.51E-02        -1.783  0.074512        .
B:C     -3.01E-01       1.12E-01        -2.698  0.006985        **
TYPE1:D -1.37E-01       9.20E-02        -1.489  0.136548        
TYPE2:D -1.13E+00       9.19E-02        -12.329 <2.00E-16       ***
A:D     -2.11E-01       1.27E-01        -1.655  0.097953        .
B:D     -2.15E-01       1.55E-01        -1.387  0.165472        
C:D     -5.51E-01       2.76E-01        -1.997  0.045829        *
TYPE1:A:B       -2.15E-01       1.17E-01        -1.84   0.065714
.


TYPE2:A:B       7.21E-01        1.28E-01        5.635   1.75E-08
***
TYPE1:A:C       -3.26E-01       1.24E-01        -2.643  0.008221
**
TYPE2:A:C       9.70E-01        1.53E-01        6.36    2.02E-10
***
TYPE1:B:C       -3.21E-01       1.38E-01        -2.321  0.020313
*
TYPE2:B:C       1.35E+00        1.89E-01        7.133   9.85E-13
***
A:B:C   1.80E-01        2.11E-01        0.852   0.394425        
TYPE1:A:D       -1.92E-01       1.83E-01        -1.05   0.293758        
TYPE2:A:D       6.76E-01        1.80E-01        3.75    0.000177
***
TYPE1:B:D       -3.87E-01       2.16E-01        -1.796  0.072443
.
TYPE2:B:D       1.09E+00        2.30E-01        4.709   2.49E-06
***
A:B:D   1.92E-01        2.73E-01        0.702   0.482512        
TYPE1:C:D       -3.33E-02       3.18E-01        -0.105  0.916465        
TYPE2:C:D       1.20E-01        5.05E-01        0.238   0.811914        
A:C:D   -7.37E+00       1.74E+04        -4.23E-04       0.999663        
B:C:D   3.14E-01        4.92E-01        0.638   0.523254        
TYPE1:A:B:C     3.21E-01        2.64E-01        1.218   0.223336        
TYPE2:A:B:C     -8.43E-01       3.59E-01        -2.351  0.018747
*
TYPE1:A:B:D     3.27E-01        3.84E-01        0.85    0.3952  
TYPE2:A:B:D     -6.59E-01       4.08E-01        -1.617  0.105883        
TYPE1:A:C:D     7.69E+00        1.74E+04        4.42E-04        0.999648

TYPE2:A:C:D     -1.60E+01       3.48E+04        -4.58E-04       0.999634

TYPE1:B:C:D     2.49E-01        5.70E-01        0.437   0.662288
TYPE2:B:C:D     -7.08E-01       8.97E-01        -0.789  0.430007
A:B:C:D 9.08E-03        2.47E+04        3.67E-07        1
TYPE1:A:B:C:D   -3.30E-01       2.47E+04        -1.34E-05       0.999989
TYPE2:A:B:C:D   1.10E+00        4.94E+04        2.22E-05        0.999982
-- 
View this message in context:
http://www.nabble.com/logistic-regression-tp18102137p18102137.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] logistic regression

Reply via email to