Re: [R] logistic regression

Mikhail Spivakov Wed, 25 Jun 2008 03:11:13 -0700

Dear Bill,

Thanks for your reply!


I agree that the residual deviance suggests that the model odds = f(A,B,C,D)
is totally saturated given the chunk of data TYPE=="a". 
But since I've actually got three chunks of data (TYPE=={"a","b","c"}) that
together form the whole, my model really looks like this:

                                       f1(A,B,C,D) if TYPE=="a",
odds (SUCCESS|TYPE) =   f2(A,B,C,D) if TYPE=="b",
                                       f3(A,B,C,D) if TYPE=="c",                
                      
odds(SUCCESS|TYPE=="a")+odds(SUCCESS|TYPE=="b")+odds(SUCCESS|TYPE=="c") = 1.
 
I reasoned that a better way of dealing with this would be to model odds =
f(TYPE, A,B,C,D), but I'm not sure if I was right. What do you think? 

Many thanks!
Mikhail


Bill.Venables wrote:
> 
> It looks like A*B*C*D is a complete, totally saturated model, (the
> residual deviance is effectively zero, and the residual degrees of
> freedom is exactly zero - this is a clue).  So when you try to put even
> more parameters into the model and even higher way interactions,
> something has to give.   
> 
> I find 3-factor interactions are about as much as I can think about
> without getting a bit giddy.  Do you really need 4- and 5-factor
> interactions?  If so, your only option is to get more data.
> 
> 
> Bill Venables
> CSIRO Laboratories
> PO Box 120, Cleveland, 4163
> AUSTRALIA
> Office Phone (email preferred): +61 7 3826 7251
> Fax (if absolutely necessary):  +61 7 3826 7304
> Mobile:                         +61 4 8819 4402
> Home Phone:                     +61 7 3286 7700
> mailto:[EMAIL PROTECTED]
> http://www.cmis.csiro.au/bill.venables/ 
> 
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> On Behalf Of Mikhail Spivakov
> Sent: Wednesday, 25 June 2008 9:31 AM
> To: r-help@r-project.org
> Subject: [R] logistic regression
> 
> 
> Hi everyone,
> 
> I'm sorry if this turns out to be more a statistical question than one
> specifically about R - but would greatly appreciate your advice anyway.
> 
> I've been using a logistic regression model to look at the relationship
> between a binary outcome (say, the odds of picking n white balls from a
> bag
> containing m balls in total) and a variety of other binary parameters:
> 
> _________________________________________________________________
> 
>> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> summary(a.fit)
> 
> glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D family =
> binomial(link = "logit"), data = a)
> 
> Deviance Residuals: 
>  [1]  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
> 
> Coefficients:
>       Estimate        Std.    Error   z value Pr(>|z|)
> (Intercept)   -0.69751        0.02697 -25.861 <2.00E-16       ***
> A     -0.02911        0.05451 -0.534  0.593335        
> B     0.39842 0.06871 5.798   6.70E-09        ***
> C     0.829   0.06745 12.29   <2.00E-16       ***
> D     0.05928 0.11133 0.532   0.594401        
> A:B   -0.44053        0.13807 -3.191  0.001419        **
> A:C   -0.49596        0.13664 -3.63   0.000284        ***
> B:C   -0.62194        0.14164 -4.391  1.13E-05        ***
> A:D   -0.4031 0.2279  -1.769  0.076938        .
> B:D   -0.60238        0.25978 -2.319  0.020407        *
> C:D   -0.58467        0.27195 -2.15   0.031558        *
> A:B:C 0.5006  0.27364 1.829   0.067335        .
> A:B:D 0.51868 0.4683  1.108   0.268049        
> A:C:D 0.32882 0.51226 0.642   0.520943        
> B:C:D 0.56301 0.49903 1.128   0.259231        
> A:B:C:D       -0.32115        0.87969 -0.365  0.715059        
> 
> ---
> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
> 
> (Dispersion parameter for binomial family taken to be 1)
> 
>     Null deviance: 2.2185e+02  on 15  degrees of freedom
> Residual deviance: 1.0385e-12  on  0  degrees of freedom
> AIC: 124.50
> 
> Number of Fisher Scoring iterations: 3
> 
> _________________________________________________________________
> 
> This seems to produce sensible results given the actual data.
> However, there are actually three types of balls in the experiment and I
> need to model the relationship between the odds of picking each of the
> type
> and the parameters A,B,C,D. So what I do now is split the initial data
> table
> and just run glm three times:
> 
>>all
> 
> [fictional data]
> 
> TYPE WHITE ALL A B C D 
> a     100     400     1       0       0       0
> b     200     600     1       0       0       0
> c     10      300     1       0       0       0
> ....
> a     30      100     1       1       1       1
> b     50      200     1       1       1       1
> c     20      120     1       1       1       1
> 
>> a<-all[all$type=="a",]
>> b<-all[all$type=="b",]
>> c<-all[all$type=="c",]
>> a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> b.fit <- glm (data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
>> c.fit <- glm (data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>> family=binomial(link="logit"))
> 
> But it seems to me that I should be able to incorporate TYPE into the
> model. 
> 
> Something like:
> 
>>summary(glm(data=example2,family=binomial(link="logit"),formula=cbind(W
> HITE,ALL-WHITE)~TYPE*A*B*C*D))
> 
> [please see the output below]
> 
> However, when I do this, it does not seem to give an expected result.
> Is this not the right way to do it? 
> Or this is actually less powerful than running the three models
> separately?  
> 
> Will greatly appreciate your advice!
> 
> Many thanks
> Mikhail
> 
> -----
> 
>       Estimate        Std.    Error   z value Pr(>|z|)
> (Intercept)   -8.90E-01       1.91E-02        -46.553 <2.00E-16
> ***
> TYPE1 1.93E-01        2.47E-02        7.822   5.18E-15        ***
> TYPE2 1.19E+00        2.42E-02        49.108  <2.00E-16       ***
> A     1.89E-01        3.34E-02        5.665   1.47E-08        ***
> B     1.60E-01        4.41E-02        3.627   0.000286        ***
> C     2.24E-02        4.91E-02        0.455   0.64906 
> D     1.96E-01        6.58E-02        2.982   0.002868        **
> TYPE1:A       -2.19E-01       4.59E-02        -4.759  1.95E-06        ***
> TYPE2:A       -9.08E-01       4.50E-02        -20.178 <2.00E-16       ***
> TYPE1:C       2.39E-01        5.93E-02        4.022   5.77E-05        ***
> TYPE2:B       -1.82E+00       6.46E-02        -28.178 <2.00E-16       ***
> A:B   -2.26E-01       8.52E-02        -2.649  0.008066        **
> TYPE1:C       8.07E-01        6.27E-02        12.87   <2.00E-16       ***
> TYPE2:C       -2.51E+00       7.83E-02        -32.039 <2.00E-16       ***
> A:C   -1.70E-01       9.51E-02        -1.783  0.074512        .
> B:C   -3.01E-01       1.12E-01        -2.698  0.006985        **
> TYPE1:D       -1.37E-01       9.20E-02        -1.489  0.136548        
> TYPE2:D       -1.13E+00       9.19E-02        -12.329 <2.00E-16       ***
> A:D   -2.11E-01       1.27E-01        -1.655  0.097953        .
> B:D   -2.15E-01       1.55E-01        -1.387  0.165472        
> C:D   -5.51E-01       2.76E-01        -1.997  0.045829        *
> TYPE1:A:B     -2.15E-01       1.17E-01        -1.84   0.065714
> .
> 
> 
> TYPE2:A:B     7.21E-01        1.28E-01        5.635   1.75E-08
> ***
> TYPE1:A:C     -3.26E-01       1.24E-01        -2.643  0.008221
> **
> TYPE2:A:C     9.70E-01        1.53E-01        6.36    2.02E-10
> ***
> TYPE1:B:C     -3.21E-01       1.38E-01        -2.321  0.020313
> *
> TYPE2:B:C     1.35E+00        1.89E-01        7.133   9.85E-13
> ***
> A:B:C 1.80E-01        2.11E-01        0.852   0.394425        
> TYPE1:A:D     -1.92E-01       1.83E-01        -1.05   0.293758        
> TYPE2:A:D     6.76E-01        1.80E-01        3.75    0.000177
> ***
> TYPE1:B:D     -3.87E-01       2.16E-01        -1.796  0.072443
> .
> TYPE2:B:D     1.09E+00        2.30E-01        4.709   2.49E-06
> ***
> A:B:D 1.92E-01        2.73E-01        0.702   0.482512        
> TYPE1:C:D     -3.33E-02       3.18E-01        -0.105  0.916465        
> TYPE2:C:D     1.20E-01        5.05E-01        0.238   0.811914        
> A:C:D -7.37E+00       1.74E+04        -4.23E-04       0.999663        
> B:C:D 3.14E-01        4.92E-01        0.638   0.523254        
> TYPE1:A:B:C   3.21E-01        2.64E-01        1.218   0.223336        
> TYPE2:A:B:C   -8.43E-01       3.59E-01        -2.351  0.018747
> *
> TYPE1:A:B:D   3.27E-01        3.84E-01        0.85    0.3952  
> TYPE2:A:B:D   -6.59E-01       4.08E-01        -1.617  0.105883        
> TYPE1:A:C:D   7.69E+00        1.74E+04        4.42E-04        0.999648
> 
> TYPE2:A:C:D   -1.60E+01       3.48E+04        -4.58E-04       0.999634
> 
> TYPE1:B:C:D   2.49E-01        5.70E-01        0.437   0.662288
> TYPE2:B:C:D   -7.08E-01       8.97E-01        -0.789  0.430007
> A:B:C:D       9.08E-03        2.47E+04        3.67E-07        1
> TYPE1:A:B:C:D -3.30E-01       2.47E+04        -1.34E-05       0.999989
> TYPE2:A:B:C:D 1.10E+00        4.94E+04        2.22E-05        0.999982
> -- 
> View this message in context:
> http://www.nabble.com/logistic-regression-tp18102137p18102137.html
> Sent from the R help mailing list archive at Nabble.com.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 
> 

-- 
View this message in context: 
http://www.nabble.com/logistic-regression-tp18102137p18108808.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] logistic regression

Reply via email to