It looks like A*B*C*D is a complete, totally saturated model, (the residual deviance is effectively zero, and the residual degrees of freedom is exactly zero - this is a clue). So when you try to put even more parameters into the model and even higher way interactions, something has to give.
I find 3-factor interactions are about as much as I can think about without getting a bit giddy. Do you really need 4- and 5-factor interactions? If so, your only option is to get more data. Bill Venables CSIRO Laboratories PO Box 120, Cleveland, 4163 AUSTRALIA Office Phone (email preferred): +61 7 3826 7251 Fax (if absolutely necessary): +61 7 3826 7304 Mobile: +61 4 8819 4402 Home Phone: +61 7 3286 7700 mailto:[EMAIL PROTECTED] http://www.cmis.csiro.au/bill.venables/ -----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mikhail Spivakov Sent: Wednesday, 25 June 2008 9:31 AM To: r-help@r-project.org Subject: [R] logistic regression Hi everyone, I'm sorry if this turns out to be more a statistical question than one specifically about R - but would greatly appreciate your advice anyway. I've been using a logistic regression model to look at the relationship between a binary outcome (say, the odds of picking n white balls from a bag containing m balls in total) and a variety of other binary parameters: _________________________________________________________________ > a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D, > family=binomial(link="logit")) > summary(a.fit) glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D family = binomial(link = "logit"), data = a) Deviance Residuals: [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) -0.69751 0.02697 -25.861 <2.00E-16 *** A -0.02911 0.05451 -0.534 0.593335 B 0.39842 0.06871 5.798 6.70E-09 *** C 0.829 0.06745 12.29 <2.00E-16 *** D 0.05928 0.11133 0.532 0.594401 A:B -0.44053 0.13807 -3.191 0.001419 ** A:C -0.49596 0.13664 -3.63 0.000284 *** B:C -0.62194 0.14164 -4.391 1.13E-05 *** A:D -0.4031 0.2279 -1.769 0.076938 . B:D -0.60238 0.25978 -2.319 0.020407 * C:D -0.58467 0.27195 -2.15 0.031558 * A:B:C 0.5006 0.27364 1.829 0.067335 . A:B:D 0.51868 0.4683 1.108 0.268049 A:C:D 0.32882 0.51226 0.642 0.520943 B:C:D 0.56301 0.49903 1.128 0.259231 A:B:C:D -0.32115 0.87969 -0.365 0.715059 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 (Dispersion parameter for binomial family taken to be 1) Null deviance: 2.2185e+02 on 15 degrees of freedom Residual deviance: 1.0385e-12 on 0 degrees of freedom AIC: 124.50 Number of Fisher Scoring iterations: 3 _________________________________________________________________ This seems to produce sensible results given the actual data. However, there are actually three types of balls in the experiment and I need to model the relationship between the odds of picking each of the type and the parameters A,B,C,D. So what I do now is split the initial data table and just run glm three times: >all [fictional data] TYPE WHITE ALL A B C D a 100 400 1 0 0 0 b 200 600 1 0 0 0 c 10 300 1 0 0 0 .... a 30 100 1 1 1 1 b 50 200 1 1 1 1 c 20 120 1 1 1 1 > a<-all[all$type=="a",] > b<-all[all$type=="b",] > c<-all[all$type=="c",] > a.fit <- glm (data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D, > family=binomial(link="logit")) > b.fit <- glm (data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D, > family=binomial(link="logit")) > c.fit <- glm (data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D, > family=binomial(link="logit")) But it seems to me that I should be able to incorporate TYPE into the model. Something like: >summary(glm(data=example2,family=binomial(link="logit"),formula=cbind(W HITE,ALL-WHITE)~TYPE*A*B*C*D)) [please see the output below] However, when I do this, it does not seem to give an expected result. Is this not the right way to do it? Or this is actually less powerful than running the three models separately? Will greatly appreciate your advice! Many thanks Mikhail ----- Estimate Std. Error z value Pr(>|z|) (Intercept) -8.90E-01 1.91E-02 -46.553 <2.00E-16 *** TYPE1 1.93E-01 2.47E-02 7.822 5.18E-15 *** TYPE2 1.19E+00 2.42E-02 49.108 <2.00E-16 *** A 1.89E-01 3.34E-02 5.665 1.47E-08 *** B 1.60E-01 4.41E-02 3.627 0.000286 *** C 2.24E-02 4.91E-02 0.455 0.64906 D 1.96E-01 6.58E-02 2.982 0.002868 ** TYPE1:A -2.19E-01 4.59E-02 -4.759 1.95E-06 *** TYPE2:A -9.08E-01 4.50E-02 -20.178 <2.00E-16 *** TYPE1:C 2.39E-01 5.93E-02 4.022 5.77E-05 *** TYPE2:B -1.82E+00 6.46E-02 -28.178 <2.00E-16 *** A:B -2.26E-01 8.52E-02 -2.649 0.008066 ** TYPE1:C 8.07E-01 6.27E-02 12.87 <2.00E-16 *** TYPE2:C -2.51E+00 7.83E-02 -32.039 <2.00E-16 *** A:C -1.70E-01 9.51E-02 -1.783 0.074512 . B:C -3.01E-01 1.12E-01 -2.698 0.006985 ** TYPE1:D -1.37E-01 9.20E-02 -1.489 0.136548 TYPE2:D -1.13E+00 9.19E-02 -12.329 <2.00E-16 *** A:D -2.11E-01 1.27E-01 -1.655 0.097953 . B:D -2.15E-01 1.55E-01 -1.387 0.165472 C:D -5.51E-01 2.76E-01 -1.997 0.045829 * TYPE1:A:B -2.15E-01 1.17E-01 -1.84 0.065714 . TYPE2:A:B 7.21E-01 1.28E-01 5.635 1.75E-08 *** TYPE1:A:C -3.26E-01 1.24E-01 -2.643 0.008221 ** TYPE2:A:C 9.70E-01 1.53E-01 6.36 2.02E-10 *** TYPE1:B:C -3.21E-01 1.38E-01 -2.321 0.020313 * TYPE2:B:C 1.35E+00 1.89E-01 7.133 9.85E-13 *** A:B:C 1.80E-01 2.11E-01 0.852 0.394425 TYPE1:A:D -1.92E-01 1.83E-01 -1.05 0.293758 TYPE2:A:D 6.76E-01 1.80E-01 3.75 0.000177 *** TYPE1:B:D -3.87E-01 2.16E-01 -1.796 0.072443 . TYPE2:B:D 1.09E+00 2.30E-01 4.709 2.49E-06 *** A:B:D 1.92E-01 2.73E-01 0.702 0.482512 TYPE1:C:D -3.33E-02 3.18E-01 -0.105 0.916465 TYPE2:C:D 1.20E-01 5.05E-01 0.238 0.811914 A:C:D -7.37E+00 1.74E+04 -4.23E-04 0.999663 B:C:D 3.14E-01 4.92E-01 0.638 0.523254 TYPE1:A:B:C 3.21E-01 2.64E-01 1.218 0.223336 TYPE2:A:B:C -8.43E-01 3.59E-01 -2.351 0.018747 * TYPE1:A:B:D 3.27E-01 3.84E-01 0.85 0.3952 TYPE2:A:B:D -6.59E-01 4.08E-01 -1.617 0.105883 TYPE1:A:C:D 7.69E+00 1.74E+04 4.42E-04 0.999648 TYPE2:A:C:D -1.60E+01 3.48E+04 -4.58E-04 0.999634 TYPE1:B:C:D 2.49E-01 5.70E-01 0.437 0.662288 TYPE2:B:C:D -7.08E-01 8.97E-01 -0.789 0.430007 A:B:C:D 9.08E-03 2.47E+04 3.67E-07 1 TYPE1:A:B:C:D -3.30E-01 2.47E+04 -1.34E-05 0.999989 TYPE2:A:B:C:D 1.10E+00 4.94E+04 2.22E-05 0.999982 -- View this message in context: http://www.nabble.com/logistic-regression-tp18102137p18102137.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.