Oh, I'm sorry... here's my affiliation:

Mikhail Spivakov, PhD                   [EMAIL PROTECTED]
Interdisciplinary Postdoctoral Fellow   http://www.ebi.ac.uk/~spivakov/
European Bioinformatics Institute       Tel: +44 1223 492660 (office)
Wellcome Trust Genome Campus                 +44 7985 09 6675 (mob)
Cambridge CB10 1SD, UK

On Thu, Jun 26, 2008 at 7:50 AM, Prof Brian Ripley <[EMAIL PROTECTED]> wrote:

> On Wed, 25 Jun 2008, Mikhail Spivakov wrote:
>
>> Sorry for flooding this forum, but I think I've realised that I need to do
>> multinomial logistic regression for my problem...
>> Would be interested in your opinion as to whether this is actually any
>> better than running three binomial logistic regressions separately..
>
> Sometimes, depending on the problem. We haven't seen a real-world problem
> in this thread, and (see the posting guide) this is not a statistical
> consultancy forum, so please ask your statistical advisor for help.
>
> You won't get consultancy help here unless you give your affiliation in a
> proper signature block (as the posting guide says).
>
>> Thanks
>> M
>>
>> On Wed, Jun 25, 2008 at 1:17 AM, <[EMAIL PROTECTED]> wrote:
>>
>>> It looks like A*B*C*D is a complete, totally saturated model (the
>>> residual deviance is effectively zero, and the residual degrees of
>>> freedom is exactly zero - this is a clue). So when you try to put even
>>> more parameters into the model and even higher-way interactions,
>>> something has to give.
>>>
>>> I find 3-factor interactions are about as much as I can think about
>>> without getting a bit giddy. Do you really need 4- and 5-factor
>>> interactions? If so, your only option is to get more data.
>>>
>>> Bill Venables
>>> CSIRO Laboratories
>>> PO Box 120, Cleveland, 4163
>>> AUSTRALIA
>>> Office Phone (email preferred): +61 7 3826 7251
>>> Fax (if absolutely necessary): +61 7 3826 7304
>>> Mobile: +61 4 8819 4402
>>> Home Phone: +61 7 3286 7700
>>> mailto:[EMAIL PROTECTED]
>>> http://www.cmis.csiro.au/bill.venables/
>>>
>>> -----Original Message-----
>>> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
>>> On Behalf Of Mikhail Spivakov
>>> Sent: Wednesday, 25 June 2008 9:31 AM
>>> To: r-help@r-project.org
>>> Subject: [R] logistic regression
>>>
>>> Hi everyone,
>>>
>>> I'm sorry if this turns out to be more a statistical question than one
>>> specifically about R - but would greatly appreciate your advice anyway.
>>>
>>> I've been using a logistic regression model to look at the relationship
>>> between a binary outcome (say, the odds of picking n white balls from a
>>> bag containing m balls in total) and a variety of other binary
>>> parameters:
>>>
>>> _________________________________________________________________
>>>
>>> > a.fit <- glm(data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>       family=binomial(link="logit"))
>>> > summary(a.fit)
>>>
>>> glm(formula = cbind(SUCCESS, ALL - SUCCESS) ~ A * B * C * D, family =
>>> binomial(link = "logit"), data = a)
>>>
>>> Deviance Residuals:
>>> [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>>
>>> Coefficients:
>>>              Estimate Std. Error  z value   Pr(>|z|)
>>> (Intercept)  -0.69751    0.02697  -25.861  <2.00E-16 ***
>>> A            -0.02911    0.05451   -0.534   0.593335
>>> B             0.39842    0.06871    5.798   6.70E-09 ***
>>> C               0.829    0.06745    12.29  <2.00E-16 ***
>>> D             0.05928    0.11133    0.532   0.594401
>>> A:B          -0.44053    0.13807   -3.191   0.001419 **
>>> A:C          -0.49596    0.13664    -3.63   0.000284 ***
>>> B:C          -0.62194    0.14164   -4.391   1.13E-05 ***
>>> A:D           -0.4031     0.2279   -1.769   0.076938 .
>>> B:D          -0.60238    0.25978   -2.319   0.020407 *
>>> C:D          -0.58467    0.27195    -2.15   0.031558 *
>>> A:B:C          0.5006    0.27364    1.829   0.067335 .
>>> A:B:D         0.51868     0.4683    1.108   0.268049
>>> A:C:D         0.32882    0.51226    0.642   0.520943
>>> B:C:D         0.56301    0.49903    1.128   0.259231
>>> A:B:C:D      -0.32115    0.87969   -0.365   0.715059
>>> ---
>>> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>>>
>>> (Dispersion parameter for binomial family taken to be 1)
>>>
>>> Null deviance: 2.2185e+02 on 15 degrees of freedom
>>> Residual deviance: 1.0385e-12 on 0 degrees of freedom
>>> AIC: 124.50
>>>
>>> Number of Fisher Scoring iterations: 3
>>>
>>> _________________________________________________________________
>>>
>>> This seems to produce sensible results given the actual data.
>>> However, there are actually three types of balls in the experiment and I
>>> need to model the relationship between the odds of picking each type
>>> and the parameters A, B, C, D. So what I do now is split the initial
>>> data table and just run glm three times:
>>>
>>> > all
>>>
>>> [fictional data]
>>>
>>> TYPE WHITE ALL A B C D
>>> a      100 400 1 0 0 0
>>> b      200 600 1 0 0 0
>>> c       10 300 1 0 0 0
>>> ....
>>> a       30 100 1 1 1 1
>>> b       50 200 1 1 1 1
>>> c       20 120 1 1 1 1
>>>
>>> > a <- all[all$TYPE=="a",]
>>> > b <- all[all$TYPE=="b",]
>>> > c <- all[all$TYPE=="c",]
>>> > a.fit <- glm(data=a, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>       family=binomial(link="logit"))
>>> > b.fit <- glm(data=b, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>       family=binomial(link="logit"))
>>> > c.fit <- glm(data=c, formula=cbind(WHITE,ALL-WHITE)~A*B*C*D,
>>>       family=binomial(link="logit"))
>>>
>>> But it seems to me that I should be able to incorporate TYPE into the
>>> model.
>>>
>>> Something like:
>>>
>>> > summary(glm(data=example2, family=binomial(link="logit"),
>>>       formula=cbind(WHITE,ALL-WHITE)~TYPE*A*B*C*D))
>>>
>>> [please see the output below]
>>>
>>> However, when I do this, it does not seem to give the expected result.
>>> Is this not the right way to do it?
>>> Or is this actually less powerful than running the three models
>>> separately?
>>>
>>> Will greatly appreciate your advice!
>>>
>>> Many thanks
>>> Mikhail
>>>
>>> -----
>>>
>>>                 Estimate Std. Error    z value   Pr(>|z|)
>>> (Intercept)    -8.90E-01   1.91E-02    -46.553  <2.00E-16 ***
>>> TYPE1           1.93E-01   2.47E-02      7.822   5.18E-15 ***
>>> TYPE2           1.19E+00   2.42E-02     49.108  <2.00E-16 ***
>>> A               1.89E-01   3.34E-02      5.665   1.47E-08 ***
>>> B               1.60E-01   4.41E-02      3.627   0.000286 ***
>>> C               2.24E-02   4.91E-02      0.455    0.64906
>>> D               1.96E-01   6.58E-02      2.982   0.002868 **
>>> TYPE1:A        -2.19E-01   4.59E-02     -4.759   1.95E-06 ***
>>> TYPE2:A        -9.08E-01   4.50E-02    -20.178  <2.00E-16 ***
>>> TYPE1:B         2.39E-01   5.93E-02      4.022   5.77E-05 ***
>>> TYPE2:B        -1.82E+00   6.46E-02    -28.178  <2.00E-16 ***
>>> A:B            -2.26E-01   8.52E-02     -2.649   0.008066 **
>>> TYPE1:C         8.07E-01   6.27E-02      12.87  <2.00E-16 ***
>>> TYPE2:C        -2.51E+00   7.83E-02    -32.039  <2.00E-16 ***
>>> A:C            -1.70E-01   9.51E-02     -1.783   0.074512 .
>>> B:C            -3.01E-01   1.12E-01     -2.698   0.006985 **
>>> TYPE1:D        -1.37E-01   9.20E-02     -1.489   0.136548
>>> TYPE2:D        -1.13E+00   9.19E-02    -12.329  <2.00E-16 ***
>>> A:D            -2.11E-01   1.27E-01     -1.655   0.097953 .
>>> B:D            -2.15E-01   1.55E-01     -1.387   0.165472
>>> C:D            -5.51E-01   2.76E-01     -1.997   0.045829 *
>>> TYPE1:A:B      -2.15E-01   1.17E-01      -1.84   0.065714 .
>>> TYPE2:A:B       7.21E-01   1.28E-01      5.635   1.75E-08 ***
>>> TYPE1:A:C      -3.26E-01   1.24E-01     -2.643   0.008221 **
>>> TYPE2:A:C       9.70E-01   1.53E-01       6.36   2.02E-10 ***
>>> TYPE1:B:C      -3.21E-01   1.38E-01     -2.321   0.020313 *
>>> TYPE2:B:C       1.35E+00   1.89E-01      7.133   9.85E-13 ***
>>> A:B:C           1.80E-01   2.11E-01      0.852   0.394425
>>> TYPE1:A:D      -1.92E-01   1.83E-01      -1.05   0.293758
>>> TYPE2:A:D       6.76E-01   1.80E-01       3.75   0.000177 ***
>>> TYPE1:B:D      -3.87E-01   2.16E-01     -1.796   0.072443 .
>>> TYPE2:B:D       1.09E+00   2.30E-01      4.709   2.49E-06 ***
>>> A:B:D           1.92E-01   2.73E-01      0.702   0.482512
>>> TYPE1:C:D      -3.33E-02   3.18E-01     -0.105   0.916465
>>> TYPE2:C:D       1.20E-01   5.05E-01      0.238   0.811914
>>> A:C:D          -7.37E+00   1.74E+04  -4.23E-04   0.999663
>>> B:C:D           3.14E-01   4.92E-01      0.638   0.523254
>>> TYPE1:A:B:C     3.21E-01   2.64E-01      1.218   0.223336
>>> TYPE2:A:B:C    -8.43E-01   3.59E-01     -2.351   0.018747 *
>>> TYPE1:A:B:D     3.27E-01   3.84E-01       0.85     0.3952
>>> TYPE2:A:B:D    -6.59E-01   4.08E-01     -1.617   0.105883
>>> TYPE1:A:C:D     7.69E+00   1.74E+04   4.42E-04   0.999648
>>> TYPE2:A:C:D    -1.60E+01   3.48E+04  -4.58E-04   0.999634
>>> TYPE1:B:C:D     2.49E-01   5.70E-01      0.437   0.662288
>>> TYPE2:B:C:D    -7.08E-01   8.97E-01     -0.789   0.430007
>>> A:B:C:D         9.08E-03   2.47E+04   3.67E-07          1
>>> TYPE1:A:B:C:D  -3.30E-01   2.47E+04  -1.34E-05   0.999989
>>> TYPE2:A:B:C:D   1.10E+00   4.94E+04   2.22E-05   0.999982
>>
>
> --
> Brian D. Ripley,                  [EMAIL PROTECTED]
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
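
A note on the saturation point raised in the quoted reply: four binary factors give 2^4 = 16 factor combinations, and A*B*C*D also has 16 coefficients, so a table with one row per combination leaves zero residual degrees of freedom. The sketch below checks that counting on simulated data. The data frame `d`, the counts, the success probability and the seed are all made up for illustration and are not the poster's data. The reduced formula `(A + B + C + D)^3` is only one possible reading of the suggestion to stop at 3-factor interactions.

```r
# Simulated, made-up counts: one row per combination of four binary factors.
set.seed(1)
d <- expand.grid(A = 0:1, B = 0:1, C = 0:1, D = 0:1)   # 2^4 = 16 rows
d$ALL   <- 200L
d$WHITE <- rbinom(nrow(d), size = d$ALL, prob = 0.35)

# Full factorial model: 16 coefficients fitted to 16 rows.
full <- glm(cbind(WHITE, ALL - WHITE) ~ A * B * C * D,
            family = binomial(link = "logit"), data = d)
length(coef(full))    # 16
df.residual(full)     # 0: nothing is left over to assess lack of fit

# Main effects plus 2- and 3-factor interactions only: 15 coefficients.
reduced <- glm(cbind(WHITE, ALL - WHITE) ~ (A + B + C + D)^3,
               family = binomial(link = "logit"), data = d)
df.residual(reduced)  # 1

# Likelihood-ratio test of the dropped 4-factor term.
anova(reduced, full, test = "Chisq")
```

The same counting applies once TYPE is added: TYPE*A*B*C*D has 3 x 16 = 48 coefficients, matching the 48 rows of the coefficient table quoted above, so a table with one row per TYPE and factor combination would again be saturated.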