Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
On Sun, 5 Sep 2010, st...@wittongilbert.free-online.co.uk wrote:

> David Winsemius wrote:
>
>>> 1. is glm the right thing to use before I waste my time
>>
>> Yes, but if your outcome variable is binomial then the family
>> argument should be "binomial". (And if you thought it should be
>> poisson, then why below did you use gaussian???)
>
> Used gaussian below because it was the example from the docs. That's
> not my data, it's example data which was not binomial.
>
>>> and 2. how do I interpret the result!
>>
>> Result? What result? I don't see any description of your data, nor
>> any code.
>
> I didn't provide MY DATA because I thought that would complicate
> things even further. So I was hoping for some advice on how to
> interpret the result of the example data so that I could then apply
> that to my data. I haven't even tried to run my data as I couldn't
> see what the output of the examples was trying to tell me.
>
> However, as you've snipped it because it was not relevant, that's
> useful to know. I often find this problem with the examples in the R
> docs: they suddenly take a dataset that I have no knowledge of, play
> with it and produce an 'answer'. The examples are presumably provided
> to enable me to work through how the code works etc.
>
> So what I was hoping for was someone to point to somewhere online
> that documents how to use the function for logistic regression and
> explains what all that table of data it spits out actually means.
> Someone has VERY KINDLY posted me something off list which I believe
> helps.
>
>> I think you need to consult a statistician or someone who has taken
>> the time to read that "statistical mumbo jumbo" you don't want to
>> learn. This mailing list is not set up to be a tutorial site.
>
> I have access to stats advice, but (a) I don't want to turn up to
> them with a pile of paper from R and have them say glm() may be the
> wrong analysis, (b) they don't do R so they can't tell me if I've
> used R wrongly, and (c) I completely expect they'd ask which of the
> values in the table matter, since no paper I've ever seen published
> showed a logistic regression with a table of numbers.

Clearly the time to consult a statistician is before you have done any
statistical analysis.

Frank Harrell

>> I have a couple of Kleinbaum's (et al) other texts and find them to
>> be well written and reasoned, so I suspect the citation above would
>> be as accessible as any.
>
> Thank you, that is useful. There is a real problem when buying R
> textbooks. None of the bookshops round here stock any, which means
> you can't tell if they are much good. I've looked at some and they
> seem to be rewrites of the help files.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
Calum-4 wrote:

> Hi
>
> I know asking which test to use is frowned upon on this list... so
> please do read on for at least a couple of sentences...
>
> I have some multivariate data split as follows:
>
> Tumour Site (one of 5 categories) #
> Chemo Schedule (one of 3 cats) ##
> Cycle (one of 3 cats*) ##
> Dose (one of 3 cats*) #
>
> *These are actually integers but for all our other analysis so far we
> have grouped them into logical bands of categories.
>
> The dependent variable is "Reaction" or "No Reaction".
>
> I have individually analysed each of the independent variables against
> Reaction/No Reaction using ChiSq and Fisher Tests. Those marked ##
> produced p values less than 0.05, and those marked # produced p values
> close to 0.05.
>
> We believe that Cycle is the crucial piece of data - the others just
> appear to be different because there are more early cycles in certain
> groups than others.
>
> SO - I believe what I need to do is a Linear Logistic Regression on the
> 4 independent variables. And I'm expecting it to show that the tumour
> site, schedule and dose don't matter, only the cycle matters. Done a lot
> of reading and I'm clueless!!
>
> I think I want to do something like:
>
> glm(reaction ~ site + sched + cycle + dose, data=mydata, family=poisson)

=====
Comment 1: If you stick to Linear Logistic Regression, the family should
be "binomial", assuming that reaction has only two values (Yes/No).
"family=poisson" should be used when the response is a frequency count
such as the number of tumors.
=====

> I am then expecting to see some very long output with lots of numbers...
> ...my question is TWO fold -
>
> 1. is glm the right thing to use before I waste my time
>
> and 2. how do I interpret the result! (I kind of expect a lecture here
> as I'm really looking for a nice snappy 'p<0.05 means this variable is
> the one having the influence' type answer and I suspect I'm going to be
> told that's not possible...!)

=====
Comment 2: The regression coefficients in binary logistic regression
models are log-odds ratios. The interpretation of odds ratios can be
tricky, but the p-values are interpreted in the usual way.
=====

> To be clear the example given in the docs is:
>
> library(MASS)
> data(anorexia)
> anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
>                 family = gaussian, data = anorexia)

=====
Comment 3: Here Postwt is a continuous variable. The specification
"family = gaussian" assumes that Postwt is normally distributed, so the
fitted model is the ordinary normal linear regression model.
=====

> The output of anorex.1 is:
>
> Call:  glm(formula = Postwt ~ Prewt + Treat + offset(Prewt),
>     family = gaussian, data = anorexia)
>
> Coefficients:
> (Intercept)        Prewt    TreatCont      TreatFT
>     49.7711      -0.5655      -4.0971       4.5631
>
> Degrees of Freedom: 71 Total (i.e. Null);  68 Residual
> Null Deviance:     4525
> Residual Deviance: 3311     AIC: 490
>
> and the output of summary(anorex.1) is:
>
> Call:
> glm(formula = Postwt ~ Prewt + Treat + offset(Prewt),
>     family = gaussian, data = anorexia)
>
> Deviance Residuals:
>      Min        1Q    Median        3Q       Max
> -14.1083   -4.2773   -0.5484    5.4838   15.2922
>
> Coefficients:
>             Estimate Std. Error t value Pr(>|t|)
> (Intercept)  49.7711    13.3910   3.717 0.000410 ***
> Prewt        -0.5655     0.1612  -3.509 0.000803 ***
> TreatCont    -4.0971     1.8935  -2.164 0.033999 *
> TreatFT       4.5631     2.1333   2.139 0.036035 *
> ---
> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
>
> (Dispersion parameter for gaussian family taken to be 48.69504)
>
>     Null deviance: 4525.4  on 71  degrees of freedom
> Residual deviance: 3311.3  on 68  degrees of freedom
> AIC: 489.97
>
> Number of Fisher Scoring iterations: 2
>
> ---
> Either way, can someone point me to a decent place that would explain
> what this means, or provide me some pointers? i.e. which of the
> variables has the influence on the outcome in the anorexia data?
>
> Please don't shout!! Happy to be pointed to a reference but would
> prefer one in common English, not some stats mumbo jumbo!
>
> Calum
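As a rough illustration of Comments 1 and 2, here is a minimal sketch of the binomial fit. The data frame is simulated purely for illustration: the column names follow the post, but the level labels, the sample size and the effect sizes are all assumptions, not Calum's data.

```r
# Hypothetical data: names follow the post, values are simulated.
set.seed(1)
n <- 300
mydata <- data.frame(
  site  = factor(sample(paste0("site", 1:5), n, replace = TRUE)),
  sched = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  cycle = factor(sample(c("early", "mid", "late"), n, replace = TRUE),
                 levels = c("early", "mid", "late")),
  dose  = factor(sample(c("low", "medium", "high"), n, replace = TRUE),
                 levels = c("low", "medium", "high"))
)

# Simulate a 0/1 reaction whose probability depends on cycle only
# (an assumption made to mirror the hypothesis in the post).
p <- plogis(-1 + 1.2 * (mydata$cycle == "early"))
mydata$reaction <- rbinom(n, size = 1, prob = p)

# Two-valued outcome, so family = binomial rather than poisson.
fit <- glm(reaction ~ site + sched + cycle + dose,
           data = mydata, family = binomial)

# Coefficients are log-odds ratios against each factor's reference
# level; exponentiating turns them into odds ratios.
odds_ratios <- exp(coef(fit))
round(odds_ratios, 2)
```

An odds ratio above 1 means higher odds of a reaction than at the reference level of that factor, and below 1 means lower odds. The p-values in summary(fit) are read in the usual way.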
Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
On Sep 5, 2010, at 6:06 AM, st...@wittongilbert.free-online.co.uk wrote:

> David Winsemius wrote:
>
>>> 1. is glm the right thing to use before I waste my time
>>
>> Yes, but if your outcome variable is binomial then the family
>> argument should be "binomial". (And if you thought it should be
>> poisson, then why below did you use gaussian???)
>
> Used gaussian below because it was the example from the docs. That's
> not my data, it's example data which was not binomial.
>
>>> and 2. how do I interpret the result!
>>
>> Result? What result? I don't see any description of your data, nor
>> any code.
>
> I didn't provide MY DATA because I thought that would complicate
> things even further. So I was hoping for some advice on how to
> interpret the result of the example data so that I could then apply
> that to my data. I haven't even tried to run my data as I couldn't
> see what the output of the examples was trying to tell me.

I didn't think that providing commentary on OLS regression results was
going to be that germane to setting up and running logistic regression.

Why haven't you tried a Google search for tutorials? When I did that I
found:

http://www.ats.ucla.edu/stat/r/dae/logit.htm

Surely there are others.

--
David Winsemius, MD
West Hartford, CT
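A related point for the original question: when a predictor is a factor with three or five levels, summary() prints one line per non-reference level, not one p-value per variable. A hedged sketch of getting a single likelihood-ratio test per term with drop1() follows. It uses the birthwt data from MASS as a stand-in (an assumption; it is not the chemotherapy data), with race playing the role of a three-level factor.

```r
library(MASS)   # birthwt: low = 1 if birth weight is under 2.5 kg

# race is coded 1/2/3 in birthwt, so wrap it in factor().
fit <- glm(low ~ age + factor(race) + smoke,
           data = birthwt, family = binomial)

# One likelihood-ratio test per term: each row compares the full
# model with the model that omits that single term.
term_tests <- drop1(fit, test = "Chisq")
term_tests
```

The row for factor(race) has 2 degrees of freedom and tests both race contrasts at once, which is closer to the "does this variable matter" question than the per-level lines in summary(fit).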
Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
David Winsemius wrote:

>> 1. is glm the right thing to use before I waste my time
>
> Yes, but if your outcome variable is binomial then the family
> argument should be "binomial". (And if you thought it should be
> poisson, then why below did you use gaussian???)

Used gaussian below because it was the example from the docs. That's
not my data, it's example data which was not binomial.

>> and 2. how do I interpret the result!
>
> Result? What result? I don't see any description of your data, nor
> any code.

I didn't provide MY DATA because I thought that would complicate things
even further. So I was hoping for some advice on how to interpret the
result of the example data so that I could then apply that to my data.
I haven't even tried to run my data as I couldn't see what the output
of the examples was trying to tell me.

However, as you've snipped it because it was not relevant, that's useful
to know. I often find this problem with the examples in the R docs: they
suddenly take a dataset that I have no knowledge of, play with it and
produce an 'answer'. The examples are presumably provided to enable me
to work through how the code works etc.

So what I was hoping for was someone to point to somewhere online that
documents how to use the function for logistic regression and explains
what all that table of data it spits out actually means. Someone has
VERY KINDLY posted me something off list which I believe helps.

> I think you need to consult a statistician or someone who has taken
> the time to read that "statistical mumbo jumbo" you don't want to
> learn. This mailing list is not set up to be a tutorial site.

I have access to stats advice, but (a) I don't want to turn up to them
with a pile of paper from R and have them say glm() may be the wrong
analysis, (b) they don't do R so they can't tell me if I've used R
wrongly, and (c) I completely expect they'd ask which of the values in
the table matter, since no paper I've ever seen published showed a
logistic regression with a table of numbers.

> I have a couple of Kleinbaum's (et al) other texts and find them to
> be well written and reasoned, so I suspect the citation above would
> be as accessible as any.

Thank you, that is useful. There is a real problem when buying R
textbooks. None of the bookshops round here stock any, which means you
can't tell if they are much good. I've looked at some and they seem to
be rewrites of the help files.
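On point (c) above: published logistic regressions commonly report odds ratios with confidence intervals rather than the raw coefficient table. A minimal sketch of building such a table, assuming Wald intervals from confint.default() are acceptable and using the infert data that ships with R as a stand-in for the real data:

```r
# infert is in the datasets package; case is 1 for cases, 0 for controls.
fit <- glm(case ~ spontaneous + induced, data = infert, family = binomial)

# Coefficients and Wald limits are on the log-odds scale;
# exponentiate to get odds ratios with 95% limits.
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
round(or_table, 2)
```

An interval that excludes 1 corresponds to a Wald p-value below 0.05 for that coefficient. The object name or_table is my own, and the choice of Wald rather than profile-likelihood intervals is a simplification.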
Re: [R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
On Sep 4, 2010, at 6:53 PM, st...@wittongilbert.free-online.co.uk wrote:

> Hi
>
> I know asking which test to use is frowned upon on this list... so
> please do read on for at least a couple of sentences...
>
> I have some multivariate data split as follows:
>
> Tumour Site (one of 5 categories) #
> Chemo Schedule (one of 3 cats) ##
> Cycle (one of 3 cats*) ##
> Dose (one of 3 cats*) #
>
> *These are actually integers but for all our other analysis so far we
> have grouped them into logical bands of categories.
>
> The dependent variable is "Reaction" or "No Reaction".
>
> I have individually analysed each of the independent variables against
> Reaction/No Reaction using ChiSq and Fisher Tests. Those marked ##
> produced p values less than 0.05, and those marked # produced p values
> close to 0.05.
>
> We believe that Cycle is the crucial piece of data - the others just
> appear to be different because there are more early cycles in certain
> groups than others.
>
> SO - I believe what I need to do is a Linear Logistic Regression on the
> 4 independent variables. And I'm expecting it to show that the tumour
> site, schedule and dose don't matter, only the cycle matters. Done a lot
> of reading and I'm clueless!!
>
> I think I want to do something like:
>
> glm(reaction ~ site + sched + cycle + dose, data=mydata, family=poisson)
>
> I am then expecting to see some very long output with lots of numbers...
> ...my question is TWO fold -
>
> 1. is glm the right thing to use before I waste my time

Yes, but if your outcome variable is binomial then the family argument
should be "binomial". (And if you thought it should be poisson, then why
below did you use gaussian???)

> and 2. how do I interpret the result!

Result? What result? I don't see any description of your data, nor any
code.

> (I kind of expect a lecture here as I'm really looking for a nice
> snappy 'p<0.05 means this variable is the one having the influence'
> type answer and I suspect I'm going to be told that's not possible...!)

I think you need to consult a statistician or someone who has taken the
time to read that "statistical mumbo jumbo" you don't want to learn.
This mailing list is not set up to be a tutorial site.

(Re your request below: Some years ago I saw one of those "programmed
learning" texts by Kleinbaum on logistic regression. Maybe you could
read it and see if it makes your consulting sessions go more smoothly.)

http://www.bookfinder.com/search/?author=kleinbaum&title=logistic+regression&lang=en&isbn=&submit=Begin+search&new_used=*&destination=us&currency=USD&mode=basic&st=sr&ac=qr

I have a couple of Kleinbaum's (et al) other texts and find them to be
well written and reasoned, so I suspect the citation above would be as
accessible as any.

> To be clear the example given in the docs is:
>
> library(MASS)
> ---
> Either way, can someone point me to a decent place that would explain
> what this means, or provide me some pointers? i.e. which of the
> variables has the influence on the outcome in the anorexia data?
>
> Please don't shout!! Happy to be pointed to a reference but would
> prefer one in common English, not some stats mumbo jumbo!
>
> Calum

--
David Winsemius, MD
West Hartford, CT
[R] Linear Logistic Regression - Understanding the output (and possibly the test to use!)
Hi

I know asking which test to use is frowned upon on this list... so
please do read on for at least a couple of sentences...

I have some multivariate data split as follows:

Tumour Site (one of 5 categories) #
Chemo Schedule (one of 3 cats) ##
Cycle (one of 3 cats*) ##
Dose (one of 3 cats*) #

*These are actually integers but for all our other analysis so far we
have grouped them into logical bands of categories.

The dependent variable is "Reaction" or "No Reaction".

I have individually analysed each of the independent variables against
Reaction/No Reaction using ChiSq and Fisher Tests. Those marked ##
produced p values less than 0.05, and those marked # produced p values
close to 0.05.

We believe that Cycle is the crucial piece of data - the others just
appear to be different because there are more early cycles in certain
groups than others.

SO - I believe what I need to do is a Linear Logistic Regression on the
4 independent variables. And I'm expecting it to show that the tumour
site, schedule and dose don't matter, only the cycle matters. Done a lot
of reading and I'm clueless!!

I think I want to do something like:

glm(reaction ~ site + sched + cycle + dose, data=mydata, family=poisson)

I am then expecting to see some very long output with lots of numbers...
...my question is TWO fold -

1. is glm the right thing to use before I waste my time

and 2. how do I interpret the result! (I kind of expect a lecture here
as I'm really looking for a nice snappy 'p<0.05 means this variable is
the one having the influence' type answer and I suspect I'm going to be
told that's not possible...!)

To be clear the example given in the docs is:

library(MASS)
data(anorexia)
anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

The output of anorex.1 is:

Call:  glm(formula = Postwt ~ Prewt + Treat + offset(Prewt),
    family = gaussian, data = anorexia)

Coefficients:
(Intercept)        Prewt    TreatCont      TreatFT
    49.7711      -0.5655      -4.0971       4.5631

Degrees of Freedom: 71 Total (i.e. Null);  68 Residual
Null Deviance:     4525
Residual Deviance: 3311     AIC: 490

and the output of summary(anorex.1) is:

Call:
glm(formula = Postwt ~ Prewt + Treat + offset(Prewt),
    family = gaussian, data = anorexia)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-14.1083   -4.2773   -0.5484    5.4838   15.2922

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  49.7711    13.3910   3.717 0.000410 ***
Prewt        -0.5655     0.1612  -3.509 0.000803 ***
TreatCont    -4.0971     1.8935  -2.164 0.033999 *
TreatFT       4.5631     2.1333   2.139 0.036035 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 48.69504)

    Null deviance: 4525.4  on 71  degrees of freedom
Residual deviance: 3311.3  on 68  degrees of freedom
AIC: 489.97

Number of Fisher Scoring iterations: 2

---
Either way, can someone point me to a decent place that would explain
what this means, or provide me some pointers? i.e. which of the
variables has the influence on the outcome in the anorexia data?

Please don't shout!! Happy to be pointed to a reference but would
prefer one in common English, not some stats mumbo jumbo!

Calum
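For anyone reading the anorexia output in this post, here is a small sketch of pulling the coefficient table out of the fit and filtering it by p-value. It only re-runs the documented example; the object name tab and the 0.05 cut-off are my own choices for illustration.

```r
library(MASS)   # provides the anorexia data set

anorex.1 <- glm(Postwt ~ Prewt + Treat + offset(Prewt),
                family = gaussian, data = anorexia)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|).
tab <- coef(summary(anorex.1))

# Keep only the rows whose p-value is below 0.05.
tab[tab[, "Pr(>|t|)"] < 0.05, , drop = FALSE]

# Treat is a factor; its first level is the reference group that
# TreatCont and TreatFT are compared against.
levels(anorexia$Treat)
```

In this fit every row passes the 0.05 filter. Because offset(Prewt) fixes the coefficient of Prewt at 1 before anything is estimated, the estimates describe the change Postwt - Prewt: at equal Prewt, the Cont group gained about 4.1 less than the CBT reference group and the FT group about 4.6 more.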