On 28 May 2015, at 00:06 , Kengo Inagaki <kengoing...@gmail.com> wrote:
> I did not understand complete separation quite well..
> Thank you very much for clarification.
>
> Kengo
>
> 2015-05-27 17:03 GMT-05:00 David Winsemius <dwinsem...@comcast.net>:
>>
>> On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:
>>
>>> Here is the result-
>>>
>>> > with(a, table(Sex, Therapy1, Outcome) )
>>> , , Outcome = Alive
>>>
>>>         Therapy1
>>> Sex      no yes
>>>   female  0   4
>>>   male    4   5
>>>
>>> , , Outcome = Death
>>>
>>>         Therapy1
>>> Sex      no yes
>>>   female  6   3
>>>   male    3   0
>>
>> So no deaths when Female had no-Therapy1 and no survivors with the opposite
>> for those variables. Complete separation.

Actually not quite complete separation, but just as bad. If you look at
the linear combination Sex + Therapy, you get

0 (female, no therapy)
1 (female, therapy OR male, no therapy)
2 (male, therapy)

0: 6 dead, 0 survive
1: 6 dead, 8 survive
2: 0 dead, 5 survive

Any logistic curve through (1, log(6/8)) fits the middle point, and the
other two points are fitted better and better as the curve gets steeper,
so the fit diverges.

That's a general pattern: you can have complete separation except at one
point and still get divergence.

Similarly (and really just the same), the fit diverges if you have
multiple regression with k parameters and there's a k-1 dimensional
hyperplane in predictor space with all responses 0 on one side and 1 on
the other, but possibly both 0 and 1 _on_ the hyperplane.

Google tells me that this is called quasicomplete separation.

-pd

>>
>> --
>> David.
>>
>>>
>>>
>>> 2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsem...@comcast.net>:
>>>>
>>>> On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
>>>>
>>>>> Thank you very much for your rapid response. I sincerely appreciate your
>>>>> input.
>>>>> I am sorry for sending the previous email in HTML format.
>>>>>
>>>>> with(a, table(Sex, Therapy1) ) shows the following.
>>>>>         Therapy1
>>>>> Sex      no yes
>>>>>   female  6   7
>>>>>   male    7   5
>>>>>
>>>>> and with(a, table(Therapy1, Outcome) )
>>>>> elicit the following
>>>>>
>>>>>         Outcome
>>>>> Sex      Alive Death
>>>>>   female     4     9
>>>>>   male       9     3
>>>>>
>>>>>         Outcome
>>>>> Therapy1 Alive Death
>>>>>   no         4     9
>>>>>   yes        9     3
>>>>
>>>> Then what about:
>>>>
>>>> with(a, table(Sex, Therapy1, Outcome) )
>>>>
>>>> --
>>>> David
>>>>
>>>>
>>>>>
>>>>> As there are no zero cells, it does not seem to be complete separation.
>>>>> I really appreciate comments.
>>>>>
>>>>> Kengo Inagaki
>>>>> Memphis, TN
>>>>>
>>>>>
>>>>> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsem...@comcast.net>:
>>>>>>
>>>>>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>>>>>>
>>>>>>> I am currently working on a health care related project using R. I am
>>>>>>> learning R while working on data analysis.
>>>>>>>
>>>>>>> Below is the part of the data in which I am encountering a problem.
>>>>>>>
>>>>>>>
>>>>>>> Case# Sex Therapy1 Therapy2 Outcome
>>>>>>>
>>>>>>> 1 male no no Alive
>>>>>>>
>>>>>>
>>>>>> snipped mangled data sent in HTML
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2" are
>>>>>>> predictor variables.
>>>>>>>
>>>>>>> All of the predictors are significantly associated with the outcome by
>>>>>>> univariate analysis.
>>>>>>>
>>>>>>> Logistic regression runs fine with most of the predictors when "Sex" and
>>>>>>> "Therapy1" are not included at the same time (This is a part of table
>>>>>>> that I cut out from a larger table for ease of presentation and there
>>>>>>> are more predictors that I tested).
>>>>>>
>>>>>> Please examine the data before reaching for ridge regression:
>>>>>>
>>>>>> What does this show: ...
>>>>>>
>>>>>> with(a, table(Sex, Therapy1) )
>>>>>>
>>>>>> I predict you will see a zero cell entry. Then read about "complete
>>>>>> separation" and the so-called "Hauck-Donner effect".
>>>>>>
>>>>>> --
>>>>>> David.
>>>>>>>
>>>>>>> However, when "Sex" and "Therapy1" are included in logistic regression
>>>>>>> model at the same time, standard error inflates and p value gets close
>>>>>>> to 1.
>>>>>>>
>>>>>>> The formula used is,
>>>>>>>
>>>>>>> > Model<-glm(Outcome~Sex+Therapy1,data=a,family=binomial) #I assigned a
>>>>>>> vector "a" to represent above table.
>>>>>>>
>>>>>>> After doing some reading, I suspect this might be collinearity, as vif
>>>>>>> values (using "vif()" function in car package) were sky high (8,875,841
>>>>>>> for both "Sex" and "Therapy1").
>>>>>>>
>>>>>>> Learning that ridge regression may be a solution, I attempted using
>>>>>>> logisticRidge {ridge} using the following formula, but I get the
>>>>>>> accompanying error message.
>>>>>>>
>>>>>>> > logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>>>>>>
>>>>>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>>>>>   invalid to change the storage mode of a factor
>>>>>>>
>>>>>>> At this point I do not have an idea how to solve this and would like to
>>>>>>> seek help.
>>>>>>>
>>>>>>> I really really appreciate your input!!!
>>>>>>>
>>>>>>> [[alternative HTML version deleted]]
>>>>>>>
>>>>>>
>>>>>>
>>>>>> David Winsemius
>>>>>> Alameda, CA, USA
>>>>>>
>>>>
>>>> David Winsemius
>>>> Alameda, CA, USA
>>>>
>>
>> David Winsemius
>> Alameda, CA, USA
>>
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
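
For anyone who wants to see the quasicomplete separation in this thread
for themselves, here is a minimal sketch in R. It is not the original
data: the 25 cases are rebuilt from the three-way table quoted in the
thread, and the names cells, score and loglik are made up for this
illustration. It tabulates Outcome against the 0/1/2 score, evaluates
the log-likelihood of curves pinned at (1, log(6/8)) with increasingly
steep slopes, and then refits the glm() call from the original post.

```r
# Rebuild the cases from the with(a, table(Sex, Therapy1, Outcome)) output
# quoted in the thread. expand.grid() varies Sex fastest, then Therapy1.
cells <- expand.grid(Sex      = c("female", "male"),
                     Therapy1 = c("no", "yes"),
                     Outcome  = c("Alive", "Death"),
                     stringsAsFactors = TRUE)
cells$n <- c(0, 4, 4, 5,   # Alive: female/no, male/no, female/yes, male/yes
             6, 3, 3, 0)   # Death: same order
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# The linear combination from the reply: 0, 1 or 2.
a$score <- (a$Sex == "male") + (a$Therapy1 == "yes")
tab <- table(score = a$score, Outcome = a$Outcome)
print(tab)

# Log-likelihood of a logistic curve for P(Death) that passes through
# (1, log(6/8)) with slope b on the score. Death gets rarer as the score
# rises, so "steeper" here means b more negative.
loglik <- function(b) {
  eta   <- log(6 / 8) + b * (c(0, 1, 2) - 1)
  dead  <- c(6, 6, 0)
  alive <- c(0, 8, 5)
  sum(dead * plogis(eta, log.p = TRUE) + alive * plogis(-eta, log.p = TRUE))
}
ll <- sapply(c(-1, -5, -10, -20), loglik)
print(ll)   # keeps increasing as the curve gets steeper

# The model from the original post; with a factor response, glm() models
# the second level ("Death").
fit <- glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
print(summary(fit)$coefficients)
```

The table should reproduce the 6/0, 6/8 and 0/5 split given above, and
the log-likelihood has no maximum at any finite slope, which is why
glm() stops at large estimates with very large standard errors.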
-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Office: A 4.23
Email: pd....@cbs.dk  Priv: pda...@gmail.com