Dr. Dalgaard,

Thank you for further clarifying the problem. I found a few possible solutions on the internet and will try them to find one that works.
This was my first time posting a question to this mailing list, and I learned quite a bit through working on this problem. I apologize for any impoliteness you may have noticed.

Best regards,
Kengo

2015-05-28 4:26 GMT-05:00 peter dalgaard <pda...@gmail.com>:
>
> On 28 May 2015, at 00:06 , Kengo Inagaki <kengoing...@gmail.com> wrote:
>
>> I did not understand complete separation quite well.
>> Thank you very much for the clarification.
>>
>> Kengo
>>
>> 2015-05-27 17:03 GMT-05:00 David Winsemius <dwinsem...@comcast.net>:
>>>
>>> On May 27, 2015, at 3:00 PM, Kengo Inagaki wrote:
>>>
>>>> Here is the result:
>>>>
>>>> > with(a, table(Sex, Therapy1, Outcome) )
>>>> , , Outcome = Alive
>>>>
>>>>         Therapy1
>>>> Sex      no yes
>>>>   female  0   4
>>>>   male    4   5
>>>>
>>>> , , Outcome = Death
>>>>
>>>>         Therapy1
>>>> Sex      no yes
>>>>   female  6   3
>>>>   male    3   0
>>>
>>> So no survivors when a female had no Therapy1, and no deaths with the
>>> opposite values of those variables. Complete separation.
>
>
> Actually not quite complete separation, but just as bad. If you look at the
> linear combination Sex + Therapy, you get
>
> 0 (female, no therapy)
> 1 (female, therapy OR male, no therapy)
> 2 (male, therapy)
>
>
> 0: 6 dead, 0 survive
> 1: 6 dead, 8 survive
> 2: 0 dead, 5 survive
>
> and any logistic curve through (1, log(6/8)) fits the middle point, and the
> other two will be fitted better and better as the curve gets steeper, so the
> fit diverges.
>
> That's a general pattern: you can have complete separation except at one
> point and still get divergence. Similarly (and really just the same), the fit
> diverges if you have multiple regression with k parameters and there's a k-1
> dimensional hyperplane in predictor space with all responses 0 on one side
> and 1 on the other, but possibly both 0 and 1 _on_ the hyperplane. Google
> tells me that this is called quasicomplete separation.
>
> -pd
>
>>>
>>> --
>>> David.
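The argument about the collapsed score can be checked numerically. What follows is only a minimal base-R sketch, assuming that the 25 patients in the three-way table above are the whole data set. The names cells, score, loglik and ll_limit are made up for illustration and do not come from the thread.

```r
# Rebuild one row per patient from the cell counts in the three-way table.
# Assumption: these 25 patients are the complete data set.
cells <- expand.grid(Sex = c("female", "male"),
                     Therapy1 = c("no", "yes"),
                     Outcome = c("Alive", "Death"))
# Order follows expand.grid: Sex varies fastest, then Therapy1, then Outcome.
cells$n <- c(0, 4, 4, 5,   # Alive: female/no, male/no, female/yes, male/yes
             6, 3, 3, 0)   # Death: same order
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# Collapse the two predictors into the score 0, 1, 2.
score <- (a$Sex == "male") + (a$Therapy1 == "yes")
tab <- table(score, a$Outcome)
print(tab)

# Log-likelihood of a logistic curve for P(Death) that passes through
# (1, log(6/8)) and falls with slope -b in the score.
loglik <- function(b) {
  eta <- log(6 / 8) - b * (score - 1)
  dead <- a$Outcome == "Death"
  sum(ifelse(dead, plogis(eta, log.p = TRUE), plogis(-eta, log.p = TRUE)))
}

steepness <- c(0, 1, 2, 5, 10)
ll <- sapply(steepness, loglik)
print(data.frame(steepness, loglik = ll))

# Bound approached only as the curve becomes infinitely steep: the score-0
# and score-2 groups are then fitted perfectly, and only the middle group
# (6 dead, 8 alive) contributes to the log-likelihood.
ll_limit <- 6 * log(6 / 14) + 8 * log(8 / 14)
print(ll_limit)
```

The log-likelihood should keep rising with the steepness without ever reaching ll_limit, which is why the maximum-likelihood fit has no finite solution here.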
>>>
>>>>
>>>>
>>>> 2015-05-27 16:57 GMT-05:00 David Winsemius <dwinsem...@comcast.net>:
>>>>>
>>>>> On May 27, 2015, at 2:49 PM, Kengo Inagaki wrote:
>>>>>
>>>>>> Thank you very much for your rapid response. I sincerely appreciate
>>>>>> your input.
>>>>>> I am sorry for sending the previous email in HTML format.
>>>>>>
>>>>>> with(a, table(Sex, Therapy1) ) shows the following:
>>>>>>
>>>>>>         Therapy1
>>>>>> Sex      no yes
>>>>>>   female  6   7
>>>>>>   male    7   5
>>>>>>
>>>>>> and with(a, table(Sex, Outcome) ) and with(a, table(Therapy1, Outcome) )
>>>>>> give the following:
>>>>>>
>>>>>>         Outcome
>>>>>> Sex      Alive Death
>>>>>>   female     4     9
>>>>>>   male       9     3
>>>>>>
>>>>>>         Outcome
>>>>>> Therapy1 Alive Death
>>>>>>      no      4     9
>>>>>>      yes     9     3
>>>>>
>>>>> Then what about:
>>>>>
>>>>> with(a, table(Sex, Therapy1, Outcome) )
>>>>>
>>>>> --
>>>>> David
>>>>>
>>>>>
>>>>>>
>>>>>> As there are no zero cells, it does not seem to be complete separation.
>>>>>> I would really appreciate comments.
>>>>>>
>>>>>> Kengo Inagaki
>>>>>> Memphis, TN
>>>>>>
>>>>>>
>>>>>> 2015-05-27 13:57 GMT-05:00 David Winsemius <dwinsem...@comcast.net>:
>>>>>>>
>>>>>>> On May 27, 2015, at 10:10 AM, Kengo Inagaki wrote:
>>>>>>>
>>>>>>>> I am currently working on a health care related project using R. I am
>>>>>>>> learning R while working on data analysis.
>>>>>>>>
>>>>>>>> Below is the part of the data in which I am encountering a problem.
>>>>>>>>
>>>>>>>>
>>>>>>>> Case# Sex Therapy1 Therapy2 Outcome
>>>>>>>>
>>>>>>>> 1 male no no Alive
>>>>>>>>
>>>>>>>
>>>>>>> snipped mangled data sent in HTML
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> "Outcome" is the response variable and "Sex", "Therapy1", "Therapy2"
>>>>>>>> are predictor variables.
>>>>>>>>
>>>>>>>> All of the predictors are significantly associated with the outcome by
>>>>>>>> univariate analysis.
>>>>>>>>
>>>>>>>> Logistic regression runs fine with most of the predictors when "Sex"
>>>>>>>> and "Therapy1" are not included at the same time. (This is part of a
>>>>>>>> table that I cut out from a larger table for ease of presentation, and
>>>>>>>> there are more predictors that I tested.)
>>>>>>>
>>>>>>> Please examine the data before reaching for ridge regression:
>>>>>>>
>>>>>>> What does this show: ...
>>>>>>>
>>>>>>> with(a, table(Sex, Therapy1) )
>>>>>>>
>>>>>>> I predict you will see a zero cell entry. Then read about "complete
>>>>>>> separation" and the so-called "Hauck-Donner effect".
>>>>>>>
>>>>>>> --
>>>>>>> David.
>>>>>>>>
>>>>>>>> However, when "Sex" and "Therapy1" are included in the logistic
>>>>>>>> regression model at the same time, the standard errors inflate and the
>>>>>>>> p-values get close to 1.
>>>>>>>>
>>>>>>>> The call used is:
>>>>>>>>
>>>>>>>> > Model<-glm(Outcome~Sex+Therapy1,data=a,family=binomial) # I assigned "a" to represent the above table.
>>>>>>>>
>>>>>>>> After doing some reading, I suspect this might be collinearity, as the
>>>>>>>> VIF values (using the "vif()" function in the car package) were sky
>>>>>>>> high (8,875,841 for both "Sex" and "Therapy1").
>>>>>>>>
>>>>>>>> Learning that ridge regression may be a solution, I attempted using
>>>>>>>> logisticRidge {ridge} with the following call, but I get the
>>>>>>>> accompanying error message.
>>>>>>>>
>>>>>>>> > logisticRidge(a$Outcome~a$Sex+a$Therapy1)
>>>>>>>>
>>>>>>>> Error in ifelse(y, log(p), log(1 - p)) :
>>>>>>>>   invalid to change the storage mode of a factor
>>>>>>>>
>>>>>>>> At this point I do not have an idea how to solve this and would like
>>>>>>>> to seek help.
>>>>>>>>
>>>>>>>> I really, really appreciate your input!
>>>>>>>>
>>>>>>>> [[alternative HTML version deleted]]
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> David Winsemius
>>>>>>> Alameda, CA, USA
>>>>>>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Office: A 4.23
> Email: pd....@cbs.dk Priv: pda...@gmail.com
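For anyone who wants to reproduce the symptom described in the original post, here is a minimal base-R sketch. It assumes that the 25 patients in the three-way table are the whole data set, and it leaves out Therapy2 and the other predictors, which were never posted in usable form. The names cells, tab3, est and Dead are hypothetical. Whether a numeric 0/1 response gets rid of the logisticRidge error was not confirmed anywhere in the thread, so treat that last step as a guess.

```r
# Rebuild one row per patient from the posted cell counts.
# Assumption: these 25 patients are the complete data set.
cells <- expand.grid(Sex = c("female", "male"),
                     Therapy1 = c("no", "yes"),
                     Outcome = c("Alive", "Death"))
# Order follows expand.grid: Sex varies fastest, then Therapy1, then Outcome.
cells$n <- c(0, 4, 4, 5,   # Alive: female/no, male/no, female/yes, male/yes
             6, 3, 3, 0)   # Death: same order
a <- cells[rep(seq_len(nrow(cells)), cells$n), c("Sex", "Therapy1", "Outcome")]

# The two-way tables contain no zero cells ...
print(with(a, table(Sex, Therapy1)))
print(with(a, table(Sex, Outcome)))
print(with(a, table(Therapy1, Outcome)))

# ... but the three-way table does, which is where the trouble shows up.
tab3 <- with(a, table(Sex, Therapy1, Outcome))
print(tab3)

# Fit the model from the original post. Warnings about fitted probabilities
# of 0 or 1, or about non-convergence, are typical in this situation; they
# are silenced here only to keep the output short.
fit <- suppressWarnings(
  glm(Outcome ~ Sex + Therapy1, data = a, family = binomial)
)
est <- coef(summary(fit))
print(est)   # look for very large standard errors and p-values near 1

# Hypothetical helper column: a numeric 0/1 response instead of a factor,
# for fitting functions that do not accept a factor response.
a$Dead <- as.integer(a$Outcome == "Death")
print(table(a$Dead))
```

The point of the sketch is the order of the checks: the pairwise tables look harmless, so the three-way table has to be inspected before concluding that there is no separation problem.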