The problem is the use of variable selection without simultaneous shrinkage. It will result in an entirely unreliable model and essentially will choose a random sample of the predictors. See http://www.childrensmercy.org/stats/faq/faq12.asp Frank
Shi, Tao wrote: > > Hi Frank, > > I know it's kind of beyond the scope of R help, but I would appreciate it > if you > can elaborate further on this. Are you worrying about this variable > selection > approach or you think that there is something wrong with the data (it's > actually > a real dataset)? If it's the first one, I believe I can always narrow > down the > variables based on univarate analysis and build a multivariate model from > that. > > Many thanks in advance. > > ...Tao > > > > > ----- Original Message ---- >> From: Frank Harrell <f.harr...@vanderbilt.edu> >> To: r-help@r-project.org >> Sent: Mon, May 16, 2011 11:25:20 AM >> Subject: Re: [R] changes in coxph in "survival" from older version? >> >> Please don't be serious about doing variable selection with this >> dataset. >> Frank >> >> Shi, Tao wrote: >> > >> > Hi Terry, >> > >> > Really appreciate your help! Sorry for my late reply. >> > >> > I did realize that there are way more predictors in the model. My >> initial >> > thinking was use that as an initial model for stepwise model >> selection. >> > Now I >> > wonder if the model selection result is still valid if the initial >> model >> > didn't >> > even converge? >> > >> > Thanks! >> > >> > ...Tao >> > >> > >> > >> > >> > ----- Original Message ---- >> >> From: Terry Therneau <thern...@mayo.edu> >> >> To: "Shi, Tao" <shida...@yahoo.com> >> >> Cc: r-help@r-project.org >> >> Sent: Thu, May 12, 2011 6:42:09 AM >> >> Subject: Re: changes in coxph in "survival" from older version? >> >> >> >> >> >> On Wed, 2011-05-11 at 16:11 -0700, Shi, Tao wrote: >> >> > Hi all, >> >> > >> >> > I found that the two different versions of "survival" packages, >> namely >> >>2.36-5 >> >> >> >> > vs. 2.36-8 or later, give different results for coxph function. >> >> Please see >> > >> >> > below and the data is attached. The second one was done on Linux, >> but >> >>Windows >> >> >> >> > gave the same results. Could you please let me know which one I >> >> should >> >>trust? >> >> > >> >> > Thanks, >> >> >> >> In your case, neither. Your data set has 22 events and 17 >> predictors; >> >> the rule of thumb for a reliable Cox model is 10-20 events per >> predictor >> >> which implies no more than 2 for your data set. As a result, the >> >> coefficients of your model have very wide confidence intervals, the >> coef >> >> for Male for instance has se of 3.26, meaning the CI goes from 1/26 >> to >> >> 26 times the estimate; i.e., there is no biological meaning to the >> >> estimate. >> >> >> >> Nevertheless, why did coxph give a different answer? The later >> >> version 2.36-9 failed to converge (20 iterations) with a final >> >> log-likelihood of -19.94, the earlier code converges in 10 >> iterations to >> >> -19.91. In version 2.36-6 an extra check was put into the maximizer >> for >> >> coxph in response to an exceptional data set which caused the >> routine to >> >> fail due to overflow of the exp function; the Newton-Raphson >> iteration >> >> algorithm had made a terrible guess in it's iteration path, which >> can >> >> happen with all NR based search methods. >> >> I put a limit on the size the linear predictor in the Cox model >> of >> >> 21. The basic argument is that exp(linear-predictor) = relative >> risk >> >> for a subject, and that there is not much biological meaning for >> risks >> >> to be less than exp(-21) ~ 1/(population of the earh). There is >> more to >> >> the reasoning, interested parties should look at the comments in >> >> src/coxsafe.c, a 5 line routine with 25 lines of discussion. I will >> >> happily accept input the "best" value for the constant. >> >> >> >> I never expected to see a data set with both convergence of the >> LL >> >> and linear predictors larger than +-15. Looking at the fit (older >> code) >> >> > round(fit2$linear.predictor, 2) >> >> [1] 2.26 0.89 4.96 -19.09 -12.10 1.39 2.82 3.10 >> >> [9] 18.57 -25.25 22.94 8.75 5.52 -27.64 14.88 -23.41 >> >> [17] 13.70 -28.45 -1.84 10.04 12.62 2.54 6.33 -8.76 >> >> [25] 9.68 4.39 2.92 3.51 6.02 -17.24 5.97 >> >> >> >> This says that, if the model is to be believed, you have several >> near >> >> immortals in the data set. (Everyone else on earth will perish >> first). >> >> >> >> Terry Therneau >> >> >> >> >> >> >> >> >> > >> > ______________________________________________ >> > R-help@r-project.org mailing list >> > https://stat.ethz.ch/mailman/listinfo/r-help >> > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > and provide commented, minimal, self-contained, reproducible code. >> > >> >> >> ----- >> Frank Harrell >> Department of Biostatistics, Vanderbilt University >> -- >> View this message in context: >>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3527017.html >> >> Sent from the R help mailing list archive at Nabble.com. >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3529014.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.