Re: [R] changes in coxph in "survival" from older version?

Frank Harrell Tue, 17 May 2011 05:20:14 -0700

The problem is the use of variable selection without simultaneous shrinkage. 
It will result in an entirely unreliable model and essentially will choose a
random sample of the predictors.  See
http://www.childrensmercy.org/stats/faq/faq12.asp
Frank


Shi, Tao wrote:
> 
> Hi Frank,
> 
> I know it's kind of beyond the scope of R help, but I would appreciate it
> if you 
> can elaborate further on this.  Are you worrying about this variable
> selection 
> approach or you think that there is something wrong with the data (it's
> actually 
> a real dataset)?  If it's the first one, I believe I can always narrow
> down the 
> variables based on univarate analysis and build a multivariate model from
> that.
> 
> Many thanks in advance.
> 
> ...Tao
> 
> 
> 
> 
> ----- Original Message ----
>> From: Frank Harrell &lt;f.harr...@vanderbilt.edu&gt;
>> To: r-help@r-project.org
>> Sent: Mon, May 16, 2011 11:25:20 AM
>> Subject: Re: [R] changes in coxph in "survival" from older version?
>> 
>> Please don't be serious about doing variable selection with this 
>> dataset.
>> Frank
>> 
>> Shi, Tao wrote:
>> > 
>> > Hi Terry,
>> > 
>> > Really appreciate your help!  Sorry for my late reply.
>> > 
>> > I did realize that there are way more predictors in the model.  My 
>> initial 
>> > thinking was use that as an initial model for stepwise model 
>> selection. 
>> > Now I 
>> > wonder if the model selection result is still  valid if the initial
>> model
>> > didn't 
>> > even converge?
>> > 
>> > Thanks!
>> > 
>> > ...Tao
>> > 
>> > 
>> > 
>> > 
>> > ----- Original Message ----
>> >> From: Terry Therneau &lt;thern...@mayo.edu&gt;
>> >> To:  "Shi, Tao" &lt;shida...@yahoo.com&gt;
>> >> Cc: r-help@r-project.org
>> >> Sent:  Thu, May 12, 2011 6:42:09 AM
>> >> Subject: Re: changes in coxph in  "survival" from older version?
>> >> 
>> >> 
>> >> On Wed,  2011-05-11 at 16:11 -0700, Shi, Tao wrote:
>> >> > Hi all,
>> >>  > 
>> >> > I found that the two different versions of "survival"  packages,
>> namely  
>> >>2.36-5 
>> >>
>> >> > vs.  2.36-8 or later, give different results for coxph  function. 
>> >>  Please see 
>> > 
>> >> > below and the data is attached.   The  second one was done on Linux,
>> but 
>> >>Windows 
>> >>
>> >> > gave the same results.   Could you please let  me know which one I
>> >> should 
>> >>trust?
>> >> > 
>> >> >  Thanks,
>> >> 
>> >>  In your case,  neither.  Your data set has 22 events and 17 
>> predictors;
>> >>  the rule of thumb for a reliable Cox model is 10-20 events per  
>> predictor
>> >> which implies no more than 2 for your data set.  As a  result,  the
>> >> coefficients of your model have very wide  confidence intervals, the 
>> coef
>> >> for Male for instance has se of  3.26, meaning the CI goes from 1/26 
>> to
>> >> 26 times the estimate;  i.e., there is no biological meaning to  the
>> >>  estimate.
>> >> 
>> >>   Nevertheless, why did coxph give a  different  answer?  The later
>> >> version 2.36-9 failed to  converge (20 iterations)  with a final
>> >> log-likelihood of  -19.94, the earlier code converges in 10 
>> iterations to
>> >>  -19.91.  In version 2.36-6 an extra check was put into the  maximizer 
>> for
>> >> coxph in response to an exceptional data set which caused  the 
>> routine to
>> >> fail due to overflow of the exp function; the  Newton-Raphson 
>> iteration
>> >> algorithm had made a terrible guess  in it's iteration path, which 
>> can
>> >> happen with all NR based  search methods.  
>> >>    I put a limit  on the size  the linear predictor in the Cox model
>> of
>> >> 21.  The basic   argument is that exp(linear-predictor) = relative
>> risk
>> >> for a  subject, and  that there is not much biological meaning for 
>> risks
>> >> to be less than exp(-21)  ~ 1/(population of the  earh).  There is
>> more to
>> >> the reasoning,  interested  parties should look at the comments in
>> >> src/coxsafe.c, a 5 line   routine with 25 lines of discussion.  I will
>> >> happily accept  input the  "best" value for the constant.
>> >> 
>> >>     I never expected to see a data set  with both convergence of the 
>> LL
>> >> and linear predictors larger than +-15.   Looking at the fit  (older
>> code)
>> >> > round(fit2$linear.predictor, 2)
>> >>    [1]   2.26   0.89   4.96 -19.09 -12.10   1.39     2.82   3.10
>> >>  [9]  18.57 -25.25  22.94    8.75   5.52  -27.64  14.88 -23.41
>> >> [17]  13.70  -28.45  -1.84   10.04  12.62   2.54   6.33   -8.76
>> >> [25]   9.68    4.39   2.92    3.51   6.02 -17.24   5.97
>> >> 
>> >> This says   that, if the model is to be believed, you have several
>> near
>> >>  immortals in the  data set. (Everyone else on earth will perish 
>> first).
>> >> 
>> >> Terry  Therneau
>> >> 
>> >> 
>> >> 
>> >>
>> > 
>> >  ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the  posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and  provide commented, minimal, self-contained, reproducible code.
>> > 
>> 
>> 
>> -----
>> Frank Harrell
>> Department of Biostatistics, Vanderbilt  University
>> --
>> View this message in context:  
>>http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3527017.html
>>
>> Sent  from the R help mailing list archive at Nabble.com.
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting  guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented,  minimal, self-contained, reproducible code.
>>
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 


-----
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: 
http://r.789695.n4.nabble.com/changes-in-coxph-in-survival-from-older-version-tp3516101p3529014.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] changes in coxph in "survival" from older version?

Reply via email to