Re: [R] Cox ridge regression

2009-08-03 Thread Terry Therneau
Question 1. Consider the following example from help(ridge):

 fit1 <- coxph(Surv(futime, fustat) ~ rx + ridge(age, ecog.ps, theta=1),
               ovarian)

As I understand, this builds a model in which `rx' is the predictor,
whereas ridge penalty term contains variables `age' and
`ecog.ps'. Could someone explain what it me...

  The ridge term introduces age and ecog.ps as predictors AND penalizes them.
The model above has 3 predictors (rx, age, ecog.ps), 2 of them penalized.
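
  A quick way to check this, as a minimal sketch: fit the help(ridge) example
and count the coefficients. It assumes only that the survival package and its
ovarian data set are available; the `data =' argument name is spelled out here
for clarity.

```r
library(survival)

# Example from help(ridge): rx is unpenalized, age and ecog.ps are
# entered through ridge(), which adds them to the model and penalizes them.
fit1 <- coxph(Surv(futime, fustat) ~ rx + ridge(age, ecog.ps, theta = 1),
              data = ovarian)

# One coefficient per predictor: rx, age and ecog.ps.
coef(fit1)
length(coef(fit1))
```

  If ridge() only attached a penalty, the fit would carry a single
coefficient for rx; it carries three.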
  
  Later in the post you have a model with both age and ridge(age).  This puts
age in the model twice, once as a free parameter and once as a penalized one.
Not surprisingly, the second ends up with a coefficient of 0 (within machine
precision).  The warning message you got about NaN is likely related to this:
there are redundant terms in the model.
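
  To avoid the redundancy, each variable should appear either outside ridge()
or inside it, not both. The following is a sketch only; it assumes that the
`lung' data set shipped with survival is the same NCCTG data the post refers
to as `cancer', and the object names fit_free and fit_ridge are made up for
illustration.

```r
library(survival)

# Unpenalized fit: age and ph.ecog enter as ordinary (free) predictors.
fit_free <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)

# Ridge fit: ridge() both adds age and ph.ecog to the model and penalizes
# them, so they are not listed a second time outside ridge().
fit_ridge <- coxph(Surv(time, status) ~ ridge(age, ph.ecog, theta = 1),
                   data = lung)

# Each model has exactly two coefficients, one per variable.
coef(fit_free)
coef(fit_ridge)
```

  Comparing the two coefficient vectors shows the effect of the penalty
without duplicating any term in the model.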
  
Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Cox ridge regression

2009-08-03 Thread ljubomir

Thank you Terry, that answered all questions.

As a suggestion, the help page for ridge() might indicate that the ridge
term simultaneously introduces predictors and penalizes them.

Ljubomir


From: Terry Therneau thern...@mayo.edu
To: Ljubomir Buturovic ljubo...@sfsu.edu
Cc: r-help@r-project.org
Subject: Re: Cox ridge regression
Date: Mon, 3 Aug 2009 09:20:42 -0500 (CDT)


Question 1. Consider the following example from help(ridge):

 fit1 <- coxph(Surv(futime, fustat) ~ rx + ridge(age, ecog.ps, theta=1),
               ovarian)

As I understand, this builds a model in which `rx' is the predictor,
whereas ridge penalty term contains variables `age' and
`ecog.ps'. Could someone explain what it me...


  The ridge term introduces age and ecog.ps as predictors AND penalizes them.
The model above has 3 predictors (rx, age, ecog.ps), 2 of them penalized.

  Later in the post you have a model with both age and ridge(age).  This puts
age in the model twice, once as a free parameter and once as a penalized one.
Not surprisingly, the second ends up with a coefficient of 0 (within machine
precision).  The warning message you got about NaN is likely related to this:
there are redundant terms in the model.

Terry Therneau



[R] Cox ridge regression

2009-08-01 Thread Ljubomir Buturovic

Hello,

I have questions regarding penalized Cox regression using the survival
package (functions coxph() and ridge()). I am using R 2.8.0 on Ubuntu
Linux and survival package version 2.35-4.

Question 1. Consider the following example from help(ridge):

 fit1 <- coxph(Surv(futime, fustat) ~ rx + ridge(age, ecog.ps, theta=1),
               ovarian)

As I understand, this builds a model in which `rx' is the predictor,
whereas the ridge penalty term contains the variables `age' and
`ecog.ps'. Could someone explain what it means to regularize on
parameters which are not part of the model?  Based on the definition of
Cox ridge regression (see for example [1]), or any other regularized
regression, the penalty term is a function of the coefficients
corresponding to the predictor variables, and nothing else.

Question 2. Consider a similar example:

 library(survival)
 lfit2 <- coxph(Surv(time, status) ~ age+ph.ecog + ridge(age, ph.ecog,
                theta=1), cancer)
 print(lfit2)
Call:
coxph(formula = Surv(time, status) ~ age + ph.ecog + ridge(age, 
ph.ecog, theta = 1), data = cancer)

               coef     se(coef) se2      Chisq DF p
age            1.13e-02 0.111    9.32e-03 0.01  1  0.92
ph.ecog        4.43e-01 1.398    1.16e-01 0.10  1  0.75
ridge(age)     2.60e-21 0.110    4.85e-17 0.00  1  1.00
ridge(ph.ecog) 5.14e-22 1.393             0.00  1  1.00

Iterations: 1 outer, 3 Newton-Raphson
Degrees of freedom for terms= 0 0 0 
Likelihood ratio test=19.1  on 0.01 df, p=3.54e-08
  n=227 (1 observation deleted due to missingness)
Warning message:
In sqrt((diag(x$var2))[kk]) : NaNs produced

What is the meaning of the ridge(age) and ridge(ph.ecog) coefficients?
Again, based on the definition of Cox ridge regression, it simply adds
a penalty term to the standard Cox regression function, and doesn't
introduce any new predictors. What to make of the ridge(age) and
ridge(ph.ecog) rows in the output?

Question 3. What is the origin and significance of the warning in the
previous example:

Warning message:
In sqrt((diag(x$var2))[kk]) : NaNs produced

Thank you very much for your help,

Ljubomir

[1] Bovelstad et al., Predicting survival from microarray data - a
comparative study (Bioinformatics, Vol. 23, no. 16, 2007,
pp. 2080-2087).
