Re: [R] cox model

2018-11-03 Thread David Winsemius
It's also "well described" in the help materials for the obvious 
recommended package that ships with every copy of R. My copy sits at 
http://127.0.0.1:29434/library/survival/doc/timedep.pdf. Therneau's S 
package was first ported to R by Thomas Lumley and later Therneau took 
over maintenance.
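For readers of the archive: the situation described below is the classic Stanford heart transplant example, and a version of the data ships with the survival package as `heart`, already in counting-process (start, stop] form. A minimal sketch (variable names as in that dataset):

```r
library(survival)

# 'heart' holds the Stanford transplant data in counting-process form:
# one row per interval per patient, with 'transplant' playing the role of
# z(t): 0 before the transplant, 1 after it.
head(heart)

# Cox model with the time-dependent transplant status plus two baseline
# covariates (age and year of acceptance, as coded in the dataset)
fit <- coxph(Surv(start, stop, event) ~ transplant + age + year, data = heart)
summary(fit)
```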


--

David.

On 11/3/18 12:51 PM, Medic wrote:

I need R code for a situation that is well described in the SAS help. I
would be very grateful for any help!
"Time-dependent variables can be used to model the effects of subjects
transferring from one treatment group to another. One example of the need
for such strategies is the Stanford heart transplant program. Patients are
accepted if physicians judge them suitable for heart transplant. Then, when
a donor becomes available, physicians choose transplant recipients
according to various medical criteria. A patient’s status can be changed
during the study from waiting for a transplant to being a transplant
recipient. Transplant status can be defined by the time-dependent covariate
function z=z(t) as:
z(t) = 0 (if the patient has not received the transplant at time t)
and 1 (if the patient has received it)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




Re: [R] cox model

2018-11-03 Thread Jeff Newmiller
Stop re-posting this question.  It only irritates people... it does not improve 
your chances of getting help.

What does improve your chances is reading the Posting Guide and following the 
advice given there. Your question amounts to asking someone to figure out what 
theory you should apply to a problem domain... which is not really on topic 
here, though someone might recommend something to you anyway.

The better strategy would be to seek out (elsewhere) what textbook/papers you 
want to apply (SAS should suggest references if they claim to implement 
solutions) and then use Google or rseek.org to find contributed packages that 
implement some or all of the steps in that approach. You can also peruse the 
Task Views on CRAN. When you get error messages trying to apply those tools in 
R, ask here how to resolve those problems, since this list is about the 
language not your problem domain.

On November 3, 2018 12:51:23 PM PDT, Medic  wrote:
>I need R code for a situation that is well described in the SAS help.
>I would be very grateful for any help!
>"Time-dependent variables can be used to model the effects of subjects
>transferring from one treatment group to another. One example of the
>need
>for such strategies is the Stanford heart transplant program. Patients
>are
>accepted if physicians judge them suitable for heart transplant. Then,
>when
>a donor becomes available, physicians choose transplant recipients
>according to various medical criteria. A patient’s status can be
>changed
>during the study from waiting for a transplant to being a transplant
>recipient. Transplant status can be defined by the time-dependent
>covariate
>function z=z(t) as:
>z(t) = 0 (if the patient has not received the transplant at time t)
>and 1 (if the patient has received it)
>

-- 
Sent from my phone. Please excuse my brevity.




[R] cox model

2018-11-03 Thread Medic
I need R code for a situation that is well described in the SAS help. I
would be very grateful for any help!
"Time-dependent variables can be used to model the effects of subjects
transferring from one treatment group to another. One example of the need
for such strategies is the Stanford heart transplant program. Patients are
accepted if physicians judge them suitable for heart transplant. Then, when
a donor becomes available, physicians choose transplant recipients
according to various medical criteria. A patient’s status can be changed
during the study from waiting for a transplant to being a transplant
recipient. Transplant status can be defined by the time-dependent covariate
function z=z(t) as:
z(t) = 0 (if the patient has not received the transplant at time t)
and 1 (if the patient has received it)



[R] cox model

2018-11-02 Thread post .
I need R code for a situation that is well described in the SAS help. I
would be very grateful for any help!
"Time-dependent variables can be used to model the effects of subjects
transferring from one treatment group to another. One example of the need
for such strategies is the Stanford heart transplant program. Patients are
accepted if physicians judge them suitable for heart transplant. Then, when
a donor becomes available, physicians choose transplant recipients
according to various medical criteria. A patient’s status can be changed
during the study from waiting for a transplant to being a transplant
recipient. Transplant status can be defined by the time-dependent covariate
function z=z(t) as:
z(t) = 0 (if the patient has not received the transplant at time t)
and 1 (if the patient has received it)






[R] Cox model with time-dependent coefficients

2017-08-11 Thread Theresa Grimm
Dear R-help Community,

I'm currently struggling with some issues extending the proportional hazards
Cox model with time-dependent coefficients and could really use some help.

Since I'm not experienced at formatting code nicely in an email, here is a link
to my question and code:

https://stats.stackexchange.com/questions/297052/time-dependend-coefficients-in-a-cox-model

Essentially, I try to predict the time to default in a credit scoring context 
using six categorical variables with survival analysis.

To extend the basic proportional hazards Cox model I introduce time-dependent
coefficients using a step function beta(t), following Therneau et al., "Using
Time Dependent Covariates and Time Dependent Coefficients in the Cox Model".
My first question is: how do I get the overall test for each categorical
variable after applying the cox.zph function?

The second, and currently more important, question for me is:
after fitting my model on a training set, is it possible to make a
prediction on a new test set? Since I use an interaction between the
covariate BS24 and the stratification by time group on the right-hand side
of my formula (see code in link), I'm a bit confused about how prediction is
going to work. At the initial time t=0 I have no idea of the future, in
particular which time group the new customer will end up in.
To compare my models I use the time-dependent AUC calculated by the "Score"
function from the package riskRegression. It internally uses a prediction
function, but when applying it to the extended model and the new test set it
says "strata(time group) not found".

When splitting my test set with survSplit the same way I did for the training
set, it works and I get a nice AUC plot. But I am unsure whether this actually
makes sense and whether I am wrongly including future information I would not
normally have.

I'm thankful for any help or hint.


Kind regards
Theresa
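Not a full answer, but for later readers: the step-function beta(t) approach referred to above can be sketched with the veteran data that ships with survival, as in the Therneau vignette (the cut points 90 and 180 days are purely illustrative):

```r
library(survival)

# Split each subject's follow-up at the cut points; 'tgroup' indexes the
# resulting time interval, so karno gets a separate coefficient per interval.
vet2 <- survSplit(Surv(time, status) ~ ., data = veteran,
                  cut = c(90, 180), episode = "tgroup")
fit <- coxph(Surv(tstart, time, status) ~ trt + prior + karno:strata(tgroup),
             data = vet2)
fit
```

For prediction, a new dataset has to be passed through the same survSplit() call (same cut points) before the tgroup strata can be found.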



Re: [R] Cox model with multiple events - PH assumption

2014-12-23 Thread Therneau, Terry M., Ph.D.



On 12/23/2014 05:00 AM, r-help-requ...@r-project.org wrote:

Dear all,

I'm using the package survival for adjusting the Cox model with multiple
events (Prentice, Williams and Peterson Model). I have several covariates,
some of them are time-dependent.

I'm using the function cox.zph to check the proportional hazards. Due to
the nature of the time-dependent covariates, I don't need to analyse the
assumptions of the proportional hazards associated with the
time-dependent covariates. Is it right?

Thanks for your attention.

Best regards,
Helena.


Wrong.
The PH assumption is that the same coefficient b for a covariate applies over all 
follow-up time.  The fact that the covariate itself changes value, or does not change 
value, over time has no bearing on whether the assumption is true.


Now it may often be the case that risk is related to current covariate values, and a Cox 
model using baseline values fails just because the covariates are out of date.  So PH 
might hold for the updated (time-dependent) covariates and fail when using baseline 
values.  This is a very study specific situation, however.


Terry T.
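For later readers, a minimal illustration of the check being discussed, using the lung data that ships with survival (a stand-in, not the poster's model):

```r
library(survival)

# Fit a simple Cox model, then test proportional hazards from the scaled
# Schoenfeld residuals: cox.zph reports one test per term plus a global test.
fit <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung)
zp <- cox.zph(fit)
zp
plot(zp[1])  # smoothed estimate of beta(t) for age; flat is consistent with PH
```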



[R] Cox model with multiple events - Proportional Hazards Assumption

2014-12-22 Thread Maria Helena Mourino Silva Nunes
Dear all,

I'm using the package survival for adjusting the Cox model with multiple
events (Prentice, Williams and Peterson Model). I have several covariates,
some of them are time-dependent.

I'm using the function cox.zph to check the proportional hazards. Due to
the nature of the time-dependent covariates, I don't need to analyse the
assumptions of the proportional hazards associated with the
time-dependent covariates. Is it right?

Thanks for your attention.

Best regards,
Helena.



[R] Cox model -missing data.

2014-12-19 Thread aoife doherty
Hi all,

I have a data set like this:

Test.cox file:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  *        5         1
tom  40  Variant  6         1

where * indicates that I don't know what the value is for V3 for Ben.

I've set up a Cox model to run like this:

#!/usr/bin/Rscript
library(bdsmatrix)
library(kinship2)
library(survival)
library(coxme)
death.dat <- read.table("Test.cox", header = TRUE)
deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
sink("Test.cox.R.Output")
Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
    strata(factor(V2)) + factor(V3) +
    (1 | ID), data = death.dat, varlist = deathdat.kmat)
Model
sink()



As you can see from the Test.cox file, I have a missing value "*". How and
where do I tell the R script to treat "*" as missing? If I can't
incorporate missing values into the model, I assume the alternative is to
remove all of the rows with missing data, which will greatly reduce my data
set, as most rows have at least one missing value.

Thanks



Re: [R] Cox model -missing data.

2014-12-19 Thread Shouro Dasgupta
First recode the "*" values as NA: death.dat$V3[death.dat$V3 == "*"] <- NA

Model-fitting functions will then drop incomplete rows via na.action = na.omit
(the default), or you can create a complete-case dataset explicitly:
newdata <- na.omit(death.dat)

Shouro




On Fri, Dec 19, 2014 at 11:12 AM, aoife doherty aoife.m.dohe...@gmail.com
wrote:

 Hi all,

 I have a data set like this:

 Test.cox file:

 V1   V2  V3       Survival  Event
 ann  13  WTHomo   4         1
 ben  20  *        5         1
 tom  40  Variant  6         1

 where * indicates that I don't know what the value is for V3 for Ben.

 I've set up a Cox model to run like this:

 #!/usr/bin/Rscript
 library(bdsmatrix)
 library(kinship2)
 library(survival)
 library(coxme)
 death.dat <- read.table("Test.cox", header = TRUE)
 deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
 sink("Test.cox.R.Output")
 Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
     strata(factor(V2)) + factor(V3) +
     (1 | ID), data = death.dat, varlist = deathdat.kmat)
 Model
 sink()



 As you can see from the Test.cox file, I have a missing value *. How and
 where do I tell the R script treat * as a missing variable. If I can't
 incorporate missing values into the model, I assume the alternative is to
 remove all of the rows with missing data, which will greatly reduce my data
 set, as most rows have at least one missing variable.

 Thanks



Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Hi Aoife,
I think that if you simply replace each * in the data file
with NA, then it should work (NA is usually interpreted
as missing for those functions for which missingness is
relevant). How you subsequently deal with records which have
missing values is another question (or many questions ... ).

So your data should look like:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  NA       5         1
tom  40  Variant  6         1

Hoping this helps,
Ted.
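A small addition: read.table can do that recoding at import time via its na.strings argument. A runnable sketch using an inline copy of the posted data:

```r
# na.strings tells read.table which tokens mean "missing"; using it avoids
# recoding "*" to NA after the fact.
txt <- "V1 V2 V3 Survival Event
ann 13 WTHomo 4 1
ben 20 * 5 1
tom 40 Variant 6 1"
death.dat <- read.table(text = txt, header = TRUE, na.strings = "*")
is.na(death.dat$V3)
# [1] FALSE  TRUE FALSE
```

For the real file the equivalent call would be read.table("Test.cox", header = TRUE, na.strings = "*").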

On 19-Dec-2014 10:12:00 aoife doherty wrote:
 Hi all,
 
 I have a data set like this:
 
 Test.cox file:
 
 V1   V2  V3       Survival  Event
 ann  13  WTHomo   4         1
 ben  20  *        5         1
 tom  40  Variant  6         1
 
 
 where * indicates that I don't know what the value is for V3 for Ben.
 
 I've set up a Cox model to run like this:
 
 #!/usr/bin/Rscript
 library(bdsmatrix)
 library(kinship2)
 library(survival)
 library(coxme)
 death.dat <- read.table("Test.cox", header = TRUE)
 deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
 sink("Test.cox.R.Output")
 Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
     strata(factor(V2)) + factor(V3) +
     (1 | ID), data = death.dat, varlist = deathdat.kmat)
 Model
 sink()
 
 
 
 As you can see from the Test.cox file, I have a missing value *. How and
 where do I tell the R script treat * as a missing variable. If I can't
 incorporate missing values into the model, I assume the alternative is to
 remove all of the rows with missing data, which will greatly reduce my data
 set, as most rows have at least one missing variable.
 
 Thanks
 

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 19-Dec-2014  Time: 10:21:23
This message was sent by XFMail



Re: [R] Cox model -missing data.

2014-12-19 Thread aoife doherty
Many thanks, I appreciate the response.

When I convert the missing values to NA and run the Cox model as described
in my previous post, the model seems to remove all of the rows with a
missing value (the number of rows "n" in the output after I completely
remove any row with missing data is the same as the "n" after I change the
missing values to NA).

What I had been hoping to do is not completely remove a row with missing
data for a covariate, but rather somehow censor or estimate a value for
the missing entry.

In reality, I have ~600 people with survival data and say 6 variables
attached to them. After I incorporate a 7th variable (for which the
information isn't available for every individual), I have 400 people left.
Since I still have survival data and almost all of the information for the
other 200 people (the only thing missing is information about that 7th
variable), it seems a waste to remove all of the survival data for 200
people over one covariate. So I was hoping, instead of completely removing
the rows, to somehow acknowledge in the model that the data for this
particular covariate is missing, without removing the row. Does anyone know
whether that is possible with the model I described above?

Thanks



On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding ted.hard...@wlandres.net
wrote:

 Hi Aoife,
 I think that if you simply replace each * in the data file
 with NA, then it should work (NA is usually interpreted
 as missing for those functions for which missingness is
 relevant). How you subsequently deal with records which have
 missing values is another question (or many questions ... ).

 So your data should look like:

 V1   V2  V3       Survival  Event
 ann  13  WTHomo   4         1
 ben  20  NA       5         1
 tom  40  Variant  6         1

 Hoping this helps,
 Ted.

 On 19-Dec-2014 10:12:00 aoife doherty wrote:
  Hi all,
 
  I have a data set like this:
 
  Test.cox file:
 
  V1   V2  V3       Survival  Event
  ann  13  WTHomo   4         1
  ben  20  *        5         1
  tom  40  Variant  6         1
 
 
  where * indicates that I don't know what the value is for V3 for Ben.
 
  I've set up a Cox model to run like this:
 
  #!/usr/bin/Rscript
  library(bdsmatrix)
  library(kinship2)
  library(survival)
  library(coxme)
  death.dat <- read.table("Test.cox", header = TRUE)
  deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
  sink("Test.cox.R.Output")
  Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
      strata(factor(V2)) + factor(V3) +
      (1 | ID), data = death.dat, varlist = deathdat.kmat)
  Model
  sink()
 
 
 
  As you can see from the Test.cox file, I have a missing value *. How
 and
  where do I tell the R script treat * as a missing variable. If I can't
  incorporate missing values into the model, I assume the alternative is to
  remove all of the rows with missing data, which will greatly reduce my
 data
  set, as most rows have at least one missing variable.
 
  Thanks
 





Re: [R] Cox model -missing data.

2014-12-19 Thread Ted Harding
Yes, your basic reasoning is correct. In general, the observed variables
carry information about the variables with missing values, so (in some
way) the missing values can be replaced with estimates (imputations)
and the standard regression method will then work as though the
replacements were there is the first place. To incorporate the inevitable
uncertainty about what the missing values really were, one approach
(multiple imputation) is to do the replacement many times over,
sampling the replacement values from a posterior distribution estimated
from the non-missing data. There are other approaches.

This is where the many questions kick in! I don't have time at the
moment, to go into further detail (there's a lot of it, and several
R packages which deal with missing data in different ways), but I hope
that someone can meanwhile point you in the right direction.

With best wishes,
Ted.
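One concrete route is the mice package (assumed installed; it is not part of base R). A sketch using the variable names from the earlier posts in this thread:

```r
library(mice)       # multiple imputation by chained equations
library(survival)

# Create m = 5 completed copies of the data, imputing NA values from the
# observed variables, then fit the Cox model on each copy and pool the
# estimates with Rubin's rules.
imp  <- mice(death.dat, m = 5, printFlag = FALSE)
fits <- with(imp, coxph(Surv(Survival, Event) ~ V2 + factor(V3)))
summary(pool(fits))
```

How defensible this is still depends on the mechanism driving the missingness, so treat it as a starting point rather than a recipe.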

On 19-Dec-2014 11:17:27 aoife doherty wrote:
 Many thanks, I appreciate the response.
 
 When I convert the missing values to NA and run the cox model as described
 in previous post,  the cox model seems to remove all of the rows with a
 missing value (as the number of rows n in the cox output after I
 completely remove any row with missing data is the same as the number of
 rows n in the cox output after I change the missing values to NA).
 
 What I had been hoping to do is not completely remove a row with missing
 data for a co-variable, but rather somehow censor or estimate a value for
 the missing value?
 
 In reality, I have ~600 people with survival data and say 6 variables
 attached to them. After I incorporate a 7th variable (for which the
 information isn't available for every individual), I have 400 people left.
 Since I still have survival data and almost all of the information for the
 other 200 people (the only thing missing is information about that 7th
 variable), it seems a waste to remove all of the survival data for 200
 people over one co-variate. So I was hoping instead of completely removing
 the rows, to just somehow acknowledge that the data for this particular
 co-variate is missing in the model but not completely remove the row? This
 is more what I was hoping someone would know if it's possible to
 incorporate into the model I described above?
 
 Thanks
 
 
 
 On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding ted.hard...@wlandres.net
 wrote:

 Hi Aoife,
 I think that if you simply replace each * in the data file
 with NA, then it should work (NA is usually interpreted
 as missing for those functions for which missingness is
 relevant). How you subsequently deal with records which have
 missing values is another question (or many questions ... ).

 So your data should look like:

 V1   V2  V3       Survival  Event
 ann  13  WTHomo   4         1
 ben  20  NA       5         1
 tom  40  Variant  6         1

 Hoping this helps,
 Ted.

 On 19-Dec-2014 10:12:00 aoife doherty wrote:
  Hi all,
 
  I have a data set like this:
 
  Test.cox file:
 
  V1   V2  V3       Survival  Event
  ann  13  WTHomo   4         1
  ben  20  *        5         1
  tom  40  Variant  6         1
 
 
  where * indicates that I don't know what the value is for V3 for Ben.
 
  I've set up a Cox model to run like this:
 
 #!/usr/bin/Rscript
 library(bdsmatrix)
 library(kinship2)
 library(survival)
 library(coxme)
 death.dat <- read.table("Test.cox", header = TRUE)
 deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
 sink("Test.cox.R.Output")
 Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
     strata(factor(V2)) + factor(V3) +
     (1 | ID), data = death.dat, varlist = deathdat.kmat)
 Model
 sink()
 
 
 
  As you can see from the Test.cox file, I have a missing value *. How
 and
  where do I tell the R script treat * as a missing variable. If I can't
  incorporate missing values into the model, I assume the alternative is to
  remove all of the rows with missing data, which will greatly reduce my
 data
  set, as most rows have at least one missing variable.
 
  Thanks
 

 

Re: [R] Cox model -missing data.

2014-12-19 Thread Michael Dewey

Comment inline

On 19/12/2014 11:17, aoife doherty wrote:

Many thanks, I appreciate the response.

When I convert the missing values to NA and run the cox model as described
in previous post,  the cox model seems to remove all of the rows with a
missing value (as the number of rows n in the cox output after I
completely remove any row with missing data is the same as the number of
rows n in the cox output after I change the missing values to NA).

What I had been hoping to do is not completely remove a row with missing
data for a co-variable, but rather somehow censor or estimate a value for
the missing value?


I think you are searching for some form of imputation here. A full 
answer would be way beyond the scope of this list as it depends on so 
many things including the mechanism driving the missingness.


Have a look at
http://missingdata.lshtm.ac.uk/
and see whether that helps.



In reality, I have ~600 people with survival data and say 6 variables
attached to them. After I incorporate a 7th variable (for which the
information isn't available for every individual), I have 400 people left.
Since I still have survival data and almost all of the information for the
other 200 people (the only thing missing is information about that 7th
variable), it seems a waste to remove all of the survival data for 200
people over one co-variate. So I was hoping instead of completely removing
the rows, to just somehow acknowledge that the data for this particular
co-variate is missing in the model but not completely remove the row? This
is more what I was hoping someone would know if it's possible to
incorporate into the model I described above?

Thanks



On Fri, Dec 19, 2014 at 10:21 AM, Ted Harding ted.hard...@wlandres.net
wrote:


Hi Aoife,
I think that if you simply replace each * in the data file
with NA, then it should work (NA is usually interpreted
as missing for those functions for which missingness is
relevant). How you subsequently deal with records which have
missing values is another question (or many questions ... ).

So your data should look like:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  NA       5         1
tom  40  Variant  6         1

Hoping this helps,
Ted.

On 19-Dec-2014 10:12:00 aoife doherty wrote:

Hi all,

I have a data set like this:

Test.cox file:

V1   V2  V3       Survival  Event
ann  13  WTHomo   4         1
ben  20  *        5         1
tom  40  Variant  6         1


where * indicates that I don't know what the value is for V3 for Ben.

I've set up a Cox model to run like this:

#!/usr/bin/Rscript
library(bdsmatrix)
library(kinship2)
library(survival)
library(coxme)
death.dat <- read.table("Test.cox", header = TRUE)
deathdat.kmat <- 2 * with(death.dat, makekinship(famid, ID, faid, moid))
sink("Test.cox.R.Output")
Model <- coxme(Surv(Survival, Event) ~ strata(factor(V1)) +
    strata(factor(V2)) + factor(V3) +
    (1 | ID), data = death.dat, varlist = deathdat.kmat)
Model
sink()



As you can see from the Test.cox file, I have a missing value *. How

and

where do I tell the R script treat * as a missing variable. If I can't
incorporate missing values into the model, I assume the alternative is to
remove all of the rows with missing data, which will greatly reduce my

data

set, as most rows have at least one missing variable.

Thanks









--
Michael
http://www.dewey.myzen.co.uk



[R] Cox model: random effect on a variable with 3 levels

2013-04-30 Thread lmajed
Question about package Coxme:
I developed a Cox model that includes a treatment variable with 3 levels (A, B,
C):

  model_alea_int <- coxme(Surv(delai, status) ~ (1|trt) + strata(center), data)

I am surprised that the output given in R has 3 coefficients for the random
effect, whereas only 2 dummy variables are created:
 contrasts(data$trt)
   B C   
A  0 0
B  1 0
C  0 1

  ranef(model_alea_int)
$trt
 A   B  C 
 0.24093054 -0.08332041 -0.15761013 

 I want to compare treatment B / treatment A and treatment C / treatment A.
Is it possible to use a random effect for this treatment variable with 3
levels?
What is the interpretation of the 3 coefficients?
Thank you.
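
Both parameterisations can be put side by side on simulated data (an illustrative sketch, not the poster's model): the random intercept yields one shrunken effect per level - three numbers, roughly centred on zero - while the fixed-effect coding gives the two contrasts B/A and C/A directly.

```r
library(survival)
library(coxme)

set.seed(1)
d <- data.frame(time   = rexp(300),
                status = rbinom(300, 1, 0.8),
                trt    = factor(rep(c("A", "B", "C"), each = 100)))

## random effect: one (shrunken) coefficient per level
m.re <- coxme(Surv(time, status) ~ (1 | trt), data = d)
ranef(m.re)$trt

## fixed effect: two contrasts against the reference level A,
## which is what the comparisons B/A and C/A call for
coef(coxph(Surv(time, status) ~ trt, data = d))
```

Differences of the random effects (B minus A, C minus A) approximate the fixed-effect contrasts, but shrunk toward zero.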







Re: [R] Cox model convergence

2013-02-18 Thread Terry Therneau



On 02/16/2013 05:00 AM, r-help-requ...@r-project.org wrote:

Then I perform Cox regression as follows:
m2_1 <- coxph(Surv(X_t0, X_t, vlsupp) ~ nvp + as.factor(cd4pccat) +
as.factor(vlcat) + as.factor(agecat) + as.factor(whostage) +
as.factor(hfacat) + as.factor(wfacat) + as.factor(wfhcat) +
as.factor(resistance) + as.factor(postrantb) +
cluster(id), data=myimp$imputations[[1]], method="breslow", robust=TRUE)
summary(m2_1)
Then I get the following warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights,  :
   Ran out of iterations and did not converge


Failure to converge is highly unusual for coxph unless the model is near singular.  A good 
rule for Cox models is to have 10-20 events for each coefficient.   When models get below 
2 events/coef the results can be unreliable, both numerically and biologically, and some 
version of a shrinkage model is called for.  What are the counts for your data set?
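
The count can be checked with something like the following sketch, assuming vlsupp is the 0/1 event indicator and m2_1 the fit from the question:

```r
dat <- myimp$imputations[[1]]     # the imputed data set used for the fit
nevent <- sum(dat$vlsupp == 1)    # number of observed events
ncoef  <- length(coef(m2_1))      # coefficients actually estimated
nevent / ncoef                    # rule of thumb: aim for 10-20
```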


A vector of initial values, if supplied, needs to be of the same length as the
coefficients.  Make it the same length and in the same order as the printed
coefficients from your run that did not converge.


Terry Therneau



Re: [R] Cox model approximations (was comparing SAS and R survival....)

2012-04-02 Thread AO_Statistics
I have a question about the approximations to Cox's partial likelihood in the
coxph function of the survival package (and in SAS as well) in the presence of
tied events generated by grouping continuous event times into intervals.
I am fitting models for recurrent events with time-dependent covariates in the
Andersen-Gill formulation of Cox's model.

If I have understood Breslow's and Efron's approximations correctly, they
consist in modifying the denominators of the contributing likelihood terms
when we do not know the order of occurrence of the events. This order matters
only if the tied events are associated with different values of the
covariate.
I would like to know whether the Breslow and Efron options still modify the
initial denominators of the terms when the tied events correspond to the same
covariate value.
In particular, within the same trajectory of the observed process (the same
individual), the covariate is measured once for each tied event.
To my mind, we would introduce a needless bias in this case, since the
initial partial likelihood is already correct.

Thank you.
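
For reference, an Andersen-Gill fit with (start, stop] counting-process intervals looks like the sketch below, using the bladder2 recurrent-events data shipped with survival; the ties argument selects the Breslow or Efron handling asked about above.

```r
library(survival)

## one row per at-risk interval; cluster(id) gives robust variances
## for the repeated events within a subject
agfit <- coxph(Surv(start, stop, event) ~ rx + size + number + cluster(id),
               data = bladder2, ties = "efron")
summary(agfit)
```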




Re: [R] Cox model approximations (was comparing SAS and R survival....)

2011-07-24 Thread Göran Broström
On Fri, Jul 22, 2011 at 2:04 PM, Terry Therneau thern...@mayo.edu wrote:
  For time scales that are truly discrete, Cox proposed the exact partial
 likelihood.

Or the method of partial likelihood applied to the discrete logistic model,

 I call that the "exact" method and SAS calls it the
 "discrete" method.  What we compute is precisely the same, however they
 use a clever algorithm which is faster.

Note that the model to estimate here is discrete. The base-line
conditional probabilities at each failure time are eliminated through
the partial likelihood argument. This can also be described as a
conditional logistic regression, where we condition on the total
number of failures in each risk set (thus eliminating the
risk-set-specific parameters). Suppose that in a risk set of size  n
there are  d  failures. This method must then consider all possible
ways of choosing  d  failures out of  n  at risk, or choose(n, d)
cases. This makes the computational burden huge with lots of ties.
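
The size of that sum is easy to tabulate:

```r
## cases summed over for a risk set of n = 50 with d tied failures
d <- 1:10
data.frame(d = d,
           exact.partial  = choose(50, d),  # Cox's exact partial likelihood
           exact.marginal = factorial(d))   # d! orderings (marginal version)
```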

The method "ml" in coxreg (package 'eha') uses a different approach.
Instead of conditional logistic regression it performs unconditional
logistic regression by adding one parameter per risk set. In principle
this is possible to do with 'glm' after expanding the data set with
'toBinary' in 'eha', but with large data sets and lots of risk sets,
'glm' chokes. Instead, with the "ml" approach in coxreg, the extra
parameters just introduced are eliminated by profiling them out! This
leads to a fast estimation procedure, compared to the abovementioned
"exact" methods. A final note: with "ml", the logistic regression
uses the cloglog link, to be compatible with the situation when data
really are continuous but grouped, and a proportional hazards model
holds.
(Interestingly, conditional inference is usually used to simplify
things; here it creates computational problems not present without
conditioning.)

  To make things even more
 confusing, Prentice introduced an exact marginal likelihood which is
 not implemented in R, but which SAS calls the exact method.

This is not so confusing if we realize that we now are in the
continuous time model. Then, with a risk set of size  n  with  d
failures, we must consider all possible permutations of the  d
failures, or  d!  cases. That is, here we assume that ties occur
because of imprecise measurement and that there is one true ordering.
This method calculates an average contribution to the partial
likelihood. (Btw, you refer to Prentice, but isn't this from the
Biometrika paper by Kalbfleisch  Prentice (1973)? And of course their
classical book?)

  Data is usually not truly discrete, however.  More often ties are the
 result of imprecise measurement or grouping.  The Efron approximation
 assumes that the data are actually continuous but we see ties because of
 this; it also introduces an approximation at one point in the
 calculation which greatly speeds up the computation; numerically the
 approximation is very good.

Note that both Breslow's and Efron's approximations are approximations
of the exact marginal likelihood.

  In spite of the irrational love that our profession has for anything
 branded with the word exact, I currently see no reason to ever use
 that particular computation in a Cox model.

Agreed; but only because it is so time consuming. The unconditional
logistic regression with profiling is a good alternative.

 I'm not quite ready to
 remove the option from coxph, but certainly am not going to devote any
 effort toward improving that part of the code.

  The Breslow approximation is less accurate, but is the easiest to
 program and therefore was the only method in early Cox model programs;
 it persists as the default in many software packages because of history.
 Truth be told, unless the number of tied deaths is quite large the
 difference in results between it and the Efron approx will be trivial.

  The worst approximation, and the one that can sometimes give seriously
 strange results, is to artificially remove ties from the data set by
 adding a random value to each subject's time.

Maybe, but randomly breaking ties may not be a bad idea; you could
regard that as getting an (unbiased?) estimator of the
exact (continuous-time) partial likelihood. Expanding: Instead of
going through all possible permutations, why not take a random sample
of size greater than one?

Göran

 Terry T


 --- begin quote --
 I didn't know precisely the specificities of each approximation method.
 I thus came back to section 3.3 of Therneau and Grambsch, Extending the
 Cox
 Model. I think I now see things more clearly. If I have understood
 correctly, both discrete option and exact functions assume true
 discrete event times in a model approximating the Cox model. Cox partial
 likelihood cannot be exactly maximized, or even written, when there are
 some
 ties, am I right ?

 In my sample, many of the ties (those within a single observation of
 the
 process) are due to the fact that continuous event times 

Re: [R] Cox model approximations (was comparing SAS and R survival....)

2011-07-22 Thread Terry Therneau
 For time scales that are truly discrete, Cox proposed the exact partial
likelihood.  I call that the "exact" method and SAS calls it the
"discrete" method.  What we compute is precisely the same, however they
use a clever algorithm which is faster.  To make things even more
confusing, Prentice introduced an exact marginal likelihood which is
not implemented in R, but which SAS calls the "exact" method.

  Data is usually not truly discrete, however.  More often ties are the
result of imprecise measurement or grouping.  The Efron approximation
assumes that the data are actually continuous but we see ties because of
this; it also introduces an approximation at one point in the
calculation which greatly speeds up the computation; numerically the
approximation is very good.  
  In spite of the irrational love that our profession has for anything
branded with the word exact, I currently see no reason to ever use
that particular computation in a Cox model.  I'm not quite ready to
remove the option from coxph, but certainly am not going to devote any
effort toward improving that part of the code.

  The Breslow approximation is less accurate, but is the easiest to
program and therefore was the only method in early Cox model programs;
it persists as the default in many software packages because of history.
Truth be told, unless the number of tied deaths is quite large the
difference in results between it and the Efron approx will be trivial.

  The worst approximation, and the one that can sometimes give seriously
strange results, is to artificially remove ties from the data set by
adding a random value to each subject's time.

Terry T


--- begin quote --
I didn't know precisely the specificities of each approximation method.
I thus came back to section 3.3 of Therneau and Grambsch, Extending the
Cox
Model. I think I now see things more clearly. If I have understood
correctly, both discrete option and exact functions assume true
discrete event times in a model approximating the Cox model. Cox partial
likelihood cannot be exactly maximized, or even written, when there are
some
ties, am I right ?

In my sample, many of the ties (those within a single observation of
the
process) are due to the fact that continuous event times are grouped
into
intervals.

So I think the logistic approximation may not be the best for my problem,
despite the estimates on my real data set (shown in my previous post)
giving interesting results in the context of my data set!
I was thinking about distributing the events uniformly in each interval.
What do you think about this option? Can I expect a better approximation
than directly applying the Breslow or Efron method to the grouped event
data? Finally, it becomes a model problem more than a computational or
algorithmic one, I guess.

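
The tie-handling choices discussed in this thread can be compared side by side through coxph's ties argument; a sketch on the lung data shipped with survival (the "exact" fit can be slow on larger data sets):

```r
library(survival)

fits <- lapply(c("breslow", "efron", "exact"),
               function(m) coxph(Surv(time, status) ~ age + sex,
                                 data = lung, ties = m))
names(fits) <- c("breslow", "efron", "exact")
sapply(fits, coef)   # one column per method; differences are usually small
```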


Re: [R] Cox model approximations (was comparing SAS and R survival....)

2011-07-22 Thread Mike Marchywka
 From: thern...@mayo.edu
 To: abouesl...@gmail.com
 Date: Fri, 22 Jul 2011 07:04:15 -0500
 CC: r-help@r-project.org
 Subject: Re: [R] Cox model approximations (was comparing SAS and R
 survival)

 For time scale that are truly discrete Cox proposed the exact partial
 likelihood. I call that the exact method and SAS calls it the
 discrete method. What we compute is precisely the same, however they
 use a clever algorithm which is faster. To make things even more
 confusing, Prentice introduced an exact marginal likelihood which is
 not implemented in R, but which SAS calls the exact method.

 Data is usually not truly discrete, however. More often ties are the
 result of imprecise measurement or grouping. The Efron approximation
 assumes that the data are actually continuous but we see ties because of
 this; it also introduces an approximation at one point in the
 calculation which greatly speeds up the computation; numerically the
 approximation is very good.
 In spite of the irrational love that our profession has for anything
 branded with the word exact, I currently see no reason to ever use
 that particular computation in a Cox model. I'm not quite ready to
 remove the option from coxph, but certainly am not going to devote any
 effort toward improving that part of the code.

 The Breslow approximation is less accurate, but is the easiest to
 program and therefore was the only method in early Cox model programs;
 it persists as the default in many software packages because of history.
 Truth be told, unless the number of tied deaths is quite large the
 difference in results between it and the Efron approx will be trivial.

 The worst approximation, and the one that can sometimes give seriously
 strange results, is to artificially remove ties from the data set by
 adding a random value to each subject's time.

Care to elaborate on this at all? First, of course, I would agree that doing
anything to the data, or making up data, and then handing it to an analysis
tool that doesn't know you manipulated it can be a problem (often called
interpolation or something with a legitimate name, LOL). However, it is not
unreasonable to do a sensitivity analysis by adding noise and checking the
results. Presumably adding noise to remove things the algorithm doesn't
happen to like would work, but you would need to take many samples and
examine statistics of how you broke the ties. Now if the model is bad to
begin with, or the data is so coarsely binned that you can't get much out of
it, then OK.

I guess in this case, having not thought about it too much, ties would be
most common either with lots of data, or if hazards spiked over time scales
similar to your measurement precision, or if the measurement resolution is
not comparable to the hazard rate. In the latter two cases, of course, the
approach is probably quite limited. Consider turning exponential curves into
step functions, for example.
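
That sensitivity analysis - many random tie-breaking draws, then summary statistics over the replicates - can be sketched on the lung data from survival (purely illustrative):

```r
library(survival)

set.seed(1)
sims <- replicate(200, {
  jit <- lung$time + runif(nrow(lung), 0, 0.5)   # break ties at random
  coef(coxph(Surv(jit, status) ~ age + sex, data = lung))
})
rowMeans(sims)       # average coefficients over the random orderings
apply(sims, 1, sd)   # spread induced by the tie-breaking alone
```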


 Terry T


 --- begin quote --
 I didn't know precisely the specifities of each approximation method.
 I thus came back to section 3.3 of Therneau and Grambsch, Extending the
 Cox
 Model. I think I now see things more clearly. If I have understood
 correctly, both discrete option and exact functions assume true
 discrete event times in a model approximating the Cox model. Cox partial
 likelihood cannot be exactly maximized, or even written, when there are
 some
 ties, am I right ?

 In my sample, many of the ties (those whithin a single observation of
 the
 process) are due to the fact that continuous event times are grouped
 into
 intervals.

 So I think the logistic approximation may not be the best for my problem,
 despite the estimates on my real data set (shown in my previous post)
 giving interesting results in the context of my data set!
 I was thinking about distributing the events uniformly in each interval.
 What do you think about this option? Can I expect a better approximation
 than directly applying the Breslow or Efron method to the grouped event
 data? Finally, it becomes a model problem more than a computational or
 algorithmic one, I guess.

  


Re: [R] Cox model, model averaging and survival curve

2011-03-16 Thread Martin Patenaude-Monette
Thanks a lot for your answer.


Martin Patenaude-Monette

MSc. Candidate
Département de biologie
Université du Québec à Montréal

2011/3/15 Terry Therneau thern...@mayo.edu

 --- included text --
 I have done model selection between candidate Cox models, using AICc
 calculated with penalized log likelihoods. Then model averaging was done
 to
 obtain model averaged parameter estimates. Is there a way to plot
 survival
 curve from the averaged model, by estimating baseline hazard and
 baseline survival?

 -- end inclusion ---

  You can fit a Cox model with fixed coefficients.  Assume fixbeta are
 the coefficients from your model averaging, then do

  ffit <- coxph(Surv(time, status) ~ x1 + x2 + ..., data=mydata,
init=fixbeta, iter=0)
  sfit <- survfit(ffit)

 The standard errors in sfit are incorrect of course.  One could
 bootstrap the entire model creation process to get accurate values.

 Terry Therneau
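
Filling in the recipe with concrete numbers on the lung data (fixbeta below is a hypothetical stand-in for the model-averaged coefficients; iter.max = 0 keeps coxph from moving away from them):

```r
library(survival)

fixbeta <- c(age = 0.017, sex = -0.5)   # hypothetical averaged coefficients

ffit <- coxph(Surv(time, status) ~ age + sex, data = lung,
              init = fixbeta, control = coxph.control(iter.max = 0))
sfit <- survfit(ffit)   # survival curve at the mean covariate values
plot(sfit, xlab = "Days", ylab = "Survival")
```

As noted above, the standard errors in sfit do not account for the averaging step.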







Re: [R] Cox model, model averaging and survival curve

2011-03-15 Thread Terry Therneau
--- included text --
I have done model selection between candidate Cox models, using AICc
calculated with penalized log likelihoods. Then model averaging was done
to
obtain model averaged parameter estimates. Is there a way to plot
survival
curve from the averaged model, by estimating baseline hazard and
baseline survival?

-- end inclusion ---

 You can fit a Cox model with fixed coefficients.  Assume fixbeta are
the coefficients from your model averaging, then do

  ffit <- coxph(Surv(time, status) ~ x1 + x2 + ..., data=mydata,
init=fixbeta, iter=0)
  sfit <- survfit(ffit)

The standard errors in sfit are incorrect of course.  One could
bootstrap the entire model creation process to get accurate values.

Terry Therneau



[R] Cox model, model averaging and survival curve

2011-03-14 Thread Martin Patenaude-Monette
Dear community,

I have done model selection between candidate Cox models, using AICc
calculated with penalized log likelihoods. Then model averaging was done to
obtain model averaged parameter estimates. Is there a way to plot survival
curve from the averaged model, by estimating baseline hazard and baseline
survival?

Thanks in advance,
Martin Patenaude-Monette

MSc. Candidate
Département de biologie
Université du Québec à Montréal




[R] Cox model output hazard ratios

2010-11-17 Thread Julien Vezilier
Dear R users,

Here is the coxme output I obtain on my survival dataset having 3 strains
and 2 infection status (i: infected, ni: non infected)

coxme(Surv(lay) ~ infection*strain, data=datalay, random= ~1 |block)

Cox mixed-effects model fit by maximum likelihood

Data: datalay
  n = 1194
  Iterations = 3 77
                     NULL  Integrated  Penalized
Log-likelihood  -7270.028   -7223.859  -7218.175

  Penalized loglik:  chisq = 103.71 on 10.24 degrees of freedom, p = 0
  Integrated loglik: chisq = 92.34 on 6 degrees of freedom, p = 0

Fixed effects: Surv(lay) ~ infection * strain


                                coef        exp(coef)  se(coef)   z      p
(line1) infection_ni           -0.6482079  0.5229822  0.1481232  -4.38  1.2e-05
(line2) strainB                -0.8797408  0.4148904  0.1091490  -8.06  7.8e-16
(line3) strainC                -0.7378955  0.4781191  0.1045977  -7.05  1.7e-12
(line4) infection_ni:strainB    0.7631418  2.1450049  0.1462481   5.22  1.8e-07
(line5) infection_ni:strainC    0.5358302  1.7088664  0.1431092   3.74  1.8e-04

Random effects: ~1 | block

block
Variance: 0.02493629

I'm not sure I understand properly what are the hazard ratios in the *exp(coef)
column* for the two interaction terms. In the R book of Mick Crawley is
explained that you should be able to calculate these figures approximatively
using the mean survival as below:

tapply(datalay$lay, datalay$strain:datalay$infection, mean)

     A:i      A:ni      B:i      B:ni      C:i      C:ni
1.864865  2.888372  3.586826  3.253394  3.296296  3.520730

For example, the exp(coef) value in (line 1) of my model output represents
the ratio between the mean survival of genotype A infected (i) and the mean
survival of genotype A non-infected (ni): 1.86/2.88 = 0.64, not too far from
what the model gives.

Following the same reasoning, (line 2) is ratio between B infected and A
infected, and (line 3) the ratio between C (i) and A (i).

My question is very simple: *what are the hazard ratios for the interaction
terms stated (line 4) and (line 5), and how are they calculated ?*

Thanks a lot in advance,
Julien.
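
On the interpretation side: with this coding, exp(coef) on line 4 is a ratio of hazard ratios - how much the ni-vs-i effect in strain B differs from the ni-vs-i effect in strain A (and line 5 the same for strain C). Within-strain hazard ratios come from summing the relevant coefficients before exponentiating; a sketch using the coefficients printed above:

```r
b <- c(inf_ni   = -0.6482079,   # line 1
       strainB  = -0.8797408,   # line 2
       strainC  = -0.7378955,   # line 3
       inf_ni.B =  0.7631418,   # line 4
       inf_ni.C =  0.5358302)   # line 5

exp(b["inf_ni"])                   # ni vs i, within strain A
exp(b["inf_ni"] + b["inf_ni.B"])   # ni vs i, within strain B
exp(b["inf_ni"] + b["inf_ni.C"])   # ni vs i, within strain C
```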




Re: [R] Cox model+ROCR

2008-03-04 Thread Terry Therneau
 I am trying to build a cox model and then perform ROC analysis in order to
 retrieve some genes that are correlated with breast cancer. When I calculate
 ...
 
 Extension of ROC values to the censored data case is handled by the rcorr.cens
 function found in the Hmisc library.  See the references and examples there.

In short, one interpretation of the ROC statistic is in terms of 
concordance.  Consider all pairs of comparable subjects.  For a yes/no 
outcome, this is all pairs where one of the subjects is a yes and the other a 
no.  The c-statistic = % of all pairs where the model's score correctly ordered 
them (higher score for the yes).  For a yes/no outcome this = the ROC.  For 
your 
endpoint it is just somewhat harder to count up comparables: they are all 
pairs for which I can determine which relapsed first.  Relapse at 20 vs censor 
at 15, for instance, is not comparable.

Terry Therneau
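
In current versions of the survival package this pair-counting is available directly (Hmisc::rcorr.cens gives essentially the same number); a sketch on the built-in lung data:

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)
concordance(fit)   # C = fraction of comparable pairs the score orders correctly
```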



[R] Cox model+ROCR

2008-03-03 Thread Eleni Christodoulou
Dear list,

I am trying to build a Cox model and then perform ROC analysis in order to
retrieve some genes that are correlated with breast cancer. When I calculate
the hazard score taking into account different numbers of genes and their
coefficients (I am trying to find the best predictive number of genes), I
get values ranging from around 1 (for few genes included) up to around 1e+80
(for many genes included).
I am using the prediction method from the ROCR package, which takes as
arguments the calculated scores and the true class labels. I really don't
know what to compare my values with, because the only data that I have
available are the time to relapse or last follow-up (months) and the relapse
indicator (1=TRUE, 0=FALSE) of the patients. I have never performed ROC
analysis before and I am a bit lost...
Any help with this is  really very welcome!

Thank you all,
Eleni




Re: [R] Cox model

2008-02-13 Thread Terry Therneau
 What you appear to want are all of the univariate models.  You can get this 
with a loop (and patience - it won't be fast).
 
ngene <- ncol(genes)
coefmat <- matrix(0., nrow=ngene, ncol=2)
for (i in 1:ngene) {
    tempfit <- coxph(Surv(time, relapse) ~ genes[,i])
    coefmat[i,] <- c(tempfit$coef, sqrt(tempfit$var))
}


  However, the fact that R can do this for you does not mean it is a good idea. 
 
In fact, doing all of the univariate tests for a microarray has been shown by 
many people to be a very bad idea.  There are several approaches to deal with 
the key issues, which you should research before going forward.
  
  Terry Therneau



Re: [R] Cox model

2008-02-13 Thread Eleni Christodoulou
Hmm...I see. I think I will give a try to the univariate analysis
nonetheless...I intend to catch the p-values for each gene and select the
most significant from these...I have seen it in several papers.

Best Regards,
Eleni

On Feb 13, 2008 2:59 PM, Terry Therneau [EMAIL PROTECTED] wrote:

  What you appear to want are all of the univariate models.  You can get
 this
 with a loop (and patience - it won't be fast).

 ngene <- ncol(genes)
 coefmat <- matrix(0., nrow=ngene, ncol=2)
 for (i in 1:ngene) {
     tempfit <- coxph(Surv(time, relapse) ~ genes[,i])
     coefmat[i,] <- c(tempfit$coef, sqrt(tempfit$var))
 }


  However, the fact that R can do this for you does not mean it is a good
 idea.
 In fact, doing all of the univariate tests for a microarray has been shown
 by
 many people to be a very bad idea.  There are several approaches to deal
 with
 the key issues, which you should research before going forward.

  Terry Therneau






Re: [R] Cox model

2008-02-13 Thread Matthias Gondan
Hi Eleni,

The problem of this approach is easily explained: under the null hypothesis,
the P values of a significance test are random variables, uniformly
distributed in the interval [0, 1]. It is easily seen that the lowest of
these P values is not any 'better' than the highest of the P values.

Best wishes,

Matthias

Eleni Christodoulou schrieb:
 Hmm...I see. I think I will give a try to the univariate analysis
 nonetheless...I intend to catch the p-values for each gene and select the
 most significant from these...I have seen it in several papers.

 Best Regards,
 Eleni

 On Feb 13, 2008 2:59 PM, Terry Therneau [EMAIL PROTECTED] wrote:

   
  What you appear to want are all of the univariate models.  You can get
 this
 with a loop (and patience - it won't be fast).

 ngene <- ncol(genes)
 coefmat <- matrix(0., nrow=ngene, ncol=2)
 for (i in 1:ngene) {
     tempfit <- coxph(Surv(time, relapse) ~ genes[,i])
     coefmat[i,] <- c(tempfit$coef, sqrt(tempfit$var))
 }


  However, the fact that R can do this for you does not mean it is a good
 idea.
 In fact, doing all of the univariate tests for a microarray has been shown
 by
 many people to be a very bad idea.  There are several approaches to deal
 with
 the key issues, which you should research before going forward.

  Terry Therneau


 







Re: [R] Cox model

2008-02-13 Thread Gustaf Rydevik
On Feb 13, 2008 2:37 PM, Matthias Gondan [EMAIL PROTECTED] wrote:
 Hi Eleni,

 The problem of this approach is easily explained: Under the Null
 hypothesis, the P values
 of a significance test are random variables, uniformly distributed in
 the interval [0, 1]. It
 is easily seen that the lowest of these P values is not any 'better'
 than the highest of the
 P values.

 Best wishes,

 Matthias


Correct me if I'm wrong, but isn't that the point? I assume that the
hypothesis is that one or more of these genes are true predictors,
i.e. for these genes the p-value should be significant. For all the
other genes, the p-value is uniformly distributed. Using a
significance level of 0.01, and a priori knowledge that there are
significant genes, you will end up with on the order of 20 genes, some
of which are the true predictors, and the rest being false
positives. This set of 20 genes can then be further analysed. A much
smaller and easier problem to solve, no?


/Gustaf
-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik
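
The screening idea above is essentially what false-discovery-rate control formalises; a simulated sketch (the mixture of null and non-null p-values is made up):

```r
set.seed(42)
p.null <- runif(9800)           # genes with no effect: uniform p-values
p.alt  <- rbeta(200, 0.5, 10)   # 200 "true" genes: p-values piled near zero
pvals  <- c(p.null, p.alt)

sum(pvals < 0.01)                  # naive cutoff: truth plus false positives
sum(p.adjust(pvals, "BH") < 0.05)  # Benjamini-Hochberg FDR screen
```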



Re: [R] Cox model

2008-02-13 Thread Gustaf Rydevik
On Feb 13, 2008 3:06 PM, Gustaf Rydevik [EMAIL PROTECTED] wrote:
 On Feb 13, 2008 2:37 PM, Matthias Gondan [EMAIL PROTECTED] wrote:
  Hi Eleni,
 
  The problem of this approach is easily explained: Under the Null
  hypothesis, the P values
  of a significance test are random variables, uniformly distributed in
  the interval [0, 1]. It
  is easily seen that the lowest of these P values is not any 'better'
  than the highest of the
  P values.
 
  Best wishes,
 
  Matthias
 

 Correct me if I'm wrong, but isn't that the point? I assume that the
 hypothesis is that one or more of these genes are true predictors,
 i.e. for these genes the p-value should be significant. For all the
 other genes, the p-value is uniformly distributed. Using a
 significance level of 0.01, and an a priori knowledge that there are
 significant genes, you will end up with on the order of 20 genes, some
 of which are the true predictors, and the rest being false
 positives. this set of 20 genes can then be further analysed. A much
 smaller and easier problem to solve, no?


 /Gustaf

Sorry, it should say 200 genes instead of 20.

-- 
Gustaf Rydevik, M.Sci.
tel: +46(0)703 051 451
address:Essingetorget 40,112 66 Stockholm, SE
skype:gustaf_rydevik



Re: [R] Cox model

2008-02-13 Thread Duncan Murdoch
On 2/13/2008 9:08 AM, Gustaf Rydevik wrote:
 On Feb 13, 2008 3:06 PM, Gustaf Rydevik [EMAIL PROTECTED] wrote:
 On Feb 13, 2008 2:37 PM, Matthias Gondan [EMAIL PROTECTED] wrote:
  Hi Eleni,
 
  The problem of this approach is easily explained: Under the null
  hypothesis, the P values of a significance test are random variables,
  uniformly distributed in the interval [0, 1]. It is easily seen that
  the lowest of these P values is not any 'better' than the highest of
  the P values.
 
  Best wishes,
 
  Matthias
 

 Correct me if I'm wrong, but isn't that the point? I assume that the
 hypothesis is that one or more of these genes are true predictors,
 i.e. for these genes the p-value should be significant. For all the
 other genes, the p-value is uniformly distributed. Using a
 significance level of 0.01, and a priori knowledge that there are
 significant genes, you will end up with on the order of 20 genes, some
 of which are the true predictors and the rest false
 positives. This set of 20 genes can then be further analysed. A much
 smaller and easier problem to solve, no?


 /Gustaf
 
 Sorry, it should say 200 genes instead of 20.
 

I agree with your general point, but want to make one small quibble: 
the choice of 0.01 as a cutoff depends pretty strongly on the 
distribution of the p-value under the alternative.  With a small sample 
size and/or a small effect size, that may miss the majority of the true 
predictors.  You may need it to be 0.1 or higher to catch most of them, 
and then you'll have 10 times as many false positives to wade through 
(but still 10 times fewer than you started with, so your main point 
still holds).
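
This power caveat can also be illustrated by simulation. The numbers
below are made up (a two-group comparison with 40 samples per group and
a modest standardized effect of 0.5); none of them come from the thread:

```r
## With a small true effect, many genuinely associated variables have
## p-values above 0.01, so a strict cutoff misses them, while a looser
## cutoff catches more (at the price of more false positives).
set.seed(1)
p.true <- replicate(1000, t.test(rnorm(40, mean = 0.5), rnorm(40))$p.value)
mean(p.true < 0.01)   # fraction of true effects caught at 0.01
mean(p.true < 0.10)   # a noticeably larger fraction caught at 0.10
```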

Duncan Murdoch



Re: [R] Cox model

2008-02-12 Thread darteta001
Dear Eleni,

here is an excerpt from a previous post regarding the maximum number of
variables in a multiple linear regression analysis, posted last Tuesday,
which I think is also relevant to Cox PH models:

"I can think of no circumstance where multiple regression on hundreds
of thousands of variables is anything more than a fancy random number
generator."


The thread was continued by someone with the same problem as yours:


"When I try a regression problem with 3,000 coefficients in R running
under Windows XP 64 bit with 8Gb of memory on the machine and the /3Gb
option active (i.e., R can get up to 3Gb), R 2.6.1 runs out of memory
(apparently trying to duplicate the model matrix)."


but the author continues...

"...one must be careful doing ordinary linear regression with large
numbers of coefficients. It does seem a little unlikely that there is
sufficient data to get useful estimates of three thousand coefficients
using linear regression."

I also work with genomic data, and filtering the data first seems to be
a well-accepted practice. I am sure not all of your 18000 genes are
relevant to your study or have an effect on survival. Have a look at the
Bioconductor mailing list for more on this topic.
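
One common way to do that kind of screening (a sketch, not from the
thread; it assumes the time, relapse, and 80 x 18000 genes objects from
the original post) is to fit one univariate Cox model per gene rather
than a single model with 18000 regressors, which also sidesteps the
memory problem:

```r
library(survival)

## Fit one small coxph() per gene and keep its Wald test p-value;
## this never builds the single 80 x 18000 model matrix that
## exhausted memory in the original attempt.
pvals <- apply(genes, 2, function(g) {
    fit <- coxph(Surv(time, relapse) ~ g)
    summary(fit)$coefficients[1, "Pr(>|z|)"]
})
head(sort(pvals))   # genes ranked by univariate p-value
```

The resulting p-values would still need a multiple-testing correction,
e.g. p.adjust(pvals, method = "BH"), before being taken at face value.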

Best
David

 Hi David,
 
 The problem is that I need all these regressors. I need a 
coefficient for
 every one of them and then rank them according to that coefficient.
 
 Thanks,
 Eleni
 
 On Feb 12, 2008 4:54 PM, [EMAIL PROTECTED] wrote:
 
  Hi Eleni,
 
  I am not an expert in R or statistics but in my opinion you have 
too
  many regressors compared to the number of observations and that 
might
  be the reason why you get the error. Others may know better, but as
  far as I know, having only 80 observations, it is a good idea to 
first
  filter your list of variables down to a few tens.
 
 
  HTH
 
  David
 
   Hello R-community,
  
   It's been a week now that I am struggling with the 
implementation of
  a cox
   model in R. I have 80 cancer patients, so 80 time measurements 
and 80
   relapse or no measurements (respective to censor, 1 if relapsed 
over
  the
   examined period, 0 if not). My microarray data contain around 
18000
  genes.
   So I have the expressions of 18000 genes in each of the 80 tumors
  (matrix
   80*18000). I would like to build a cox model in order to retrieve
  the most
   significant genes (according to the p-value). The command that I 
am
  using
   is:
  
   test1 <- list(time, relapse, genes)
   coxph(Surv(time, relapse) ~ genes, test1)
  
   where time is a vector of size 80 containing the times, relapse 
is a
  vector
   of size 80 containing the relapse values and genes is a matrix
  80*18000.
   When I give the coxph command I retrieve an error saying that 
cannot
   allocate vector of size 2.7Mb  (in Windows). I also tried linux 
and
  then I
   receive error that maximum memory is reached. I increase the 
memory
  by
   initializing R with the command:
   R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-
nsize=200M
  
   I think it cannot get better than that because if I try for 
example
   max-vsize=300, the memory capacity is stored as NA.
  
   Does anyone have any idea why this happens and how I can 
overcome it?
  
   I would be really grateful if you could help!
   It has been bothering me a lot!
  
   Thank you all,
   Eleni
  
 [[alternative HTML version deleted]]
  
  
 
 
 



Re: [R] Cox model

2008-02-12 Thread Eleni Christodoulou
Hi David,

The problem is that I need all these regressors. I need a coefficient for
every one of them and then rank them according to that coefficient.

Thanks,
Eleni

On Feb 12, 2008 4:54 PM, [EMAIL PROTECTED] wrote:

 Hi Eleni,

 I am not an expert in R or statistics but in my opinion you have too
 many regressors compared to the number of observations and that might
 be the reason why you get the error. Others may know better, but as
 far as I know, having only 80 observations, it is a good idea to first
 filter your list of variables down to a few tens.


 HTH

 David

  Hello R-community,
 
  It's been a week now that I am struggling with the implementation of
 a cox
  model in R. I have 80 cancer patients, so 80 time measurements and 80
  relapse or no measurements (respective to censor, 1 if relapsed over
 the
  examined period, 0 if not). My microarray data contain around 18000
 genes.
  So I have the expressions of 18000 genes in each of the 80 tumors
 (matrix
  80*18000). I would like to build a cox model in order to retrieve
 the most
  significant genes (according to the p-value). The command that I am
 using
  is:
 
  test1 <- list(time, relapse, genes)
  coxph(Surv(time, relapse) ~ genes, test1)
 
  where time is a vector of size 80 containing the times, relapse is a
 vector
  of size 80 containing the relapse values and genes is a matrix
 80*18000.
  When I give the coxph command I retrieve an error saying that cannot
  allocate vector of size 2.7Mb  (in Windows). I also tried linux and
 then I
  receive error that maximum memory is reached. I increase the memory
 by
  initializing R with the command:
  R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
 
  I think it cannot get better than that because if I try for example
  max-vsize=300, the memory capacity is stored as NA.
 
  Does anyone have any idea why this happens and how I can overcome it?
 
  I would be really grateful if you could help!
  It has been bothering me a lot!
 
  Thank you all,
  Eleni
 
 
 






Re: [R] Cox model

2008-02-12 Thread darteta001
Hi Eleni,

I am not an expert in R or statistics but in my opinion you have too 
many regressors compared to the number of observations and that might 
be the reason why you get the error. Others may know better, but as
far as I know, having only 80 observations, it is a good idea to first 
filter your list of variables down to a few tens.


HTH

David

 Hello R-community,
 
 It's been a week now that I am struggling with the implementation of 
a cox
 model in R. I have 80 cancer patients, so 80 time measurements and 80
 relapse or no measurements (respective to censor, 1 if relapsed over 
the
 examined period, 0 if not). My microarray data contain around 18000 
genes.
 So I have the expressions of 18000 genes in each of the 80 tumors 
(matrix
 80*18000). I would like to build a cox model in order to retrieve 
the most
 significant genes (according to the p-value). The command that I am 
using
 is:
 
 test1 <- list(time, relapse, genes)
 coxph(Surv(time, relapse) ~ genes, test1)
 
 where time is a vector of size 80 containing the times, relapse is a 
vector
 of size 80 containing the relapse values and genes is a matrix 
80*18000.
 When I give the coxph command I retrieve an error saying that cannot
 allocate vector of size 2.7Mb  (in Windows). I also tried linux and 
then I
 receive error that maximum memory is reached. I increase the memory 
by
 initializing R with the command:
 R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M
 
 I think it cannot get better than that because if I try for example
 max-vsize=300, the memory capacity is stored as NA.
 
 Does anyone have any idea why this happens and how I can overcome it?
 
 I would be really grateful if you could help!
 It has been bothering me a lot!
 
 Thank you all,
 Eleni
 
 
 



[R] Cox model

2008-02-12 Thread Eleni Christodoulou
Hello R-community,

It's been a week now that I have been struggling with the implementation
of a Cox model in R. I have 80 cancer patients, so 80 time measurements
and 80 relapse indicators (the censoring variable: 1 if relapsed over the
examined period, 0 if not). My microarray data contain around 18000 genes,
so I have the expression of 18000 genes in each of the 80 tumors (a
matrix of 80*18000). I would like to build a Cox model in order to
retrieve the most significant genes (according to the p-value). The
command that I am using
is:

test1 <- list(time, relapse, genes)
coxph(Surv(time, relapse) ~ genes, test1)

where time is a vector of size 80 containing the times, relapse is a vector
of size 80 containing the relapse values and genes is a matrix 80*18000.
When I give the coxph command I get an error saying "cannot allocate
vector of size 2.7Mb" (in Windows). I also tried Linux, and there I get
an error that the maximum memory is reached. I tried to increase the memory by
initializing R with the command:
R --min-vsize=10M --max-vsize=250M --min-nsize=1M --max-nsize=200M

I think it cannot get better than that because if I try for example
max-vsize=300, the memory capacity is stored as NA.

Does anyone have any idea why this happens and how I can overcome it?

I would be really grateful if you could help!
It has been bothering me a lot!

Thank you all,
Eleni

