[R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Dear all,

I'm struggling with predicting expected time until death for a coxph and
survreg model.

I have two datasets. Dataset 1 includes a certain number of people for whom
I know a vector of covariates (age, gender, etc.) and their event times
(i.e., I know whether they have died and when, if death occurred prior to the
end of the observation period). Dataset 2 includes another set of people for
whom I only have the covariate vector. I would like to use Dataset 1 to
calibrate either a coxph or survreg model and then use this model to
determine an expected time until death for the individuals in Dataset 2.
For example, I would like to know when a person in Dataset 2 will die, given
his/her age and gender.

I checked predict.coxph and predict.survreg as well as the document "A
Package for Survival Analysis in S" written by Terry M. Therneau, but I have
to admit that I'm a bit lost here.

Could anyone give me some advice on how this could be done?

Thanks very much in advance,

Michael



Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread David Winsemius




The first step would be creating a Surv object, followed by running a
regression that creates a coxph object, using dataset1 as input. So
you should be looking at:

?Surv
?coxph

There are worked examples in the help pages. You would then run
predict() on the coxph fit with dataset2 as the newdata argument.
The default output is the linear predictor, i.e. the log-hazard relative
to a reference individual with mean covariate values, but other sorts of
estimates are possible. The survfit function provides survival curves
suitable for plotting.
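
A minimal sketch of the workflow described above, assuming the survival package. The built-in lung data is split in two as a hypothetical stand-in for dataset1 and dataset2; the split, the covariates age and sex, and all object names are illustrative assumptions, not part of the original post.

```r
library(survival)

# Hypothetical stand-ins: split the built-in lung data into a fitting set
# (dataset1, with outcomes) and a prediction set (dataset2, covariates only).
d <- lung[, c("time", "status", "age", "sex")]
dataset1 <- d[1:150, ]
dataset2 <- d[151:nrow(d), c("age", "sex")]

# In lung, status is coded 1 = censored, 2 = dead; Surv() accepts this coding.
fit <- coxph(Surv(time, status) ~ age + sex, data = dataset1)

# Linear predictor (log relative hazard) and relative risk exp(lp)
# for each person in dataset2.
lp   <- predict(fit, newdata = dataset2, type = "lp")
risk <- predict(fit, newdata = dataset2, type = "risk")

# One predicted survival curve per row of dataset2.
sf <- survfit(fit, newdata = dataset2)

# Median predicted survival time per person (NA if a curve never drops
# to 0.5 within the observed follow-up).
med <- quantile(sf, probs = 0.5, conf.int = FALSE)
head(cbind(dataset2, lp = lp, risk = risk, median_time = as.vector(med)))
```

The median is used here rather than the mean because it can be read directly off each predicted curve; plot(sf[1:3]) would draw the first three curves.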


(You may want to inquire at a local medical school to find  
statisticians who have experience with this approach. This is ordinary  
biostatistics these days.)


--
David.





David Winsemius, MD
West Hartford, CT



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Mattia Prosperi
Indeed, the predict() function for coxph does not give time predictions
directly, only linear predictors and exponentiated risk scores. This is
because getting a time requires an estimate of the baseline hazard, which
the Cox model leaves unspecified, so computing it is not straightforward.
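
For what it is worth, the survival package can estimate the baseline after the fit. The sketch below is an illustration under assumptions (the lung data, the covariates age and sex, and the example person are chosen here, not taken from the thread). It combines the estimated baseline cumulative hazard from basehaz() with the linear predictor to get a survival curve, S(t | x) = exp(-H0(t) * exp(x'beta)).

```r
library(survival)

fit <- coxph(Surv(time, status) ~ age + sex, data = lung)

# Estimated baseline cumulative hazard H0(t), i.e. for all covariates = 0.
H0 <- basehaz(fit, centered = FALSE)

# Hypothetical new person: age 60, sex = 1 (male in the lung coding).
x  <- c(age = 60, sex = 1)
lp <- sum(coef(fit)[names(x)] * x)

# Survival curve for that person under proportional hazards.
S <- exp(-H0$hazard * exp(lp))

# First time at which predicted survival drops to 0.5 or below (the median).
H0$time[which(S <= 0.5)[1]]
```

A mean survival time is harder to obtain this way: the estimated curve usually does not reach zero by the end of follow-up, so only a restricted mean is available without further assumptions.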





Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Thanks very much for your answers, David and Mattia.

I understand that the baseline hazard in a Cox model is unknown and that
this makes the calculation of expected survival difficult.
Does this change when I move to a survreg model instead?

I think I'm OK with estimating a Cox model (or a survreg model) as I've done
so in the past.
But I'm lost with the different options in the prediction part (e.g.,
linear, quantile, risk, expected, ...).
Is there any document that explains what these options
mean?

Sorry in case these questions are naive ... hope they're not too stupid ;-)





Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread James C. Whanger
Michael,

You are looking to compute an estimated time to death -- rather than the
odds of death conditional upon time. Thus, you will want to use time to
death as your dependent variable rather than a dichotomous outcome
(0=alive, 1=death). You can accomplish this with a straightforward
regression analysis.

Best,

Jim





-- 
*James C. Whanger
Research Consultant
2 Wolf Ridge Gap
Ledyard, CT  06339

Phone: 860.389.0414*



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
Thanks for the comment, James!

The problem is that my initial sample (Dataset 1) is truncated. That means I
only observe time to death for those individuals who actually died before
the end of my observation period. It is my understanding that this type of
truncation creates a bias when I use a normal regression analysis. Hence
my idea to use some form of survival model.

I had another look at predict.survreg and I think the option "response"
could work for me.
When I run the following code I get ptime = 290.3648.
I assume this means that an individual with ph.ecog=2 can be expected to
live another 290.3648 days before death occurs (days is the time scale of
the time variable).
Could someone confirm whether this makes sense?

library(survival)
lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')
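
A hedged note on what type='response' returns for this fit. survreg() defaults to a Weibull distribution, and for that model the 'response' prediction is exp(linear predictor). That is the Weibull scale parameter, roughly the 63rd percentile of the fitted distribution, rather than its mean or median. The sketch below assumes the same lung fit and computes the fitted median and mean alongside it; the mean uses the standard Weibull identity mean = exp(lp) * gamma(1 + scale), where scale is the scale parameter reported by survreg. The object names nd, resp, med and mean_t are illustrative.

```r
library(survival)

lfit <- survreg(Surv(time, status) ~ ph.ecog, data = lung)  # default dist = "weibull"
nd   <- data.frame(ph.ecog = 2)

# exp(linear predictor): the Weibull characteristic life.
resp <- predict(lfit, newdata = nd, type = "response")

# Fitted median survival time (50th percentile).
med <- predict(lfit, newdata = nd, type = "quantile", p = 0.5)

# Fitted mean survival time under the Weibull assumption.
mean_t <- resp * gamma(1 + lfit$scale)

c(response = unname(resp), median = unname(med), mean = unname(mean_t))
```

Which of the three is the right "expected time until death" depends on the question being asked; all of them rest on the Weibull assumption holding.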






Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread David Winsemius


On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:

> I had another look at predict.survreg and I think the option "response"
> could work for me.
> When I run the following code I get ptime = 290.3648.
> I assume this means that an individual with ph.ecog=2 can be expected to
> live another 290.3648 days before death occurs (days is the time scale of
> the time variable).


It is a prediction under specific assumptions underpinning a  
parametric estimate.



> Could someone confirm whether this makes sense?


You ought to confirm that it makes sense by comparing to your data:

require(Hmisc); require(survival)
# your code

> describe(lung[lung$status==1 & lung$ph.ecog==2, "time"])
lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
      n missing  unique    Mean
      6       0       6   293.7

           92 105 211 292 511 551
Frequency   1   1   1   1   1   1
%          17  17  17  17  17  17

> ?lung

So status==1 is a censored case and the observed death times have status==2:

> describe(lung[lung$status==2 & lung$ph.ecog==2, "time"])
lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
      n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
     44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85

lowest :  11  12  13  26  30, highest: 524 533 654 707 814

So the mean time to death (in a group that had only 6 censored
individuals, at times from 92 to 551) was 226, and the median time to death
among the 44 individuals who died is about 178, with a right-skewed
distribution. You need to decide whether you want to make that particular
prediction when you know that you forced a specific distributional form
(Weibull, the survreg default) on the regression machinery by accepting
the default.
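
The same comparison can be sketched without Hmisc. This is an illustration, not the code used above: it recomputes the raw mean and median among observed deaths with ph.ecog == 2 and adds a Kaplan-Meier median, which, unlike the raw summaries, also uses the 6 censored individuals. The object names sub, deaths, km and km_median are assumptions made for the example.

```r
library(survival)

# Patients with ph.ecog == 2 (subset() drops the row where ph.ecog is NA).
sub <- subset(lung, ph.ecog == 2)

# Raw summaries among observed deaths only (status == 2).
deaths <- sub$time[sub$status == 2]
c(n = length(deaths), mean = mean(deaths), median = median(deaths))

# Kaplan-Meier estimate, which uses the censored observations as well.
km <- survfit(Surv(time, status) ~ 1, data = sub)
km_median <- summary(km)$table[["median"]]
km_median
```

Comparing these nonparametric figures with the parametric prediction gives a rough check on whether the assumed distribution is reasonable for this subgroup.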







David Winsemius, MD
West Hartford, CT



Re: [R] predict.coxph and predict.survreg

2010-11-11 Thread Michael Haenlein
David, Mattia, James -- thanks so much for all your helpful comments!
I now have a much better understanding of how to calculate what I'm
interested in ... and what the risks are of doing so.
Thanks and all the best,
Michael


