[R] predict.coxph and predict.survreg
Dear all,

I'm struggling with predicting expected time until death for a coxph and survreg model.

I have two datasets. Dataset 1 includes a certain number of people for which I know a vector of covariates (age, gender, etc.) and their event times (i.e., I know whether they have died and when if death occurred prior to the end of the observation period). Dataset 2 includes another set of people for which I only have the covariate vector.

I would like to use Dataset 1 to calibrate either a coxph or survreg model and then use this model to determine an expected time until death for the individuals in Dataset 2. For example, I would like to know when a person in Dataset 2 will die, given his/her age and gender.

I checked predict.coxph and predict.survreg as well as the document "A Package for Survival Analysis in S" written by Terry M. Therneau, but I have to admit that I'm a bit lost here.

Could anyone give me some advice on how this could be done?

Thanks very much in advance,

Michael

Michael Haenlein
Professor of Marketing
ESCP Europe
Paris, France

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] predict.coxph and predict.survreg
On Nov 11, 2010, at 3:44 AM, Michael Haenlein wrote:

> I would like to use Dataset 1 to calibrate either a coxph or survreg model and then use this model to determine an expected time until death for the individuals in Dataset 2.

The first step would be creating a Surv object, followed by running a regression that creates a coxph object, using dataset1 as input. So you should be looking at:

    ?Surv
    ?coxph

There are worked examples in the help pages. You would then run predict() on the coxph fit with dataset2 as the newdata argument. The default output is the linear predictor, i.e. the log-hazard relative to a reference at the covariate means, but other sorts of estimates are possible. The survfit function provides survival curves suitable for plotting.

(You may want to inquire at a local medical school to find statisticians who have experience with this approach. This is ordinary biostatistics these days.)

-- 
David.

David Winsemius, MD
West Hartford, CT
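A minimal sketch of the workflow David describes, not a definitive recipe. It assumes the `lung` data shipped with the survival package stand in for both datasets; the split into `dataset1` and `dataset2` and the covariates `age` and `sex` are illustrative choices that do not come from the original post.

```r
library(survival)

# Assumption: lung stands in for the poster's data.
# dataset1 keeps times and status; dataset2 keeps covariates only.
lung_cc  <- na.omit(lung[, c("time", "status", "age", "sex")])
dataset1 <- lung_cc[1:150, ]
dataset2 <- lung_cc[151:nrow(lung_cc), c("age", "sex")]

# Fit the Cox model on dataset1 (in lung, status == 2 marks a death).
cfit <- coxph(Surv(time, status == 2) ~ age + sex, data = dataset1)

# Default prediction type: the linear predictor (log relative hazard).
lp <- predict(cfit, newdata = dataset2, type = "lp")

# Relative risk score: the exponential of the linear predictor.
risk <- predict(cfit, newdata = dataset2, type = "risk")

print(head(cbind(dataset2, lp = lp, risk = risk)))
```

Neither column is a time: both only rank the individuals in `dataset2` by hazard relative to the reference used by the fit.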
Re: [R] predict.coxph and predict.survreg
Indeed, you cannot get time predictions directly from predict() on a coxph fit; it returns quantities such as the linear predictor and the exponentiated risk score. This is because a time prediction requires the baseline hazard, which the Cox model leaves unspecified, so it has to be estimated in a separate step.

2010/11/11 David Winsemius dwinsem...@comcast.net:

> You would then run predict() on the coxph fit with dataset2 as the newdata argument. The default output is the linear predictor for the log-hazard relative to a mean survival estimate but other sorts of estimates are possible. The survfit function provides survival curve suitable for plotting.
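One possible way to get a time out of a Cox fit, sketched under assumptions: survfit() applied to the coxph object estimates the baseline hazard and returns one predicted survival curve per row of `newdata`, and a median can then be read off each curve. The `newpeople` data frame and the choice of `age` and `sex` as covariates are hypothetical.

```r
library(survival)

# Cox model on the lung data (status == 2 marks a death).
cfit <- coxph(Surv(time, status == 2) ~ age + sex, data = lung)

# Hypothetical new individuals for whom only covariates are known.
newpeople <- data.frame(age = c(50, 70), sex = c(2, 1))

# One predicted survival curve per row of newpeople; this is where the
# baseline hazard gets estimated.
sf <- survfit(cfit, newdata = newpeople)

# Median predicted survival time per person
# (NA if a curve never drops to 0.5 within follow-up).
med <- as.numeric(quantile(sf, probs = 0.5)$quantile)
print(med)
```

A median is used here rather than a mean because a predicted curve often does not reach zero within the observed follow-up, in which case a mean is only defined up to some time limit.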
Re: [R] predict.coxph and predict.survreg
Thanks very much for your answers, David and Mattia.

I understand that the baseline hazard in a Cox model is unknown and that this makes the calculation of expected survival difficult. Does this change when I move to a survreg model instead?

I think I'm OK with estimating a Cox model (or a survreg model) as I've done so in the past. But I'm lost with the different options in the prediction part (e.g., linear, quantile, risk, expected, ...). Is there any document that explains what these options mean?

Sorry in case these questions are naive ... hope they're not too stupid ;-)

On Thu, Nov 11, 2010 at 5:03 PM, Mattia Prosperi ahn...@gmail.com wrote:

> Indeed, from the predict() function of the coxph you cannot get directly time predictions, but only linear and exponential risk scores. This is because, in order to get the time, a baseline hazard has to be computed and it is not straightforward since it is implicit in the Cox model.
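A rough illustration of the prediction options asked about, for the survreg side. Because survreg fits a fully parametric model, predictions on the time scale are available directly. The sketch below assumes a Weibull model on the `lung` data; the `newpeople` data frame and the covariates are hypothetical. For coxph, as far as the help page goes, `"lp"` and `"risk"` are the linear predictor and its exponential, and `"expected"` is the expected number of events given covariates and follow-up time.

```r
library(survival)

# Weibull is the survreg() default; it is spelled out here on purpose.
sfit <- survreg(Surv(time, status == 2) ~ age + sex, data = lung,
                dist = "weibull")

# Hypothetical new individuals with covariates only.
newpeople <- data.frame(age = c(50, 70), sex = c(2, 1))

# "lp": linear predictor, on the log-time scale.
lp <- predict(sfit, newdata = newpeople, type = "lp")

# "response": prediction on the original time scale (days in lung).
resp <- predict(sfit, newdata = newpeople, type = "response")

# "quantile": time by which a fraction p of such individuals
# are predicted to have died.
qs <- predict(sfit, newdata = newpeople, type = "quantile",
              p = c(0.25, 0.5, 0.75))

print(cbind(newpeople, lp = lp, response = resp))
print(qs)  # one row per person, one column per value of p
```

The help pages `?predict.survreg` and `?predict.coxph` list the remaining options.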
Re: [R] predict.coxph and predict.survreg
Michael,

You are looking to compute an estimated time to death, rather than the odds of death conditional upon time. Thus, you will want to use time to death as your dependent variable rather than a dichotomous outcome (0=alive, 1=death). You can accomplish this with a straightforward regression analysis.

Best,

Jim

On Thu, Nov 11, 2010 at 3:44 AM, Michael Haenlein haenl...@escpeurope.eu wrote:

> I would like to use Dataset 1 to calibrate either a coxph or survreg model and then use this model to determine an expected time until death for the individuals in Dataset 2.

-- 
James C. Whanger
Research Consultant
Re: [R] predict.coxph and predict.survreg
Thanks for the comment, James!

The problem is that my initial sample (Dataset 1) is right-censored. That means I only observe time to death for those individuals who actually died before the end of my observation period. It is my understanding that this type of censoring creates a bias when I use a normal regression analysis. Hence my idea to use some form of survival model.

I had another look at predict.survreg and I think the option "response" could work for me. When I run the following code I get ptime = 290.3648. I assume this means that an individual with ph.ecog=2 can be expected to live another 290.3648 days before death occurs (days is the time scale of the time variable). Could someone confirm whether this makes sense?

    lfit <- survreg(Surv(time, status) ~ ph.ecog, data=lung)
    ptime <- predict(lfit, newdata=data.frame(ph.ecog=2), type='response')

On Thu, Nov 11, 2010 at 5:26 PM, James C. Whanger james.whan...@gmail.com wrote:

> You are looking to compute an estimated time to death -- rather than the odds of death conditional upon time. Thus, you will want to use time to death as your dependent variable rather than a dichotomous outcome (0=alive, 1=death). You can accomplish this with a straightforward regression analysis.
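A hedged note on what the number from type='response' represents. For the default Weibull model, the response prediction is exp(linear predictor), which is the Weibull scale parameter: the time by which about 63% of such individuals are predicted to have died. It is neither the mean nor the median. The sketch below, using the same model as in the message, derives both from the fit; the variable names `med_t`, `q632` and `mean_t` are illustrative.

```r
library(survival)

lfit <- survreg(Surv(time, status) ~ ph.ecog, data = lung)  # Weibull by default
nd   <- data.frame(ph.ecog = 2)

# Prediction on the time scale: exp(linear predictor).
resp <- as.numeric(predict(lfit, newdata = nd, type = "response"))

# Predicted median survival time for ph.ecog = 2.
med_t <- as.numeric(predict(lfit, newdata = nd, type = "quantile", p = 0.5))

# The 1 - exp(-1) quantile (about 63.2%) coincides with the response value.
q632 <- as.numeric(predict(lfit, newdata = nd, type = "quantile",
                           p = 1 - exp(-1)))

# Mean of the fitted Weibull: scale parameter times gamma(1 + 1/shape),
# where 1/shape is what survreg stores as lfit$scale.
mean_t <- resp * gamma(1 + lfit$scale)

print(c(response = resp, median = med_t, q63.2 = q632, mean = mean_t))
```

Which of these counts as the "expected time until death" depends on whether a mean or a typical (median) survival time is wanted, and all of them rest on the Weibull assumption.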
Re: [R] predict.coxph and predict.survreg
On Nov 11, 2010, at 12:14 PM, Michael Haenlein wrote:

> I had another look at predict.survreg and I think the option response could work for me. When I run the following code I get ptime = 290.3648. I assume this means that an individual with ph.ecog=2 can be expected to live another 290.3648 days before death occurs (days is the time scale of the time variable).

It is a prediction under specific assumptions underpinning a parametric estimate.

> Could someone confirm whether this makes sense?

You ought to confirm that it makes sense by comparing to your data:

    require(Hmisc); require(survival)
    # your code
    describe(lung[lung$status==1 & lung$ph.ecog==2, "time"])

    lung[lung$status == 1 & lung$ph.ecog == 2, "time"]
          n missing  unique    Mean
          6       0       6   293.7

               92 105 211 292 511 551
    Frequency   1   1   1   1   1   1
    %          17  17  17  17  17  17

    ?lung

So status==1 is a censored case and the observed death times have status==2:

    describe(lung[lung$status==2 & lung$ph.ecog==2, "time"])

    lung[lung$status == 2 & lung$ph.ecog == 2, "time"]
          n missing  unique    Mean     .05     .10     .25     .50     .75     .90     .95
         44       1      44   226.0   14.95   36.90   94.50  178.50  295.75  500.00  635.85

    lowest :  11  12  13  26  30, highest: 524 533 654 707 814

So the mean observed time to death among the 44 individuals who died was 226 days and the median was 178.5 days, with a right-skewed distribution (the group also had 6 censored individuals, at times from 92 to 551). You need to decide whether you want to make that particular prediction when you know that you forced a specific distributional form (Weibull, the survreg default) on the regression machinery by accepting the default.

David Winsemius, MD
West Hartford, CT
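The comparison David suggests can also be made in a way that accounts for the censored cases. The sketch below is only one possible check, under assumptions: it puts a Kaplan-Meier median for the ph.ecog == 2 group next to the medians predicted by survreg under three distributions. The choice of these three distributions and the names `km_median` and `med_by_dist` are illustrative.

```r
library(survival)

# Individuals with ph.ecog == 2 (subset() drops the row with missing ph.ecog).
ecog2 <- subset(lung, ph.ecog == 2)

# Nonparametric (Kaplan-Meier) estimate for that group; censoring is handled.
km <- survfit(Surv(time, status == 2) ~ 1, data = ecog2)
km_median <- unname(summary(km)$table["median"])

# Parametric medians for ph.ecog = 2 under different assumed distributions.
nd <- data.frame(ph.ecog = 2)
med_by_dist <- sapply(c("weibull", "lognormal", "loglogistic"), function(d) {
  fit <- survreg(Surv(time, status == 2) ~ ph.ecog, data = lung, dist = d)
  unname(predict(fit, newdata = nd, type = "quantile", p = 0.5))
})

print(km_median)
print(med_by_dist)
```

Large differences between the parametric medians, or between them and the Kaplan-Meier median, would suggest that the prediction depends heavily on the distributional form chosen.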
Re: [R] predict.coxph and predict.survreg
David, Mattia, James -- thanks so much for all your helpful comments! I now have a much better understanding of how to calculate what I'm interested in ... and what the risks are of doing so.

Thanks and all the best,

Michael

On Thu, Nov 11, 2010 at 7:33 PM, David Winsemius dwinsem...@comcast.net wrote:

> You need to decide whether you want to make that particular prediction when you know that you forced a specific distributional form on the regression machinery by accepting the default.