[R] Survival Analysis questions
Dear all,I'm a newbie to Survival Analysis and I have some questions about the functions in the survival package.First, what's the difference between survreg and coxph? Say fit1<- survreg(survObj ~ var1 + var2 + var3, dist= 'whatever') and fit2<- coxph(survObj ~ var1 + var2 + var3), what's the difference between fit1 and fit2? They all do regression on explanatory variables. What does it mean dist in survreg?Second, what's the difference in these two specifications of the proportional hazard model?coxph(survObj ~ var1 + strata(var2))coxph(survObj ~ var1*strata(var2))Thanx,Pierre _ [[elided Hotmail spam]] [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis and censoring
In your particular case I don't think that censoring is an issue, at least not for the reason that you discuss. The basic censoring assumption in the Cox model is that subjects who are censored have the same future risk as those who were a. not censored and b. have the same covariates. The real problem with informative censoring are the covaraites that are not in the model; ones that I likely don't even know exist. Assume for instance that some unknown exposure X, Perth sunlight say, makes people much more likely to get both of the outcomes. Assume further that it matters, i.e., the study includes a reasonable number of people with and without this exposure. Then someone who has an early heart attack actually has a higher risk of colorectal cancer than a colleague of the same age/sex/followup who did not have a heart attack, the reason being that the HA guy is more likely to be from Perth. Your simulation went wrong by not actually accounting for time. You created an outcome table for CC & HD and added a random time vector to it. If someone would have had CC at 2 years and now has HD at 1 year, you can't just change the status to make them censored at 2. The gambling analogy would be kicking someone out of the casino just before they win -- it does odd things to the odds. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis and censoring
Dear Prof. Therneau, Many thanks for this, On 3/13/08, Terry Therneau <[EMAIL PROTECTED]> wrote: > > In your particular case I don't think that censoring is an issue, at least > not > for the reason that you discuss. The basic censoring assumption in the Cox > model is that subjects who are censored have the same future risk as those > who > were a. not censored and b. have the same covariates. >The real problem with informative censoring are the covaraites that are not > in the model; ones that I likely don't even know exist. Assume for instance > that some unknown exposure X, Perth sunlight say, makes people much more > likely > to get both of the outcomes. Assume further that it matters, i.e., the study > includes a reasonable number of people with and without this exposure. Then > someone who has an early heart attack actually has a higher risk of > colorectal > cancer than a colleague of the same age/sex/followup who did not have a heart > attack, the reason being that the HA guy is more likely to be from Perth. > >Your simulation went wrong by not actually accounting for time. You > created > an outcome table for CC & HD and added a random time vector to it. If > someone > would have had CC at 2 years and now has HD at 1 year, you can't just change > the > status to make them censored at 2. The gambling analogy would be kicking > someone out of the casino just before they win -- it does odd things to the > odds. I'm still astonished that this is the explanation, but I've spent an hour playing with my little R code model and this is exactly the problem. Score 1 for solid maths and 0 for my intuition. Many Thanks, Geoff > >Terry Therneau > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Survival Analysis with two different events
Hello all, I am hoping to use survival analysis to examine whether parasite attack increases nest death in a species of social wasp. I therefore have data for 1. Whether the nest "died" in the 6 week census period ("Status", where 1=died, 0=survived) 2. The day number of death/last recorded day it was observed alive. 3. Whether the nest was attacked by the parasite (0/1 as with 1.) 4. The day number of attack/ last recorded day the nest was observed without a parasite. i.e. example dataset: status death para paraday 0 42 0 42 1 32 0 42 1 25 1 13 0 42 1 25 ... I've looked over r-help, as well as in Crawley etc., but I have yet to find a solution. Can anyone point me in the right direction or literature? Many thanks, Edd Almond -- View this message in context: http://www.nabble.com/Survival-Analysis-with-two-different-events-tp18183526p18183526.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival Analysis with two different events
sickboyedd gmail.com> writes: > > > Hello all, > > I am hoping to use survival analysis to examine whether parasite attack > increases nest death in a species of social wasp. I therefore have data for > > 1. Whether the nest "died" in the 6 week census period ("Status", where > 1=died, 0=survived) > 2. The day number of death/last recorded day it was observed alive. > 3. Whether the nest was attacked by the parasite (0/1 as with 1.) > 4. The day number of attack/ last recorded day the nest was observed without > a parasite. > > i.e. example dataset: > > status death para paraday > 0 42 0 42 > 1 32 0 42 > 1 25 1 13 > 0 42 1 25 ... > > I've looked over r-help, as well as in Crawley etc., but I have yet to find > a solution. Can anyone point me in the right direction or literature? > You might want to send this to r-sig-ecology if you need further discussion. In the meantime, the very simplest thing (conditioning on whether the nest was attacked or not) would be library(survival) c1 = coxph(Surv(death,status)~para,data=mydata) (you should definitely read up a bit on survival analysis, Cox proportional hazards, etc.. I think there's a chapter in the book by Scheiner and Gurevitch, geared towards ecologists). Dealing with parasite attack in a more fine-grained way (i.e. assessing mortality before and after parasitism) would be a little trickier, but I wouldn't worry about it until after you've understood the first stage of the analysis. Ben Bolker __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival Analysis with two different events
sickboyedd gmail.com> writes: > > > Hello all, > > I am hoping to use survival analysis to examine whether parasite attack > increases nest death in a species of social wasp. I therefore have data for > > 1. Whether the nest "died" in the 6 week census period ("Status", where > 1=died, 0=survived) > 2. The day number of death/last recorded day it was observed alive. > 3. Whether the nest was attacked by the parasite (0/1 as with 1.) > 4. The day number of attack/ last recorded day the nest was observed without > a parasite. > > i.e. example dataset: > > status death para paraday > 0 42 0 42 > 1 32 0 42 > 1 25 1 13 > 0 42 1 25 ... > > I've looked over r-help, as well as in Crawley etc., but I have yet to find > a solution. Can anyone point me in the right direction or literature? > The classic solution in biomedical work is a time-dependent covariate. Create a new data set like this: time1 time2 status parasite 042 0 0 032 1 0 013 0 0 13 25 1 1 ... The key is lines 3 and 4, which show the colony parasite free from day 0 to 13, and with parasite from day 13 to 25. Then one uses a Cox model with fit <- coxph(Surv(time1, time2, status) ~ parasite) summary(fit) It estimates the increase in death rate with parasite versus no parasite. These models were originally developed for treatment regimens that change over time. A given colony (subject) can have as many lines of data as you want, subject to the fact that the time intervals can't overlap (which would correspond to two copies of the same person alive at the same time). The status variable for a multi-line dataset =1 if THIS interval ends with an event. Look at the survival analysis chapter of Venables & Ripley, Modern Applied Statistics with S, for further insight. (or many other books) Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Survival analysis with no events in one treatment group
I'm trying to fit a Cox proportional hazards model to some hospital admission data. About 25% of the patients have had at least one admission, and of these, 40% have had two admissions within the 12 month period of the study. Each patients has had one of 4 treatments, and one of the treatment groups has had no admissions for the period. I used: surv.obj<-Surv(time=time1,time2=time2,event=event,type="counting") model<-coxph(surv.obj~Treatment+cluster(Subject)) and, as explained in the coxph help page, I get a warning message about convergence because the MLE of one of the coefficients is infinite since there are no admissions in one group. I'm looking for suggestions about how to proceed with an analysis of these data. I'd prefer not to ignore the fact that there are multiple admissions, but any alternative ideas I have at the moment do this. Many thanks, John Field = Faculty of Health Sciences Statistical Support Service The University of Adelaide, Australia 5005 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival analysis with no events in one treatment group
Hi John, I am on the slow side - can you provide sample code. How can one treatment group have no admissions? Let's say there are treatments W, X, Y, Z. Do you mean that NONE of the patients who got admitted the first time and, say, received treatment X during the first admission, have ever had a second admission (in your data). And for the other treatments W, Y, and Z some of those who got admitted the first time came in a second time? Cheers and a happy new year's eve, Daniel - cuncta stricte discussurus - -Ursprüngliche Nachricht- Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Im Auftrag von John Field Gesendet: Sunday, December 30, 2007 9:16 PM An: R-help@r-project.org Betreff: [R] Survival analysis with no events in one treatment group I'm trying to fit a Cox proportional hazards model to some hospital admission data. About 25% of the patients have had at least one admission, and of these, 40% have had two admissions within the 12 month period of the study. Each patients has had one of 4 treatments, and one of the treatment groups has had no admissions for the period. I used: surv.obj<-Surv(time=time1,time2=time2,event=event,type="counting") model<-coxph(surv.obj~Treatment+cluster(Subject)) and, as explained in the coxph help page, I get a warning message about convergence because the MLE of one of the coefficients is infinite since there are no admissions in one group. I'm looking for suggestions about how to proceed with an analysis of these data. I'd prefer not to ignore the fact that there are multiple admissions, but any alternative ideas I have at the moment do this. Many thanks, John Field = Faculty of Health Sciences Statistical Support Service The University of Adelaide, Australia 5005 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival analysis with no events in one treatment group
Hi again, I meant some rows of sample data, rather than sample code. Sorry about that. Daniel - cuncta stricte discussurus - -Ursprüngliche Nachricht- Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Im Auftrag von John Field Gesendet: Sunday, December 30, 2007 9:16 PM An: R-help@r-project.org Betreff: [R] Survival analysis with no events in one treatment group I'm trying to fit a Cox proportional hazards model to some hospital admission data. About 25% of the patients have had at least one admission, and of these, 40% have had two admissions within the 12 month period of the study. Each patients has had one of 4 treatments, and one of the treatment groups has had no admissions for the period. I used: surv.obj<-Surv(time=time1,time2=time2,event=event,type="counting") model<-coxph(surv.obj~Treatment+cluster(Subject)) and, as explained in the coxph help page, I get a warning message about convergence because the MLE of one of the coefficients is infinite since there are no admissions in one group. I'm looking for suggestions about how to proceed with an analysis of these data. I'd prefer not to ignore the fact that there are multiple admissions, but any alternative ideas I have at the moment do this. Many thanks, John Field = Faculty of Health Sciences Statistical Support Service The University of Adelaide, Australia 5005 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival analysis with no events in one treatment group
Hi Daniel, Sorry, it may have been clearer if I had used "subjects" instead of "patients". The treatments were administered to all subjects, and then in the succeeding 12 months, some were hospitalised and some were not. Hence only about 25% of the subjects were hospitalised. The start of the data: subjtime1 time2 event Treat 1 0 6.2 1 A 1 6.2 12 0 A 2 0 12 0 A so subject 1 was hospitalised at 6.2 months, subject 2 not at all. Hope this makes it clearer. Regards, John .At 12:59 PM 31/12/2007, Daniel Malter wrote: >Hi John, > >I am on the slow side - can you provide sample code. How can one treatment >group have no admissions? > >Let's say there are treatments W, X, Y, Z. Do you mean that NONE of the >patients who got admitted the first time and, say, received treatment X >during the first admission, have ever had a second admission (in your data). >And for the other treatments W, Y, and Z some of those who got admitted the >first time came in a second time? > >Cheers and a happy new year's eve, >Daniel > >- >cuncta stricte discussurus >- > >-Ursprüngliche Nachricht- >Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Im >Auftrag von John Field >Gesendet: Sunday, December 30, 2007 9:16 PM >An: R-help@r-project.org >Betreff: [R] Survival analysis with no events in one treatment group > >I'm trying to fit a Cox proportional hazards model to some hospital >admission data. About 25% of the patients have had at least one admission, >and of these, 40% have had two admissions within the 12 month period of the >study. Each patients has had one of 4 treatments, and one of the treatment >groups has had no admissions for the period. I used: > >surv.obj<-Surv(time=time1,time2=time2,event=event,type="counting") >model<-coxph(surv.obj~Treatment+cluster(Subject)) > >and, as explained in the coxph help page, I get a warning message about >convergence because the MLE of one of the coefficients is infinite since >there are no admissions in one group. > >I'm looking for suggestions about how to proceed with an analysis of these >data. I'd prefer not to ignore the fact that there are multiple admissions, >but any alternative ideas I have at the moment do this. > >Many thanks, >John Field >= >Faculty of Health Sciences Statistical Support Service The University of >Adelaide, Australia 5005 > >__ >R-help@r-project.org mailing list >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival analysis with no events in one treatment group
Hi John, to me it still seems that you have two "problems", given your first email. The first one, if I am correct, is that you have NO admissions in one of the groups. That is, you have Treat A, B, C, D and for one of these treatments there is not a single admission. Then you cannot estimate a survival function for this group because nobody died (got hospitalized). I think you have to exclude this group because your survival rate is 100% and your hazard rate 0%. But more proficient people may have better advice with that. That subject 2 did not get hospitalized is not a problem though. The observation is censored at 12 months. Since there are more patients in his treatment group (subject 1) who had an event, the survival function can be estimated. To the multiple admissions problem: To account for the fact that some have more than one event you may create a "number of events" variable, say numEvents, and include that in a strata() argument to your regression call: surv.obj<-Surv(time=time1,time2=time2,event=event,type="counting") model<-coxph(surv.obj~Treatment+strata(numEvents)) Use strata if you think that your baseline hazard is different in the different strata (in your case: if you think that the baseline hazard of having an event differs with having had prior event(s)). At the same time you assume that your treatment effect - the beta on your Treat variable - is the same across all strata. If you have reason to assume that the effect of your treatment varies (interacts) with the number of prior events, then it is not the correct approach. In addition you may include the cluster(Subject) or frailty(Subject) commands. Instead of strata you might also consider using a dummy variable, coding the number of prior events (if the maximum number admissions per patient is reasonably small and the number of cell frequencies reasonably large). Finally, you may want to consult Terry Thernau and Patricia Grambsch's book "Modeling survival data". It shows how to apply the techniques in R/S-Plus. I find it invaluable. Does that help you? Daniel - cuncta stricte discussurus - -Ursprüngliche Nachricht- Von: John Field [mailto:[EMAIL PROTECTED] Gesendet: Sunday, December 30, 2007 10:14 PM An: Daniel Malter; R-help@r-project.org Betreff: Re: AW: [R] Survival analysis with no events in one treatment group Hi Daniel, Sorry, it may have been clearer if I had used "subjects" instead of "patients". The treatments were administered to all subjects, and then in the succeeding 12 months, some were hospitalised and some were not. Hence only about 25% of the subjects were hospitalised. The start of the data: subjtime1 time2 event Treat 1 0 6.2 1 A 1 6.2 12 0 A 2 0 12 0 A so subject 1 was hospitalised at 6.2 months, subject 2 not at all. Hope this makes it clearer. Regards, John .At 12:59 PM 31/12/2007, Daniel Malter wrote: >Hi John, > >I am on the slow side - can you provide sample code. How can one >treatment group have no admissions? > >Let's say there are treatments W, X, Y, Z. Do you mean that NONE of the >patients who got admitted the first time and, say, received treatment X >during the first admission, have ever had a second admission (in your data). >And for the other treatments W, Y, and Z some of those who got admitted >the first time came in a second time? > >Cheers and a happy new year's eve, >Daniel > >- >cuncta stricte discussurus >- > >-Ursprüngliche Nachricht- >Von: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] >Im Auftrag von John Field >Gesendet: Sunday, December 30, 2007 9:16 PM >An: R-help@r-project.org >Betreff: [R] Survival analysis with no events in one treatment group > >I'm trying to fit a Cox proportional hazards model to some hospital >admission data. About 25% of the patients have had at least one >admission, and of these, 40% have had two admissions within the 12 >month period of the study. Each patients has had one of 4 treatments, >and one of the treatment groups has had no admissions for the period. I used: > >surv.obj<-Surv(time=time1,time2=time2,event=event,type="counting") >model<-coxph(surv.obj~Treatment+cluster(Subject)) > >and, as explained in the coxph help page, I get a warning message about >convergence because the MLE of one of the coefficients is infinite >since there are no admissions in one group. > >I'm looking for suggestions about how to proceed with an analysis of >these data. I'd prefer not to ignore the fact that there are multiple >admissions, but any alternative ideas I have at the moment do this. > >Many thanks
Re: [R] Survival analysis with no events in one treatment group
John, The key issue with a data set that has no events in one group is that the usual Wald tests, i.e., beta/se(beta), do not work. They is based on a Taylor series argument of the usual type: f(x) = f(x0) + a polynomial in (x-x0), and for infinite beta "x" and "x0" are so far apart that the approximation isn't even close. The score and likelihood ratio statistics are just fine, however. So if you completely ignore any printout that involves the "infinite beta" columns of the estimated variance matrix all should be well. In your case, a second issue is that the likelihood ratio test is not valid for data sets with multiple events per subject. You have to account for the within subject correlation in some way, either by moving to a random effects model or a gee type variance. You have done the latter by adding "cluster" to your coxph call. Thus, only the "robust score test" line of your final output is reliable. The LR test is tainted by multiple events, and the robust Wald by the infinite beta. Use summary() to print all the test statistics. Assume you have variables x1 and x2 in a model, and want to test the importance of adding x3. Stepwise regression programs often use score tests for this, but most users have never done it and don't know how. > fit1 <- coxph(Surv(time1, time2, event) ~ x1 + x2 + cluster(subject), subset= (!is.na(x3))) > fit2 <- coxph(Surv(time1, time2, event) ~ x1 + x2 + x3 + cluster(subject), init=c(coef(fit1), 0)) Now the "robust score test" in fit2 is a test of (coef1, coef2, 0) [no need for x3] vs the model with x3. If you have no missing values you can skip the subset argument, but make sure that the two fits are actually on the same data set: always check that the "n=" part of the printout matches. Terry Therneau Mayo Clinic __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] comparing SAS and R survival analysis with time-dependent covariates
Dear R-help, I was comparing SAS (I do not know what version it is) and R (version 2.6.0 (2007-10-03) on Linux) survival analyses with time-dependent covariates. The results differed significantly so I tried to understand on a short example where I went wrong. The following example shows that even when argument 'method' in R function coxph and argument 'ties' in SAS procedure phreg are the same, the results of Cox regr. are different. This seems to happen when there are ties in the events/covariates times. My question is what software, R or SAS, is more reliable for the survival analysis with time-dependent covariates or if you could point out a problem in the following example. Example. SAS gives HR=3.236: data trythis; input id days timedeli stat; datalines; 13.5 1 2 1.51 1 36 1000 0 48 1000 1 58 1 0 621 1000 1 7113 1 run; proc phreg data=trythis; model days*stat(0)=deli/risklimits ties=exact; if timedeli>days then deli=0; else deli=1; run; Example (continued). R gives HR=3.91: tmp = data.frame(id=c(1, 1, 2, 2, 3, 4, 5, 5, 6, 7, 7), start=c(0.0, 0.5, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 3.0), end=c(0.5, 3.0, 1.0, 1.5, 6.0, 8.0, 1.0, 8.0, 21.0, 3.0, 11.0), delir=c(0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1), outcome=c(0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1)) tmp surv = Surv(time=tmp$start, time2=tmp$end, tmp$outcome) cphres = coxph(surv ~ tmp$delir, method="exact") summary(cphres)[["coef"]] After breaking a tie b/w an event and a time-dependent observation, R gives the same result as SAS. tmp$end[2]=tmp$end[2] + .1 tmp surv = Surv(time=tmp$start, time2=tmp$end, tmp$outcome) cphres = coxph(surv ~ tmp$delir, method="exact") summary(cphres)[["coef"]] Thank you so much for time, Svetlana [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] survival analysis: time-variant covariates, right censored, re-current events
I have a data set for survival analysis in R. The data have time-variant covariates, and right-censored outcomes, which could also be re-current events. What R package should I use? Thanks! -- View this message in context: http://www.nabble.com/survival-analysis%3A-time-variant-covariates%2C-right-censored%2C-re-current-events-tp25083950p25083950.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] comparing SAS and R survival analysis with time-dependent covariates
This query of "why do SAS and S give different answers for Cox models" comes up every so often. The two most common reasons are that a. they are using different options for the ties b. the SAS and S data sets are slightly different. You have both errors. First, make sure I have the same data set by reading a common file, and then compare the results. tmt54% more sdata.txt 1 0.0 0.5 0 0 1 0.5 3.0 1 1 2 0.0 1.0 0 0 2 1.0 1.5 1 1 3 0.0 6.0 0 0 4 0.0 8.0 0 1 5 0.0 1.0 0 0 5 1.0 8.0 1 0 6 0.0 21.0 0 1 7 0.0 3.0 0 0 7 3.0 11.0 1 1 tmt55% more test.sas options linesize=80; data trythis; infile 'sdata.txt'; input id start end delir outcome; proc phreg data=trythis; model (start, end)*outcome(0)=delir/ ties=discrete; proc phreg data=trythis; model (start, end)*outcome(0)=delir/ ties=efron; tmt56% more test.r trythis <- read.table('sdata.txt', col.names=c("id", "start", "end", "delir", "outcome")) coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='exact') coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='efron') - I now get comparable answers. Note that Cox's "exact partial likelihood" is the correct form to use for discrete time data. I labeled this as the 'exact' method and SAS as the 'discrete' method. The "exact marginal likelihood" of Prentice et al, which SAS calls the 'exact' method is not implemented in S. As to which package is more reliable, I can only point to a set of formal test cases that are found in Appendix E of the book by Therneau and Grambsch. These are small data sets where the coefficients, log-likelihood, residuals, etc have all been worked out exactly in closed form. R gets all of these test cases right, SAS gets almost all. Terry Therneau - Svetlan Eden wrote Dear R-help, I was comparing SAS (I do not know what version it is) and R (version 2.6.0 (2007-10-03) on Linux) survival analyses with time-dependent covariates. The results differed significantly so I tried to understand on a short example where I went wrong. The following example shows that even when argument 'method' in R function coxph and argument 'ties' in SAS procedure phreg are the same, the results of Cox regr. are different. This seems to happen when there are ties in the events/covariates times. My question is what software, R or SAS, is more reliable for the survival analysis with time-dependent covariates or if you could point out a problem in the following example. ... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] comparing SAS and R survival analysis with time-dependent covariates
Thank you so much, this was very helpful. Svetlana Terry Therneau wrote: This query of "why do SAS and S give different answers for Cox models" comes up every so often. The two most common reasons are that a. they are using different options for the ties b. the SAS and S data sets are slightly different. You have both errors. First, make sure I have the same data set by reading a common file, and then compare the results. tmt54% more sdata.txt 1 0.0 0.5 0 0 1 0.5 3.0 1 1 2 0.0 1.0 0 0 2 1.0 1.5 1 1 3 0.0 6.0 0 0 4 0.0 8.0 0 1 5 0.0 1.0 0 0 5 1.0 8.0 1 0 6 0.0 21.0 0 1 7 0.0 3.0 0 0 7 3.0 11.0 1 1 tmt55% more test.sas options linesize=80; data trythis; infile 'sdata.txt'; input id start end delir outcome; proc phreg data=trythis; model (start, end)*outcome(0)=delir/ ties=discrete; proc phreg data=trythis; model (start, end)*outcome(0)=delir/ ties=efron; tmt56% more test.r trythis <- read.table('sdata.txt', col.names=c("id", "start", "end", "delir", "outcome")) coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='exact') coxph(Surv(start, end, outcome) ~ delir, data=trythis, ties='efron') - I now get comparable answers. Note that Cox's "exact partial likelihood" is the correct form to use for discrete time data. I labeled this as the 'exact' method and SAS as the 'discrete' method. The "exact marginal likelihood" of Prentice et al, which SAS calls the 'exact' method is not implemented in S. As to which package is more reliable, I can only point to a set of formal test cases that are found in Appendix E of the book by Therneau and Grambsch. These are small data sets where the coefficients, log-likelihood, residuals, etc have all been worked out exactly in closed form. R gets all of these test cases right, SAS gets almost all. Terry Therneau - Svetlan Eden wrote Dear R-help, I was comparing SAS (I do not know what version it is) and R (version 2.6.0 (2007-10-03) on Linux) survival analyses with time-dependent covariates. The results differed significantly so I tried to understand on a short example where I went wrong. The following example shows that even when argument 'method' in R function coxph and argument 'ties' in SAS procedure phreg are the same, the results of Cox regr. are different. This seems to happen when there are ties in the events/covariates times. My question is what software, R or SAS, is more reliable for the survival analysis with time-dependent covariates or if you could point out a problem in the following example. ... __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] survival analysis: time-variant covariates, right censored, re-current events
On Aug 21, 2009, at 1:16 PM, clue_less wrote: I have a data set for survival analysis in R. The data have time- variant covariates, and right-censored outcomes, which could also be re- current events. What R package should I use? Have you considered recasting the data into interval-censored form and using either Harrell's package:Design or Therneau's package:survival? -- David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Survival-Analysis: How to get numerical values from survfit (and not just a plot)?
Hi! I came across R just a few days ago since I was looking for a toolbox for cox-regression. I´ve read "Cox Proportional-Hazards Regression for Survival Data Appendix to An R and S-PLUS Companion to Applied Regression" from John Fox. As described therein plotting survival-functions works well (plot(survfit(model))). But I´d like to do some manipulation with the survival-functions before plotting them e.g. dividing one survival-function by another. survfit() only returns an object of the following structure n events median 0.9LCL 0.9UCL 55.000 55.000 1.033 0.696 1.637 Can you tell me how I can calculate a survival- or baseline-function out of these values and how I extract the values from the object? I´m sure the calculation is done by the corresponding plot-routine, but I couldn´t find that one either. Regards Bernhard __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival-Analysis: How to get numerical values from survfit (and not just a plot)?
Hi, See ?survfit.object if fit is the object you get using survfit, fit$surv will give you the survival probability. Best, arthur Bernhard Reinhardt wrote: Hi! I came across R just a few days ago since I was looking for a toolbox for cox-regression. I´ve read "Cox Proportional-Hazards Regression for Survival Data Appendix to An R and S-PLUS Companion to Applied Regression" from John Fox. As described therein plotting survival-functions works well (plot(survfit(model))). But I´d like to do some manipulation with the survival-functions before plotting them e.g. dividing one survival-function by another. survfit() only returns an object of the following structure n events median 0.9LCL 0.9UCL 55.000 55.000 1.033 0.696 1.637 Can you tell me how I can calculate a survival- or baseline-function out of these values and how I extract the values from the object? I´m sure the calculation is done by the corresponding plot-routine, but I couldn´t find that one either. Regards Bernhard __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival-Analysis: How to get numerical values from survfit (and not just a plot)?
I seriously doubt that a survfit object could only contain that information. I suspect that you are erroneously thinking that what print.survfit offers is the entire story. What does str(survfit(, data=) ) show you? > data(aml) > aml.mdl <- survfit(Surv(time, status) ~ x, data=aml) # this is with survfit.Design since I load Hmisc/Design by default now. > str(aml.mdl) List of 18 $ n: int 23 $ time : num [1:20] 9 13 18 23 28 31 34 45 48 161 ... $ n.risk : num [1:20] 11 10 8 7 6 5 4 3 2 1 ... $ n.event : num [1:20] 1 1 1 1 0 1 1 0 1 0 ... $ surv : num [1:20] 0.909 0.818 0.716 0.614 0.614 ... $ type : chr "right" $ ntimes.strata: Named int [1:2] 10 10 ..- attr(*, "names")= chr [1:2] "1" "2" $ strata : Named num [1:2] 10 10 ..- attr(*, "names")= chr [1:2] "Maintained" "Nonmaintained" $ strata.all : Named int [1:2] 11 12 ..- attr(*, "names")= chr [1:2] "Maintained" "Nonmaintained" $ std.err : num [1:20] 0.0953 0.1421 0.1951 0.2487 0.2487 ... $ upper: num [1:20] 0.987 0.951 0.899 0.835 0.835 ... $ lower: num [1:20] 0.508 0.447 0.35 0.266 0.266 ... $ conf.type: chr "log-log" $ conf.int : num 0.95 $ maxtime : num 161 $ units: chr "Day" $ time.label : chr "time" $ call : language survfit(formula = Surv(time, status) ~ x, data = aml) - attr(*, "class")= chr "survfit" > I also don't think survfit returns a Cox model. On Feb 17, 2009, at 10:37 AM, Bernhard Reinhardt wrote: Hi! I came across R just a few days ago since I was looking for a toolbox for cox-regression. I´ve read "Cox Proportional-Hazards Regression for Survival Data Appendix to An R and S-PLUS Companion to Applied Regression" from John Fox. As described therein plotting survival-functions works well (plot(survfit(model))). But I´d like to do some manipulation with the survival-functions before plotting them e.g. dividing one survival-function by another. survfit() only returns an object of the following structure n events median 0.9LCL 0.9UCL 55.000 55.000 1.033 0.696 1.637 Can you tell me how I can calculate a survival- or baseline-function out of these values and how I extract the values from the object? I ´m sure the calculation is done by the corresponding plot-routine, but I couldn´t find that one either. Regards Bernhard __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Survival-Analysis: How to get numerical values from survfit (and not just a plot)?
Hi Bernhard, I'm wondering what you will expect to get in "dividing" two proportional survival curves from a fitted cox model. Anyway, you can provide a newdata object to the survfit function containing any combination of cofactors you are interested in and then use summary, eg: fit <- coxph( Surv(futime,fustat)~resid.ds+rx+ecog.ps,data=ovarian) summary(survfit( fit,newdata=data.frame(rx=1, ecog.ps=2, resid.ds=0))) hth. Bernhard Reinhardt schrieb: Hi! I came across R just a few days ago since I was looking for a toolbox for cox-regression. I´ve read "Cox Proportional-Hazards Regression for Survival Data Appendix to An R and S-PLUS Companion to Applied Regression" from John Fox. As described therein plotting survival-functions works well (plot(survfit(model))). But I´d like to do some manipulation with the survival-functions before plotting them e.g. dividing one survival-function by another. survfit() only returns an object of the following structure n events median 0.9LCL 0.9UCL 55.000 55.000 1.033 0.696 1.637 Can you tell me how I can calculate a survival- or baseline-function out of these values and how I extract the values from the object? I´m sure the calculation is done by the corresponding plot-routine, but I couldn´t find that one either. Regards Bernhard __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Eik Vettorazzi Institut für Medizinische Biometrie und Epidemiologie Universitätsklinikum Hamburg-Eppendorf Martinistr. 52 20246 Hamburg T ++49/40/42803-8243 F ++49/40/42803-7790 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.