Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Thanks for the clarification, Dr. Therneau. Until I learn more about this I can at least remember that "plain" is bad.

Thanks,

Paul

--- On Mon, 4/16/12, Terry Therneau wrote:

> From: Terry Therneau
> Subject: Re: Kaplan Meier analysis: 95% CI wider in R than in SAS
> To: r-help@r-project.org, "Paul Miller"
> Received: Monday, April 16, 2012, 8:30 AM
>
> The confidence interval for the median is based on the confidence intervals for the curves. There are several methods for computing confidence intervals for the curves: plain, log, log-log, or logit scale. There are opinions on which is best, and it is a close race: except for the first of these. The type "plain" intervals are awful, it's like putting me in one lane of a championship 100 meter dash.
>
> Until about version 9 the only option in SAS was "plain", then for a time it was still the default. By 9.2 they finally went to loglog.
>
> Terry Therneau

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
But it is very, very unlikely to have a time with survival probability of 0. in a real data set. It would take a huge data set. Could a Monte Carlo simulation shed a little more light on this issue?

2012/4/16 Frank Harrell

> Just generate some data where the estimated survival probability is 0. at a certain time. The log-log transformation blows up.
> Frank
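The Monte Carlo idea raised in this message could be sketched roughly as follows. This is only an illustration under assumed settings that do not come from the thread: exponential event and censoring times, n = 30 per data set, 200 replications, and an evaluation time of t0 = 1. The helper name `coverage` is hypothetical. The sketch estimates how often the pointwise 95% interval from survfit contains the true survival probability for three conf.type choices.

```r
library(survival)

set.seed(2012)  # arbitrary seed so the sketch is repeatable

# Hypothetical helper: share of simulated data sets whose pointwise 95% CI
# at time t0 contains the true survival probability exp(-t0).
coverage <- function(conf_type, n = 30, reps = 200, t0 = 1) {
  true_surv <- exp(-t0)  # event times are simulated as Exp(1)
  hits <- replicate(reps, {
    event_time <- rexp(n, rate = 1)
    cens_time  <- rexp(n, rate = 0.3)  # assumed censoring mechanism
    sim <- data.frame(time   = pmin(event_time, cens_time),
                      status = as.numeric(event_time <= cens_time))
    fit <- survfit(Surv(time, status) ~ 1, data = sim, conf.type = conf_type)
    at_t0 <- summary(fit, times = t0, extend = TRUE)
    # An undefined (NA) limit counts as a miss
    isTRUE(at_t0$lower <= true_surv && true_surv <= at_t0$upper)
  })
  mean(hits)
}

cover <- sapply(c("plain", "log", "log-log"), coverage)
print(round(cover, 3))
```

With so few replications the estimates are noisy; a serious comparison would need many more, plus a range of sample sizes and evaluation times, including times near the start and end of the curve.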
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
On 04/14/2012 05:00 AM, r-help-requ...@r-project.org wrote:
> Am replicating in R an analysis I did earlier using SAS. See this as a test of whether I'm ready to start using R in my day-to-day work.
>
> Just finished replicating a Kaplan Meier analysis. Everything seems to work out fine except for one thing. The 95% CI around my estimate for the median is substantially larger in R than in SAS. For example, in SAS I have a median of 3.29 with a 95% CI of [1.15, 5.29]. In R, I get a median of 3.29 with a 95% CI of [1.35, 13.35].
>
> Can anyone tell me why I get this difference?

The confidence interval for the median is based on the confidence intervals for the curves. There are several methods for computing confidence intervals for the curves: plain, log, log-log, or logit scale. There are opinions on which is best, and it is a close race: except for the first of these. The type "plain" intervals are awful, it's like putting me in one lane of a championship 100 meter dash.

Until about version 9 the only option in SAS was "plain", then for a time it was still the default. By 9.2 they finally went to loglog.

Terry Therneau
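To see how the choice of interval type feeds into the confidence interval for the median, a small sketch along these lines may help. It uses the `aml` data set that ships with the survival package as a stand-in (an assumption; it is not the poster's data) and compares three of the conf.type settings named in this message.

```r
library(survival)

# aml ships with the survival package and stands in for the poster's data
conf_types <- c("plain", "log", "log-log")

median_ci <- t(sapply(conf_types, function(ct) {
  fit <- survfit(Surv(time, status) ~ 1, data = aml, conf.type = ct)
  q <- quantile(fit, probs = 0.5)  # median with its confidence limits
  c(median = unname(q$quantile),
    lower  = unname(q$lower),
    upper  = unname(q$upper))
}))

print(median_ci)
```

The median itself is the same in every row, since only the interval for the curve changes; the lower and upper limits are what move with conf.type.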
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Just generate some data where the estimated survival probability is 0. at a certain time. The log-log transformation blows up.

Frank

Enrico Colosimo wrote
> What are the significant problems of the log-log transformations?
> Any papers published about it?
> Enrico.

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Kaplan-Meier-analysis-95-CI-wider-in-R-than-in-SAS-tp4554559p4561432.html
Sent from the R help mailing list archive at Nabble.com.
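The archive shows the problematic value only as "0.", so the following is a sketch of the general point rather than a reconstruction of the example Frank had in mind. It evaluates the log and log-log transforms at the two boundary values of a survival estimate and at one interior value. It shows the transforms only, not survfit's full interval computation.

```r
# Survival probabilities at the two boundaries and one interior value
surv <- c(1, 0.5, 0)

log_scale    <- log(surv)        # log S(t)
loglog_scale <- log(-log(surv))  # log(-log S(t)), the complementary log-log

print(data.frame(surv, log_scale, loglog_scale))
```

The log-log transform is infinite at both S = 1 and S = 0, whereas log S(t) is still finite (zero) at S = 1.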
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
What are the significant problems of the log-log transformations? Any papers published about it?

Enrico.

2012/4/14 Frank Harrell

> I used log-log in my book too until Terry Therneau alerted me to the significant problems this creates. In the 2nd edition it will use log S(t).
> Frank
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
I used log-log in my book too until Terry Therneau alerted me to the significant problems this creates. In the 2nd edition it will use log S(t).

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Kaplan-Meier-analysis-95-CI-wider-in-R-than-in-SAS-tp4554559p4557695.html
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Hello Drs. Colosimo and Harrell,

Thank you for your replies to my question. From Dr. Colosimo, I was able to determine that the SAS results can be replicated by adding the option conf.type="log-log" to my code, as in:

survobj <- survfit(survfrm, conf.type="log-log", data=Survival)

Originally, it looked like the SAS results could be replicated using conf.type="plain". Applying this option to my actual data revealed that this was not the case, however.

From Dr. Harrell, I learned that using conf.type="log-log" may not be such a good idea. Interestingly though, I've seen at least one instance where experts in the R community use this option in their book. The book is about 10 years old. So maybe opinion about the use of this option has shifted since then.

Thanks,

Paul
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Make sure you use the log S(t) basis on both systems (and avoid the log-log S(t) basis, as this results in instability in the front part of the survival curve).

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
--
View this message in context: http://r.789695.n4.nabble.com/Kaplan-Meier-analysis-95-CI-wider-in-R-than-in-SAS-tp4554559p4555447.html
Re: [R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Hi Enrico,

Not sure how SAS builds the CI but I can look into it. The SAS documentation does have a section on computational formulas at:

http://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifetest_a000259.htm

Although I can't provide my dataset, I can provide the data and code below. This is the R-equivalent of an analysis from "Common Statistical Methods for Clinical Research with SAS Examples."

R produces the following output:

> print(surv.by.vac)
Call: survfit(formula = Surv(WKS, CENS == 0) ~ VAC, data = hsv)

        records n.max n.start events median 0.95LCL 0.95UCL
VAC=GD2      25    25      25     14     35      15      NA
VAC=PBO      23    23      23     17     15      12      35

SAS has the same 95% CI for VAC=GD2 but has a 95% CI of [10, 27] for VAC=PBO. This is just like in the analysis I'm doing currently.

Thanks,

Paul

# Chapter 21: The Log-Rank Test
# Example 21.1: HSV2 Vaccine with gD2 Vaccine

connection <- textConnection("
GD2  1   8 12   GD2  3 -12 10   GD2  6 -52  7
GD2  7  28 10   GD2  8  44  6   GD2 10  14  8
GD2 12   3  8   GD2 14 -52  9   GD2 15  35 11
GD2 18   6 13   GD2 20  12  7   GD2 23  -7 13
GD2 24 -52  9   GD2 26 -52 12   GD2 28  36 13
GD2 31 -52  8   GD2 33   9 10   GD2 34 -11 16
GD2 36 -52  6   GD2 39  15 14   GD2 40  13 13
GD2 42  21 13   GD2 44 -24 16   GD2 46 -52 13
GD2 48  28  9   PBO  2  15  9   PBO  4 -44 10
PBO  5  -2 12   PBO  9   8  7   PBO 11  12  7
PBO 13 -52  7   PBO 16  21  7   PBO 17  19 11
PBO 19   6 16   PBO 21  10 16   PBO 22 -15  6
PBO 25   4 15   PBO 27  -9  9   PBO 29  27 10
PBO 30   1 17   PBO 32  12  8   PBO 35  20  8
PBO 37 -32  8   PBO 38  15  8   PBO 41   5 14
PBO 43  35 13   PBO 45  28  9   PBO 47   6 15
")

# Negative weeks mark censored observations
hsv <- data.frame(scan(connection, list(VAC="", PAT=0, WKS=0, X=0)))
hsv <- transform(hsv,
                 CENS = ifelse(WKS < 1, 1, 0),
                 WKS = abs(WKS),
                 TRT = ifelse(VAC=="GD2", 1, 0))

library("survival")
surv.by.vac <- survfit(Surv(WKS,CENS==0)~VAC, data=hsv)

plot(surv.by.vac,
     main = "The Log-Rank Test \n Example 21.1: HSV-Episodes with gD2 Vaccine",
     ylab = "Survival Distribution Function",
     xlab = "Survival Time in Weeks",
     lty = c(1,2))

legend(0.75,0.19,
       legend = c("gD2","PBO"),
       lty = c(1,2), title = "Treatment")

summary(surv.by.vac)
print(surv.by.vac)
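One way to check which interval type lies behind the SAS numbers is to refit the HSV data from this message under several conf.type settings and put the median limits side by side. The sketch below does that; the helper name `median_ci` is hypothetical, and which setting corresponds to a given SAS run depends on the SAS version and options, which this code cannot determine.

```r
library(survival)

# Data copied from the message: vaccine group, patient number, weeks
# (negative = censored) and a fourth column, X, that is not used here.
raw <- scan(text = "
GD2  1   8 12   GD2  3 -12 10   GD2  6 -52  7
GD2  7  28 10   GD2  8  44  6   GD2 10  14  8
GD2 12   3  8   GD2 14 -52  9   GD2 15  35 11
GD2 18   6 13   GD2 20  12  7   GD2 23  -7 13
GD2 24 -52  9   GD2 26 -52 12   GD2 28  36 13
GD2 31 -52  8   GD2 33   9 10   GD2 34 -11 16
GD2 36 -52  6   GD2 39  15 14   GD2 40  13 13
GD2 42  21 13   GD2 44 -24 16   GD2 46 -52 13
GD2 48  28  9   PBO  2  15  9   PBO  4 -44 10
PBO  5  -2 12   PBO  9   8  7   PBO 11  12  7
PBO 13 -52  7   PBO 16  21  7   PBO 17  19 11
PBO 19   6 16   PBO 21  10 16   PBO 22 -15  6
PBO 25   4 15   PBO 27  -9  9   PBO 29  27 10
PBO 30   1 17   PBO 32  12  8   PBO 35  20  8
PBO 37 -32  8   PBO 38  15  8   PBO 41   5 14
PBO 43  35 13   PBO 45  28  9   PBO 47   6 15
", what = list(VAC = "", PAT = 0, WKS = 0, X = 0), quiet = TRUE)

hsv <- data.frame(raw)
hsv$CENS <- ifelse(hsv$WKS < 1, 1, 0)  # negative weeks are censored
hsv$WKS  <- abs(hsv$WKS)

# Hypothetical helper: median and its 95% limits per stratum for one conf.type
median_ci <- function(conf_type) {
  fit <- survfit(Surv(WKS, CENS == 0) ~ VAC, data = hsv, conf.type = conf_type)
  q <- quantile(fit, probs = 0.5)
  data.frame(conf.type = conf_type,
             stratum   = names(fit$strata),
             median    = as.vector(q$quantile),
             lower     = as.vector(q$lower),
             upper     = as.vector(q$upper))
}

result <- do.call(rbind, lapply(c("log", "log-log", "plain"), median_ci))
print(result)
```

The row whose limits for VAC=PBO come out as [10, 27] would point to the interval type behind the SAS output. Elsewhere in the thread Paul reports that conf.type="log-log" reproduced the SAS results on his own data.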
[R] Kaplan Meier analysis: 95% CI wider in R than in SAS
Hello All,

Am replicating in R an analysis I did earlier using SAS. See this as a test of whether I'm ready to start using R in my day-to-day work.

Just finished replicating a Kaplan Meier analysis. Everything seems to work out fine except for one thing. The 95% CI around my estimate for the median is substantially larger in R than in SAS. For example, in SAS I have a median of 3.29 with a 95% CI of [1.15, 5.29]. In R, I get a median of 3.29 with a 95% CI of [1.35, 13.35].

Can anyone tell me why I get this difference?

My R code looks like:

survfrm <- Surv(progression_months_landmark_14,progression==1) ~ pr_rg_landmark_14

survobj <- survfit(survfrm, data=Survival)
survlrk <- survdiff(survfrm, data=Survival)

summary(survobj)
print(survobj)
print(survlrk)

My SAS code looks like:

proc lifetest data=survival;
  strata pr_rg_landmark_14;
  time progression_months_landmark_14 * progression(0);
run;

Thought maybe the difference could have something to do with the strata statement in the SAS code not being translated properly into R. Tried changing my R code to make pr_rg_landmark_14 a strata but this didn't seem to change anything. Except that I no longer got a log rank test.

Thanks,

Paul