Re: [R] Schoenfeld residuals for a null model coxph
begin included message --- I have a coxph model like coxph(Surv(start, stop, censor) ~ x + y, mydata) I would like to calculate the Schoenfeld residuals for the null, i.e the same model where the beta hat vector (in practical terms, the coeff vector spat out by summary()) is constrained to be all 0s --all lese stays the same. - end inclusion --- You can get the fit for any coefficient you want by setting the initial values and asking for 0 iterations. fit0 <- coxph(Surv(start, stop, censor) ~ x + y, mydata, init=c(0,0), iter=0) resid(fit0, type='schoenfeld') Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Schoenfeld residuals for a null model coxph
Hi, I have a coxph model like coxph(Surv(start, stop, censor) ~ x + y, mydata) I would like to calculate the Schoenfeld residuals for the null, i.e the same model where the beta hat vector (in practical terms, the coeff vector spat out by summary()) is constrained to be all 0s --all lese stays the same. I could calculate it by hand, but I was wondering if there is a way of doing it with resid() -- resid(my.coxph.mod, type = 'schoenfeld') works wery well for the true Schoenfeld residuals, but I haven't figured out how to use it constraining the beta hat vector to be all 0s. BW F -- Federico C. F. Calboli Neuroepidemiology and Ageing Research Imperial College, St. Mary's Campus Norfolk Place, London W2 1PG Tel +44 (0)20 75941602 Fax +44 (0)20 75943193 f.calboli [.a.t] imperial.ac.uk f.calboli [.a.t] gmail.com __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Schoenfeld Residuals with tied data
-- begin included message Thank you for the reply. I really appreciate it. I calculated the Scoenfeld residual per event and my results are the following: finage race 17 -0.33942334 -2.0722187270.29024804 20 0.394600944 5.303968774 0.517689472 25 0.488184603 -1.438140647-0.359535162 25 -0.511815397-1.438140647-0.359535162 However, the results from R are shown as below: fin agerace 17 -0.3394233 -2.072219 0.290248 20 0.3946009 5.303969 0.5176895 25 0.4724112 -1.615875 -0.4039688 25 -0.5275888 -1.615875 -0.4039688 end inclusion --- Bessy, By default the coxph function uses the Efron approximation for ties (more accurate than the Breslow approx). You are computing the Schoenfeld residuals using the betas from coxph (Efron), and then applying a Breslow formula. I can reproduce your results by forcing the same computation in R: > fit1 <- coxph(Surv(week, arrest) ~ fin + age + race, bessdata) > fit2 <- coxph(Surv(week, arrest) ~ fin + age + race, bessdata, init=coef(fit1), iter=0, method='breslow') > resid(fit2, 'schoen') fin age race 17 -0.3394233 -2.072219 0.2902480 20 0.3946009 5.303969 0.5176895 25 0.4881846 -1.438141 -0.3595352 25 -0.5118154 -1.438141 -0.3595352 The Efron approximation is simple for computation of beta, but a bit more subtle when doing the residuals from the fit (but still not hard). You can find a detailed worked example in E.1.2 of the book by Therneau and Grambsch. If you had run the phreg procedure in SAS you would not have had this problem: it contains exactly the same incorrect computation. (Appendix E of the book contains a set of very small data sets where the correct answers can be worked out in closed form. SAS gets all of the Breslow approx computations correct and about 1/2 of the Efron ones. R passes all the tests.) Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Schoenfeld Residuals with tied data
Thank you for the reply. I really appreciate it. I calculated the Scoenfeld residual per event and my results are the following: finage race 17 -0.33942334 -2.0722187270.29024804 20 0.394600944 5.303968774 0.517689472 25 0.488184603 -1.438140647-0.359535162 25 -0.511815397-1.438140647-0.359535162 However, the results from R are shown as below: fin agerace 17 -0.3394233 -2.072219 0.290248 20 0.3946009 5.303969 0.5176895 25 0.4724112 -1.615875 -0.4039688 25 -0.5275888 -1.615875 -0.4039688 These two results are both calculated based on the same dataset, which is shown as below (the formula was attached with the original message): weekarrest fin age race 17 1 0 18 1 20 1 1 27 1 25 1 1 19 0 25 1 0 19 0 52 0 1 23 1 52 0 0 19 0 At time 17 and 20, there is no ties, my results are almost the same as R's. However, the dataset contains two failures at time 25, I had got different results from R's. If I summed the Schoenfeld residuls at a given event time, for example, 25 in my dataset, I cannot get the common results. Does R modified the formula when meeting ties? Thank you so much. Bessy Terry Therneau wrote: > > > Formally, the Schoenfeld residuals are defined as one residual per event > time. > > I found that when there are tied events, however, that plots of the > residuals > could be hard to visually interpret: sometimes a residual was large > because of a > lack of fit at that point, sometimes because several death happened > concurrently > at that point. > > Since the Shoenfeld residual is a sum over the events at each time point, > coxph > returns one residual per EVENT, rather than one per event time. The plots > work > much better (my subjective opinion). If you sum the returned residuals at > a > given event time, you will get the common result. > > Terry Therneau > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- View this message in context: http://www.nabble.com/Schoenfeld-Residuals-with-tied-data-tp2403p24050423.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Schoenfeld Residuals with tied data
Formally, the Schoenfeld residuals are defined as one residual per event time. I found that when there are tied events, however, that plots of the residuals could be hard to visually interpret: sometimes a residual was large because of a lack of fit at that point, sometimes because several death happened concurrently at that point. Since the Shoenfeld residual is a sum over the events at each time point, coxph returns one residual per EVENT, rather than one per event time. The plots work much better (my subjective opinion). If you sum the returned residuals at a given event time, you will get the common result. Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Schoenfeld Residuals with tied data
Dear all, I am struggling with calculation of Schoenfeld residuals of my Cox Ph models. Based on the formula as attached, I calculated the Schoenfeld residuals for both non tied and tied data, respectively. And then I validated my results with R using the same data sets. However, I found that my results for non-tied data was ok but the results for tied data were different from R's. How does R calculate the Schoenfeld Residuals for tied data? Could someone give me some hints, please? Thank you so much. Bessy http://www.nabble.com/file/p2403/Schoenfeld%2BResiduals.doc Schoenfeld+Residuals.doc -- View this message in context: http://www.nabble.com/Schoenfeld-Residuals-with-tied-data-tp2403p2403.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Schoenfeld residuals
Hi, Thank you for your comments and apologies for the delay in replying. rem.Rcens =1 for the censored variables. The problem arises because I am not strictly looking at time to death. Instead I am looking at time to 12-month remission in epilepsy. Therefore a lot of people have the same event i.e. they successfully achieve 12-month remission from day 1 of the treatment. I think I shall avoid the problem by 'excluding' patients with immediate 12-month remission i.e. I will look at patients with immediate success in a separate analysis to patients with delayed success. Thanks for your help, Laura 2009/4/6 Terry Therneau : > Laura Bonnett was kind enough to send me a copy of the data that caused the > plotting error, since it was an error I had not seen before. > > 1. The latest version of survival gives a nicer error message: > >> fit <- coxph(Surv(rem.Remtime, rem.Rcens) ~ all.sex, nearma) >> cfit <- cox.zph(fit) >> plot(cfit) > Error in plot.cox.zph(cfit) : > Spline fit is singular, try a smaller degrees of freedom > > 2. What's the problem? > There are 1085 events in the data set (rem.Rcens==1), and of these 502 are > tied events on exactly day 365. The plot.cox.zph function tries to fit a > smoothing spline to the data to help the eye; the fit gives weight 1 to each > death and having this high a proportion of ties creates problems for the > underlying regression. > > 3. >> plot(cfit, df=2) > Warning messages: > 1: In approx(xx, xtime, seq(min(xx), max(xx), length = 17)[2 * (1:8)]) : > collapsing to unique 'x' values > 2: In approx(xtime, xx, temp) : collapsing to unique 'x' values > > These warning messages are ignorable. I'll work on making them go away. > > > 4. A shot in the dark -- is perchance the variable rem.Rcens=1 a marker of a > censored observation, and the events are 0? (A whole lot of events at 1 year > is > suspicious, but half censored at one year is believable.) Then the proper > coxph > code is > >> fit2 <- coxph(Surv(rem.Remtime, rem.Rcens==0) ~ all.sex, nearma) > > Terry Therneau > > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Schoenfeld residuals
Laura Bonnett was kind enough to send me a copy of the data that caused the plotting error, since it was an error I had not seen before. 1. The latest version of survival gives a nicer error message: > fit <- coxph(Surv(rem.Remtime, rem.Rcens) ~ all.sex, nearma) > cfit <- cox.zph(fit) > plot(cfit) Error in plot.cox.zph(cfit) : Spline fit is singular, try a smaller degrees of freedom 2. What's the problem? There are 1085 events in the data set (rem.Rcens==1), and of these 502 are tied events on exactly day 365. The plot.cox.zph function tries to fit a smoothing spline to the data to help the eye; the fit gives weight 1 to each death and having this high a proportion of ties creates problems for the underlying regression. 3. > plot(cfit, df=2) Warning messages: 1: In approx(xx, xtime, seq(min(xx), max(xx), length = 17)[2 * (1:8)]) : collapsing to unique 'x' values 2: In approx(xtime, xx, temp) : collapsing to unique 'x' values These warning messages are ignorable. I'll work on making them go away. 4. A shot in the dark -- is perchance the variable rem.Rcens=1 a marker of a censored observation, and the events are 0? (A whole lot of events at 1 year is suspicious, but half censored at one year is believable.) Then the proper coxph code is > fit2 <- coxph(Surv(rem.Remtime, rem.Rcens==0) ~ all.sex, nearma) Terry Therneau __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Schoenfeld Residuals
Thank you for your comments. I have about 200 out of 2000 tied data points which makes the situation more complicated! I'll have at look at the book section you referred to. With regards to making the ylim finite, I'm not sure how I can go about that given that I don't understand why it isn't already! Thank you for your help, Laura 2009/4/3 David Winsemius : > I am not sure that ties are the only reason. If I create a few ties in the > ovarian dataset that Therneau and Lumley provide, all I get are some > warnings: >> ovarian[4:5, 1] <- mean(ovarian[4:5, 1]) >> ovarian[6:8, 1] <- mean(ovarian[6:8, 1]) >> fit <- coxph( Surv(futime, fustat) ~ age + rx, ovarian) >> temp<- cox.zph(fit) > >> plot(temp) > Warning messages: > 1: In approx(xx, xtime, seq(min(xx), max(xx), length.out = 17)[2 * : > collapsing to unique 'x' values > 2: In approx(xtime, xx, temp) : collapsing to unique 'x' values > > The error message you get is requesting a finite ylim. Have you considered > acceding with that request? > > Alternative: Assuming the number of tied survival times is modest, have you > tried jitter-ing the rem.Remtime variable a few times to see it the results > are stable? > > If the number of ties is large, then you need to review Thernaeu & Gramsch > section 3.3 > > -- > David Winsemius > > On Apr 3, 2009, at 7:57 AM, Laura Bonnett wrote: > >> Dear All, >> >> Sorry to bother you again. >> >> I have a model: >> coxfita=coxph(Surv(rem.Remtime/365,rem.Rcens)~all.sex,data=nearma) >> and I'm trying to do a plot of Schoenfeld residuals using the code: >> plot(cox.zph(coxfita)) >> abline(h=0,lty=3) >> >> The error message I get is: >> Error in plot.window(...) : need finite 'ylim' values >> In addition: Warning messages: >> 1: In sqrt(x$var[i, i] * seval) : NaNs produced >> 2: In min(x) : no non-missing arguments to min; returning Inf >> 3: In max(x) : no non-missing arguments to max; returning -Inf >> >> My data (nearma) has a lot of rem.Remtime entries which are equal i.e >> large amounts of tied data. If I remove the entries where this is the >> case from the dataset I get the results I want! >> >> Please can someone explain why removing paients with tied remission >> time has such an effect on the code and also how to remedy the problem >> without removing patients? >> >> Thank you very much, >> >> Laura. >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > Heritage Laboratories > West Hartford, CT > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Schoenfeld Residuals
I am not sure that ties are the only reason. If I create a few ties in the ovarian dataset that Therneau and Lumley provide, all I get are some warnings: > ovarian[4:5, 1] <- mean(ovarian[4:5, 1]) > ovarian[6:8, 1] <- mean(ovarian[6:8, 1]) > fit <- coxph( Surv(futime, fustat) ~ age + rx, ovarian) > temp<- cox.zph(fit) > plot(temp) Warning messages: 1: In approx(xx, xtime, seq(min(xx), max(xx), length.out = 17)[2 * : collapsing to unique 'x' values 2: In approx(xtime, xx, temp) : collapsing to unique 'x' values The error message you get is requesting a finite ylim. Have you considered acceding with that request? Alternative: Assuming the number of tied survival times is modest, have you tried jitter-ing the rem.Remtime variable a few times to see it the results are stable? If the number of ties is large, then you need to review Thernaeu & Gramsch section 3.3 -- David Winsemius On Apr 3, 2009, at 7:57 AM, Laura Bonnett wrote: Dear All, Sorry to bother you again. I have a model: coxfita=coxph(Surv(rem.Remtime/365,rem.Rcens)~all.sex,data=nearma) and I'm trying to do a plot of Schoenfeld residuals using the code: plot(cox.zph(coxfita)) abline(h=0,lty=3) The error message I get is: Error in plot.window(...) : need finite 'ylim' values In addition: Warning messages: 1: In sqrt(x$var[i, i] * seval) : NaNs produced 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf My data (nearma) has a lot of rem.Remtime entries which are equal i.e large amounts of tied data. If I remove the entries where this is the case from the dataset I get the results I want! Please can someone explain why removing paients with tied remission time has such an effect on the code and also how to remedy the problem without removing patients? Thank you very much, Laura. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. David Winsemius, MD Heritage Laboratories West Hartford, CT __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Schoenfeld Residuals
Dear All, Sorry to bother you again. I have a model: coxfita=coxph(Surv(rem.Remtime/365,rem.Rcens)~all.sex,data=nearma) and I'm trying to do a plot of Schoenfeld residuals using the code: plot(cox.zph(coxfita)) abline(h=0,lty=3) The error message I get is: Error in plot.window(...) : need finite 'ylim' values In addition: Warning messages: 1: In sqrt(x$var[i, i] * seval) : NaNs produced 2: In min(x) : no non-missing arguments to min; returning Inf 3: In max(x) : no non-missing arguments to max; returning -Inf My data (nearma) has a lot of rem.Remtime entries which are equal i.e large amounts of tied data. If I remove the entries where this is the case from the dataset I get the results I want! Please can someone explain why removing paients with tied remission time has such an effect on the code and also how to remedy the problem without removing patients? Thank you very much, Laura. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.