I think this is a case of (a - x) + x != a in floating arithmetic. When updating the risk set, you subtract sums of covariates at the end of a time interval, then add them back at the beginning of the next interval. Something like that, anyway. As in
> x <- rnorm(1000) > sum(c(x,-x)) [1] -1.387779e-17 Or (worse, since sum() is smart and doesn't operate sequentially) > tt <- 0 > for (i in x) tt <- tt+i > for (i in x) tt <- tt-i > tt [1] 2.553513e-14 -pd > On 23 Aug 2019, at 11:12 , Göran Broström <goran.brost...@umu.se> wrote: > > > > Den 2019-08-22 kl. 21:48, skrev Göran Broström: >> On 2019-08-18 19:10, Ferenci Tamas wrote: >>> Dear All, >>> >>> Consider the following simple example: >>> >>> library( survival ) >>> data( veteran ) >>> >>> coef( coxph(Surv(time, status) ~ trt + prior + karno, data = veteran) ) >>> trt prior karno >>> 0.180197194 -0.005550919 -0.033771018 >>> >>> Note that we have neither time-dependent covariates, nor time-varying >>> coefficients, so the results should be the same if we change to >>> counting process format, no matter where we cut the times. >>> >>> That's true if we cut at event times: >>> >>> veteran2 <- survSplit( Surv(time, status) ~ trt + prior + karno, >>> data = veteran, cut = unique( veteran$time ) ) >>> >>> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = >>> veteran2 ) ) >>> trt prior karno >>> 0.180197194 -0.005550919 -0.033771018 >>> >>> But quite interestingly not true, if we cut at every day: >>> >>> veteran3 <- survSplit( Surv(time, status) ~ trt + prior + karno, >>> data = veteran, cut = 1:max(veteran$time) ) >>> >>> coef( coxph(Surv(tstart,time, status) ~ trt + prior + karno, data = >>> veteran3 ) ) >>> trt prior karno >>> 0.180197215 -0.005550913 -0.033771016 >>> >>> The difference is not large, but definitely more than just a rounding >>> error, or something like that. >>> >>> What's going on? How can the results get wrong, especially by >>> including more cutpoints? >> All results are wrong, but they are useful (paraphrasing George EP Box). > > That said, it is a little surprising: The generated risk sets are (should be) > identical in all cases, and one would expect rounding errors to be the same. > But data get stored differently, and ... who knows? > > I tried your examples on my computer and got exactly the same results as you. > Which surprised me. > > G, > >> Göran >>> >>> Thank you in advance, >>> Tamas >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >>> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.