Thank you, Dr. Therneau, that was very helpful. Best regards,
Omar. On Mon, Oct 8, 2012 at 9:58 AM, Terry Therneau <thern...@mayo.edu> wrote: > >> I am interested in producing the expected number of events, in a >> recurring events setting. I am using the Andersen-Gill model, as fit >> by the function "coxph" in the package "survival." >> >> I need to produce expected numbers of events for a cohort, >> cumulatively, at several fixed times. My ultimate goal is: To fit an >> AG model to a reference sample, then use that fitted model to generate >> expected numbers of events for a new cohort; then, comparing the >> expected vs. the observed numbers of events would give us some idea of >> whether the new cohort differs from the reference one. >> >>> From my reading of the documentation and the text by Therneau and >> >> Grambsch, it seems that the function "survexp" is what I need. But >> using it I am not able to obtain expected numbers of events that match >> reasonably well the observed numbers *even for the same reference >> population.* So, I think I am misunderstanding something quite badly. >> > > You've hit a common confusion. Observed versus expected events > computations are done on a cumulative hazard scale H, not the surivival > scale S; S = exp(-H). Relating this back to simple Poisson models H(t) > would be the expected number of events by time t and S(t) the probability of > "no events before time t". G. Berry (Biometrics 1983) has a classic ane > readable article on this (especially if you ignore the proofs). > > Using your example: > >> cphfit <- >> coxph(Surv(start,stop,event)~rx+number+size+cluster(id),data=bladder2) >> zz <- predict(cphfit, type='expected') >> c(sum(zz), sum(bladder2$event)) > [1] 112 112 > >> tdata <- bladder2[1:10] #new data set (lazy way) >> predict(cphfit, type='expected', newdata=tdata) > [1] 0.0324089 0.3226540 0.4213402 1.0560768 0.6702130 0.2163531 0.6490665 > [8] 0.8864808 0.2932915 0.5190647 > > > You can also do this using survexp and the cohort=FALSE argument, which > would return S(t) for each subject and we would then use -log(result) to get > H. This is how it was done when I wrote the book, but the newer predict > function is easier. > > Terry Therneau > > ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.