On Sep 4, 2020, at 11:45 AM, POLWART, Calum (SOUTH TEES HOSPITALS NHS
FOUNDATION TRUST) via R-help <[email protected]> wrote:
>
> Using survfit I can get the '1 year' Survival from this dataset which holds
> survival in days:
>
> require (survival)
> survfit( Surv(time, status) ~sex, data=colon)
> summary (fit, 365)
>
> My current real world data I'm calculating time using lubridate to calculate
> time and since it made the axis easy I just told it to do and so my "time"
> appears to be a float in months.
>
> time <- time_length(interval(startDate, endDate), "months")
>
> Is there a "right" approach to this (as in a convention). If I use 12months
> as a year and describe it in the write up as 12, 24 and 36 month survival
> rather than 1, 2 and 3 year presumably that is OK..
>
> I've been asked to report 30, 60 & 90day. Then 6month, 1, 2 and 3 year
> survival.
>
> Should I calculate time 3 times, (interval day, month and year) and run the
> survival on each to get the requested outputs or would people just provide
> something close.
>
> Should I run a campaign to decimilise time?
Hi,
The answer may depend upon whether you are presenting the results in a tabular
fashion, in the body of a manuscript, or in a figure. Also, what may be the
community conventions in your domain.
If you want to get the irregular time points out in a single output, you can
use the times argument to do this, remembering that the default time intervals
are in days for this dataset:
> summary(fit, times = c(30, 60, 90, 180, 365.25, 2 * 365.25, 3 * 365.25))
Call: survfit(formula = Surv(time, status) ~ sex, data = colon)
sex=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
30 887 2 0.998 0.00159 0.995 1.000
60 880 6 0.991 0.00317 0.985 0.997
90 869 11 0.979 0.00485 0.969 0.988
180 827 42 0.931 0.00849 0.915 0.948
365 731 94 0.825 0.01274 0.801 0.851
730 595 135 0.673 0.01576 0.643 0.705
1096 536 57 0.608 0.01641 0.577 0.641
sex=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
30 962 5 0.995 0.00230 0.990 0.999
60 955 6 0.989 0.00341 0.982 0.995
90 947 8 0.980 0.00446 0.972 0.989
180 906 41 0.938 0.00776 0.923 0.953
365 819 85 0.850 0.01150 0.828 0.873
730 679 133 0.711 0.01462 0.683 0.740
1096 592 84 0.623 0.01566 0.593 0.654
Now, the time output there is arguably a bit cumbersome to read...but, at least
you get the relevant values in a single output. You can transform those values
as you may require.
Another option is to use the scale argument, but I just noted that, unless I am
missing something, I think that there may be a lingering buglet in the code for
summary.survfit(), and I am adding Terry Therneau here as a cc:, if that is
correct. The behavior of the interaction between the times and scale arguments
changed in 2009 after an exchange I had with Thomas Lumley:
https://stat.ethz.ch/pipermail/r-devel/2009-April/052901.html
and it is not clear to me if the current behavior is or is not intended after
all this time. Albeit, it may be the defacto behavior at this point in either
case, given some volume of code written over the years that may depend upon
this behavior.
Thus, this may be better for you, using the current behavior:
> summary(fit, scale = 30.44, times = c(1, 2, 3, 6, 12, 24, 36) * 30.44)
Call: survfit(formula = Surv(time, status) ~ sex, data = colon)
sex=0
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 887 2 0.998 0.00159 0.995 1.000
2 880 6 0.991 0.00317 0.985 0.997
3 868 12 0.977 0.00498 0.968 0.987
6 826 42 0.930 0.00855 0.914 0.947
12 731 93 0.825 0.01274 0.801 0.851
24 595 135 0.673 0.01576 0.643 0.705
36 536 57 0.608 0.01641 0.577 0.641
sex=1
time n.risk n.event survival std.err lower 95% CI upper 95% CI
1 962 5 0.995 0.00230 0.990 0.999
2 955 6 0.989 0.00341 0.982 0.995
3 946 9 0.979 0.00458 0.970 0.988
6 906 40 0.938 0.00776 0.923 0.953
12 819 85 0.850 0.01150 0.828 0.873
24 679 133 0.711 0.01462 0.683 0.740
36 592 84 0.623 0.01566 0.593 0.654
where the times values are now in months over the range of values, instead of
days.
I don't use the lubridate package, so there may be other options for you there,
but the above will work, if your underlying time intervals in the source data
frame for the model are still in days as a unit of measurement.
Using the base graphics functions, albeit perhaps you are using ggplot or
similar, you can plot the above model with axis markings at the irregular time
intervals, using something like the following:
plot(fit, xaxt = "n", las = 1, xlim = c(0, 36 * 30.44))
axis(1, at = c(1, 2, 3, 6, 12, 24, 36) * 30.44, labels = c(1, 2, 3, 6, 12, 24,
36), cex.axis = 0.65)
This essentially truncates the x axis to 36 months, since the intervals in the
example colon dataset go to about 9 years or so, and does not label the x axis.
Bearing in mind that the underlying x axis unit is in days, the axis() function
then places labels at the irregular intervals. You could then annotate the plot
further as you may desire.
Regards,
Marc Schwartz
______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.