Dear list:
I am attempting to use what I thought would be a pretty straightforward
practical application of Cox regression. I figure users of the survival package
must have come across this problem before, so I would like to ask you how you
dealt with it. I have set up an illustrative example and included it at the end
of this post.
I took a sample of 100 data points from each of two populations ('a' and 'b')
known to have Gompertz distributions that differ in both their lambda (initial
rate, which is higher in population 'a') and gamma (acceleration rate, which is
higher in population 'b') parameters. Each data point represents the age at
which an experimental animal died.
First question: given the way the parameters differ between the Gompertz
distributions, the mortality rate will be higher for 'a' younger ages, higher
for 'b' at older ages, and the assumption of the Cox Proportional Hazards model
is violated a priori, isn't it? cox.zph confirms this. Yet I found plenty of
Gompertz parameter values that differ, and lead to differences in survival
times detectable by coxph, yet pass the cox.zph test. Should I assume that
cox.zph is insufficiently sensitive and take measures to account for
non-proportionality of hazards anytime I know I'm looking at longevity data
(which is believed to have a Gompertz distribution for mammals dying from 'old
age')?
Second question: supposedly the way to handle such data with coxph is to either
stratify it by age or include an interaction term with age in the formula. The
first option is not optimal, because the hazard for a Gompertz distribution
changes continuously and there are no natural intervals within which the hazard
ratio is relatively constant. Therefore, I would like to use the second option,
but including the age term in the formula produces an error that says "Error in
fitter(X, Y, strats, offset, init, control, weights = weights, : NA/NaN/Inf
in foreign function call (arg 6)". My question is, what would be the correct
way to include age in the formula?
If you have read this far, thank you kindly for your time. Below is code for
reproducing the example I explained above.
# create the data
a<-c(1048, 773, 1072, 753, 888, 1038, 852, 950, 1335, 1227, 904, 634, 980,
1075, 1308, 787, 1079, 1151, 677, 1110, 363, 936, 774, 644, 1080, 1055, 1119,
975, 941, 1148, 1014, 541, 1297, 584, 847, 1136, 793, 1171, 985, 934, 852, 550,
944, 1165, 1190, 99, 1227, 1052, 1106, 888, 884, 555, 862, 1241, 841, 987,
1162, 1028, 1009, 1102, 1082, 1009, 1200, 902, 1215, 1121, 1177, 882, 766,
1366, 1190, 755, 958, 38, 925, 1131, 1031, 639, 704, 1097, 820, 570, 1029,
1205, 1284, 1139, 522, 1197, 1268, 1376, 842, 1310, 959, 1160, 824, 652, 1121,
1262, 1006, 1021);
b<-c(797, 897, 993, 1174, 998, 1117, 583, 698, 1125, 796, 1055, 953, 847, 892,
875, 1108, 1090, 1101, 1104, 991, 882, 1036, 889, 843, 1129, 1018, 936, 1035,
906, 1192, 1028, 1109, 790, 897, 544, 978, 1108, 1004, 730, 1093, 845, 1014,
1040, 1080, 921, 958, 971, 950, 1026, 882, 1055, 703, 901, 991, 863, 1043, 999,
784, 908, 987, 1040, 760, 887, 1028, 808, 1077, 812, 843, 1002, 639, 905, 808,
850, 1112, 736, 851, 1008, 990, 516, 1015, 942, 993, 1127, 959, 963, 1069, 940,
815, 926, 1005, 983, 1093, 898, 901, 1132, 1011, 808, 905, 1129, 840);
# format the data for coxph
age<-c(a,b); group<-as.factor(c('a','b'),c(100,100)));
# run coxph; for simplicity, the data is uncensored
ab.cox<-coxph(Surv(age)~group);
# the results indicate a singificant difference between the groups
ab.cox;
# however, it appears that the hazards are not proportional and the result is
not interpretable
cox.zph(ab.cox);
# including age in an interaction term leads to an error
ab.int.cox<-coxph(Surv(age)~group+group:age);
# the following is like above; 'c' is from a population differs from 'a' in
both parameters
# and coxph indicates a highly significant difference, but this time the data
seems to not
# violate the proportional hazards assumption
c<-c(526, 858, 36, 814, 402, 902, 653, 779, 776, 971, 789, 824, 680, 1017, 957,
607, 1142, 1144, 888, 861, 868, 1052, 825, 654, 443, 1023, 727, 612, 1017, 888,
811, 820, 780, 846, 844, 272, 1083, 855, 903, 783, 1036, 1026, 462, 1007, 777,
1124, 1077, 1041, 900, 963, 332, 740, 1012, 1017, 705, 864, 1053, 643, 1195,
795, 1040, 792, 714, 971, 832, 558, 1107, 1038, 1149, 448, 605, 921, 772, 994,
1034, 949, 788, 1013, 788, 1038, 956, 850, 1045, 747, 967, 814, 661, 657, 730,
966, 676, 344, 743, 643, 1004, 766, 942, 1040, 1067, 409);
age2<-c(a,c); group2<-as.factor(rep(c('a','c'),c(100,100)));
ac.cox<-coxph(Surv(age2)~group2);
ac.cox;
cox.zph(ac.cox);
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.