Hi R experts,
I am trying to do a prognostic model validation study using cancer survival data. There are 2 data sets: 1500 cases used to develop a nomogram, and another 800 cases used as an independent validation cohort. I have validated the nomogram in the original data (easy with the Design tools), and now want to show that it also performs well on the independent data, using 60 month survival. I would also like to show that the nomogram is significantly different from an existing model, based on the 60 month survival estimates generated by each (eg by McNemar's test).

Hence, somewhat shortened:

# using R 2.01 on Windows
library(Hmisc)
library(Design)

# data1: dataframe with predictor variables A and B, plus cens and time columns (months)
ddist1 <- datadist(data1)
options(datadist='ddist1')
s1 <- Surv(data1$time, data1$cens)
cph.nomo <- cph(s1 ~ A+B, data=data1, surv=TRUE, x=TRUE, y=TRUE, time.inc=60)
survcph <- Survival(cph.nomo)
surv5 <- function(lp) survcph(60, lp)
nomogram(cph.nomo, lp=TRUE, conf.int=FALSE, fun=surv5, funlabel="5 yr DFS")
# now have a useful nomogram model, with good discrimination and
# calibration when checked with validate and calibrate (not shown)

# ....move on to validation cohort of n=800
# data2: validation data with the same columns A, B, cens, time
# do I need to put data2 into datadist??
s2 <- Surv(data2$time, data2$cens)
# able to derive 60 month estimates of survival using
# (data.frame gives one row per case; expand.grid would cross every A with every B)
data2.est5 <- survest(cph.nomo, data.frame(A=data2$A, B=data2$B), times=60, conf.int=0)
rcorr.cens(data2.est5$surv, s2) # tests discrimination of the model
# against the observed censored validation data

# I can't find a way to use calibrate in this setting though??
# Also, if I have the 5 year estimates for 2 different models, I can
# use rcorr.cens to show discrimination, but which values are
# suitable for a test of difference (eg with McNemar's)?
# I have tried the predict / newdata function a number of ways but it
# typically returns an error relating to unequal vector lengths

What I can't work out is where to go now to derive a calibration curve of the predicted 5 year result (data2.est5$surv) against the observed (s2). Or can I do it another way? For example, could I merge the 2 data frames and use lines 1:1500 to build the model and the last 800 lines to validate?

Obviously I am a novice, and sure to be missing something simple. I have spent countless hours poring over Prof Harrell's text (which is great but doesn't have a specific example of this), the Design help and the R news archive with no success, so any help is very much appreciated.

Scott Williams MD
Peter MacCallum Cancer Centre
Melbourne Australia
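
P.S. A minimal sketch of the merge idea on simulated stand-in data, in case it makes the question clearer. It uses coxph, survfit and concordance from the survival package rather than the Design functions above (concordance needs a recent version of survival). The simulated values, the names merged, pred60, cval and calib, and the quartile grouping as a crude stand-in for a calibration curve are all assumptions made for illustration, not a recommended method.

```r
library(survival)  # ships with standard R installations

set.seed(1)
# Simulated stand-in for the two cohorts; every value below is invented.
n.dev <- 1500
n.val <- 800
n <- n.dev + n.val
merged <- data.frame(A = rnorm(n), B = rbinom(n, 1, 0.4))
true.lp <- 0.7 * merged$A + 0.5 * merged$B
event.time <- rexp(n, rate = 0.01 * exp(true.lp))   # months
cens.time <- runif(n, 24, 120)                      # months
merged$time <- pmin(event.time, cens.time)
merged$cens <- as.numeric(event.time <= cens.time)  # 1 = event observed

dev.rows <- 1:n.dev            # lines used to build the model
val.rows <- (n.dev + 1):n      # lines held back for validation
val <- merged[val.rows, ]

# Build the model on the first 1500 rows only
fit <- coxph(Surv(time, cens) ~ A + B, data = merged[dev.rows, ])

# Predicted 60 month survival, one value per validation case
sf <- survfit(fit, newdata = val[, c("A", "B")])
pred60 <- as.vector(summary(sf, times = 60, extend = TRUE)$surv)

# Discrimination of the development model in the validation rows
cval <- concordance(fit, newdata = val)$concordance

# Crude calibration check: within quartiles of predicted survival,
# compare the mean prediction with the Kaplan-Meier estimate at 60 months
val$grp <- cut(pred60, quantile(pred60, 0:4 / 4),
               include.lowest = TRUE, labels = FALSE)
km <- survfit(Surv(time, cens) ~ grp, data = val)
obs60 <- summary(km, times = 60, extend = TRUE)$surv
calib <- data.frame(predicted = as.vector(tapply(pred60, val$grp, mean)),
                    observed = obs60)

print(cval)
print(calib)
```

Plotting calib$observed against calib$predicted with a 45 degree reference line would give a rough picture of calibration; whether this grouping is an acceptable substitute for calibrate is part of what is being asked.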