I welcome a trivial self-contained set of code, not using frailties, that generates its own data and which fails. Then I'll debug. Frank
David Winsemius wrote: > > On Aug 21, 2011, at 7:03 AM, Salvo Mac wrote: > >> I've attached a sample of the data sets, this is my full code. >> >> #R 2.13 >> #library(survival) >> #library(Hmisc) >> #library(splines) >> library(rms) >> train<-as.data.frame( train<-read.csv("G:\\train.txt", header=T, >> sep="\t")) >> test<-as.data.frame( test<-read.table("G:\\test.txt", header=T, >> sep="\t")) >> f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train) >> val.surv(f.1, newdata=test, u=10) >> >> >> #plot(calibrate(f.1, u=30, B=20)) > > Salvo; > > I'm not sure where the error is coming from (although I will share my > speculation below). I modified your code somewhat. I was under the > impression that read.csv had sep="," hardcoded and the read.csv( ... > seo="\t") would throw and error. The as.data.fame around either of the > read.* statements is completely unnecessary: > > Input code: > train<-read.table("~/train.txt", header=T, sep="\t") > test<-read.table("~/test.txt", header=T, sep="\t") > > I thought the newdata=test argument should succeed, but it does throw > the error that you report: > > f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train) > val.surv(f.1, newdata=test, u=10) > Error in val.surv(f.1, newdata = test, u = 10) : > dims [product 210] do not match the length of object [314] > In addition: Warning message: > In est.surv + S[, 1] : > longer object length is not a multiple of shorter object length > > I tried sampling from test with replacement to generate a dataframe of > equal extent and with that object I do not get the same errors: > > > extend <- test[sample(1:nrow(test), 314, replace=TRUE), ] > > f.1<-cph(Surv(time,event)~age, x=T, y=T,surv=T, data=train) > > val.surv(f.1, newdata=extend, u=10) > > Validation of Predicted Survival at Time= 10 n= 314 , events= 64 > > hare fit: > > dim A/D loglik AIC penalty > min max > 1 Add -485.24 976.22 271.33 Inf > 2 Del -349.57 710.64 5.67 271.33 > 3 Del -346.74 710.72 0.96 5.67 > 4 Del -346.26 715.51 0.01 0.96 > 5 Add -346.25 721.25 0.00 0.01 > > the present optimal number of dimensions is 2. > penalty(AIC) was 5.75, the default (BIC), would have been 5.75. > > dim1 dim2 beta SE Wald > Constant -10 0.55 -18.14 > Time 43 0.15 0.015 10.30 > > Function used to transform predictions: > function (p) log(-log(p)) > > Mean absolute error in predicted probabilities: 0.0331 > 0.9 Quantile of absolute errors : 0.0613 > > I have also tried looking at the code and adding > options(error=utils::recover) to see if I can identify the point where > the length mismatch is being generated, (but I am NOT an ace > debugger). I can see that est.surv is created. I can also get the > predict(fit, newdata, type = "lp") call to run with train and give > sensible numbers. You did not create (or at least say so) a datadist. > I tried that when I saw that "lim" was dependent on datadist limits. > > ddT <- datadist(train); options(datadist="ddT") > > Seeing the usehare was set to TRUE; I submitted the code section that > was intended for that situation to a browser session and the the first > line throws the error that I see at the console: > > Enter a frame number, or 0 to exit > > 1: val.surv(f.1, newdata = test, u = 10) > > Selection: 1 > Called from: top level > Browse[1]> i <- !is.na(est.surv + S[, 1] + S[, 2]) > Error during wrapup: dims [product 210] do not match the length of > object [314] > > As I read this that line is supposed to create an index of valid rows > in newdata and the fitted values but is failing in the situation where > the nrows of the newdata does not match that of the fit. > > At this point one option might be trying to generate a structure that > matches the val.surv output by going through the code and building it > up bit by bit: > > w <- structure(list(harefit = f, p = est.surv, actual = actual, > pseq = pseq, actualseq = actualseq, u = u, fun = fun, > n = nrow(S), d = sum(S[, 2]), units = units), class = > "val.survh") > return(w) > > But a much better option would be to report the error to Frank > Harrell. I'm copying him since I think your .txt files probably > reached the list as well as my mailbox. > > -- > David. > >> >> >> >> >> >> ________________________________ >> From: David Winsemius <dwinsem...@comcast.net> >> To: Salvo Mac <salvo_...@yahoo.com> >> Cc: "r-help@R-project.org" <r-help@r-project.org> >> Sent: Sunday, August 21, 2011 5:29 AM >> Subject: Re: [R] val.surv >> >> >> On Aug 20, 2011, at 10:25 PM, Salvo Mac wrote: >> >>> The test and train are like split data sets, contain similar >>> variables but from different countries so the two sets are somehow >>> independent. And yes it is a data frame. >> >> What is a data.frame? test and train may be dataframes, but test[, >> "age"] is not a dataframe. >> >>> So I extracted age, time and event. >> >> Code? The code you offered before would have created a newdata >> object (yes, a data.frame) with a single column bearing the same >> name as the vector argument ,,,,, "test1". Not named "age". Try it. >> do str on such an object: >> >> str(dataframe(test1)) >> >> >>> So test is data frame,(age, time, event). does that suffice? >> >> It certainly does not allow me to reproduce the error you got (which >> I still think is probably related to the structure of your argument >> to newdata.) >> >> That's all I can say without data and code. >> >> --David. >> >> >>> >>> >>> >>> ________________________________ >>> From: David Winsemius <dwinsem...@comcast.net> >>> >>> Cc: "r-help@R-project.org" <r-help@r-project.org> >>> Sent: Sunday, August 21, 2011 3:19 AM >>> Subject: Re: [R] val.surv >>> >>> >>> On Aug 20, 2011, at 8:08 PM, Salvo Mac wrote: >>> >>>> Thanks David >>>> >>>> However, I tried your trick on val.surv with newdata=test['age'] >>>> but still didn't work. >>>> Still gives the same error message: >>>> >>>> Error in val.surv(f.1, newdata = test1["age"], u = 10) : >>>> dims [product 1797] do not match the length of object [2496] >>>> In addition: Warning message: >>>> In est.surv + S[, 1] : >>>> longer object length is not a multiple of shorter object length >>>> >>> >>> As I said (and you did >>> not act upon): >>> >>> The fundamental thing you are doing wrong for q1 is failing to >>> unambiguously describe the test object. >>> >>> I said it was a guess. Now stop wasting our time and offer what is >>> needed. >>> >>> --david. >>>> >>>> Salvo >>>> ________________________________ >>>> From: David Winsemius <dwinsem...@comcast.net> >>>> >>>> Cc: "r-help@R-project.org" <r-help@r-project.org> >>>> Sent: Sunday, August 21, 2011 12:55 AM >>>> Subject: Re: [R] val.surv >>>> >>>> >>>> On Aug 20, 2011, at 3:32 PM, Salvo Mac wrote: >>>> >>>>> Dear R-users, >>>>> >>>>> I have two questions regarding >>> validation and calibration of Survival regression models. >>>>> >>>>> 1. I am trying to calibrate and validate a cox model using >>>>> val.surv. >>>>> here is my code: >>>>> f.1<-cph(Surv(time,event)~age, x=T, y=T, data=train) >>>>> test1<-test[,"age"] >>>>> val.surv(f.1, newdata=data.frame(test1), u=10) >>>>> >>>>> but I get an error message: >>>>> >>>>> Error in val.surv(f.1, newdata = data.frame(testi), u = 10) : >>>>> dims [product 1797] do not match the length of object [2496] >>>>> In addition: Warning message: >>>>> In est.surv + S[, 1] : >>>>> longer object length is not a multiple of shorter object >>>>> length >>>>> >>>>> I ran the example in the r-documentation but couldn't >>>>> extract dxy from result. >>>>> >>>>> What am I doing wrong? >>>> >>>> The fundamental thing you are doing wrong for q1 is failing to >>>> unambiguously describe the test object. I would think that if test >>>> were a dataframe then wrapping data.frame around a vector might >>>> not get it named correctly as 'age'. You might try newdata= >>>> test['age']. Just a guess. >>>> >>>>> >>>>> 2. In validate and calibrate cph functions. If it is frailty >>>>> fit, does the the bootstrap resample clusters or just individuals >>>> >>>> The code above appears to be dependent on the rms package. The >>>> frailty function is part of the underlying survival package and I >>>> do not see it mentioned in the index for Harrell's RMS text. You >>>> will probably need to wait until Frank comes across this. He is >>>> generally very good about correction my errors and knowledge gaps. >>>> >>>>> >>>> >> >> David Winsemius, MD >> West Hartford, >> CT<train.txt><test.txt>______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > David Winsemius, MD > West Hartford, CT > > ______________________________________________ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > ----- Frank Harrell Department of Biostatistics, Vanderbilt University -- View this message in context: http://r.789695.n4.nabble.com/val-surv-tp3757712p3758717.html Sent from the R help mailing list archive at Nabble.com. ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.