subject:"\[R\] Interpreting the example given by Prof Frank Harrell in \{Design\} validate.cph"

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-26 Thread vikkiyft

Thank you very very very much Prof Harrell!! You've made my day!!

-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3325844.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-25 Thread Frank Harrell


P.S. I used the latest version of the rms package to run this.  The Design
package is no longer supported.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3325050.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-25 Thread Frank Harrell


Here's the way I would explore this, and some of the code is made more tidy. 
Note that also you could vectorize your simulation.  I have used set.seed
multiple times to make bootstrap samples the same across runs.  -Frank

. . .
if (data[i, 3] == 4) data[i, 5] <- sample(c(0, 1), 1, prob=c(.06,  .94))} 

d <- data.frame(tumor=factor(data[,1]), ecog=factor(data[,2]),
rx=factor(data[,3]), os=data[,4], censor=data[,5])
S <- with(d, Surv(os, censor))

## Check collinearity of rx with other predictors
lrm(rx ~ tumor*ecog, data=d)
## What is the marginal strength of rx (assuming PH)?
cph(S ~ rx, data=d)
## What is partial effect of rx (assuming PH)?
anova(cph(S ~ tumor + ecog + rx, data=d))
## What is combined partial effect of tumor and ecog adjusting for rx?
anova(cph(S ~ tumor + ecog + strat(rx), data=d), tumor, ecog) ## nothing but
noise
## What is their effect not adjusting for rx
cph(S ~ tumor + ecog, data=d)  ## huge

f <- cph(S ~ tumor + ecog, x=TRUE, y=TRUE, surv=TRUE, data=d)
set.seed(1)
validate(f, B=100, dxy=TRUE)
w <- rep(1, 1000)   #  only one stratum, doesn't change model
f <- cph(S ~ tumor + ecog + strat(w), x=TRUE, y=TRUE, surv=TRUE, data=d)
set.seed(1)
validate(f, B=100, dxy=TRUE, u=60)
## identical to last validate except for -Dxy

f <- cph(S ~ tumor + ecog + strat(rx), x=TRUE, y=TRUE, surv=TRUE,
time.inc=60, data=d)
set.seed(1)
validate(f, B=100)  ## no predictive ability
set.seed(1)
validate(f, B=100, dxy=TRUE, u=60)
## Only Dxy indicates some predictive information; large in abs. value
## than model ignoring rx (0.3842 vs. 0.3177)




-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3324516.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-24 Thread vikkiyft


Dear Prof Frank,

I tried to simulate an example data set as close as possible to my own real
data with the codes below. There are only two covariates, tumor(3 levels)
and ecog(3 levels). "rx" is treatment (4 levels). Validation with the
stratified model (by rx) had a negative R2.. and the R2 under the training
column was so high...? 

n<-1000
set.seed(110222)
data<-matrix(rep(0,5000),ncol=5)
data[,1]<-sample(1:3,n,rep=TRUE,prob=c(.32, .30, .38))
for (i in 1:1000){
if (data[i,1]==1) data[i,2]<-sample(1:3,1,prob=c(.76, .18, .06))
if (data[i,1]==2) data[i,2]<-sample(1:3,1,prob=c(.67, .24, .09))
if (data[i,1]==3) data[i,2]<-sample(1:3,1,prob=c(.47, .37, .16))}
for (i in 1:1000){
if (data[i,1]==1) data[i,3]<-sample(1:4,1,prob=c(.70, .19, .03, .08))
if (data[i,1]==2) data[i,3]<-sample(1:4,1,prob=c(.42, .28, .12, .18))
if (data[i,1]==3) data[i,3]<-sample(1:4,1,prob=c(.11, .29, .30, .30))}
for (i in 1:1000){
if (data[i,3]==1)
data[i,4]<-12*rgamma(1000,rate=0.4,shape=1.7)[c(sample(26:975,1,prob=c(rep(1/950,950]
if (data[i,3]==2)
data[i,4]<-12*rgamma(1000,rate=0.9,shape=1.7)[c(sample(26:975,1,prob=c(rep(1/950,950]
if (data[i,3]==3)
data[i,4]<-12*rgamma(1000,rate=1.2,shape=0.6)[c(sample(26:975,1,prob=c(rep(1/950,950]
if (data[i,3]==4)
data[i,4]<-12*rgamma(1000,rate=1.5,shape=0.7)[c(sample(26:975,1,prob=c(rep(1/950,950]}
for (i in 1:1000){
if (data[i,3]==1) data[i,5]<-sample(c(0,1),1,prob=c(.53, .47))
if (data[i,3]==2) data[i,5]<-sample(c(0,1),1,prob=c(.17, .83))
if (data[i,3]==3) data[i,5]<-sample(c(0,1),1,prob=c(.05, .95))
if (data[i,3]==4) data[i,5]<-sample(c(0,1),1,prob=c(.06, .94))}
colnames(data)<-c("tumor","ecog","rx","os","censor")
data<-data.frame(data)
attach(data)

library(rms)
surv.obj<-Surv(os,censor)
validate(cph(surv.obj~factor(tumor)+factor(ecog),x=T,y=T,surv=T),method="b",B=100,dxy=T)
validate(cph(surv.obj~factor(tumor)+factor(ecog)+strat(rx),
x=T,y=T,surv=T,time.inc=60),method="b",B=100,dxy=T,u=60)



Best regards,
Vikki
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3323016.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-22 Thread Frank Harrell


I think it should be the negative of the first Dxy but this is all why the
posting guide says to create the simplest self-defined example that shows
the problem.  That way I could run it and get to the bottom of this.  See
the help file for cph which has examples of simulating test data.  Try to
have only two covariates.
Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3319138.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-22 Thread vikkiyft


I really appreciate your help Prof Harrell! 

I followed your instruction and re-ran the second model without strat but
with surv=TRUE, time.inc=30, and u=30 to validate, the Dxy was really the
same as that in the first model output! But this confused me...shouldn't the
Dxy be positive in this case because u was specified to estimate the
concordance between surv prob and surv time???

> library(survival) 
> attach(colon) 
> S<-Surv(time,status)
> obstruct0<-factor(obstruct)
> perfor0<-factor(perfor)
> adhere0<-factor(adhere)
> differ0<-factor(differ)
> extent0<-factor(extent)
> node40<-factor(node4)
> f2<-cph(S~obstruct0+perfor0+adhere0+differ0+extent0+node40,x=T,y=T,surv=T,time.inc=30)
> set.seed(110221) 
> validate(f2,method="b",B=100,dxy=T,pr=F,u=30)
  index.orig trainingtest optimism index.corrected   n
Dxy  -0.2918  -0.2932 -0.2861  -0.0070 -0.2847 100
R20.1145   0.1191  0.1104   0.0088  0.1057 100
Slope 1.   1.  0.9626   0.0374  0.9626 100
D 0.0170   0.0178  0.0164   0.0014  0.0156 100
U-0.0002  -0.0002  0.0001  -0.0003  0.0001 100
Q 0.0172   0.0179  0.0162   0.0017  0.0155 100
g 0.5472   0.5590  0.5348   0.0242  0.5230 100


(the Dxy's of -0.54 or 0.6 were estimated from my own data and were not
shown here because of the difficulty to produce codes that simulate my data,
sorry for the confusion!)

Best regards,
Vikki
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3318554.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-21 Thread Frank Harrell


Don't worry about the sign.  When predicting relative log hazard, high hazard
means short survival time so Dxy is negative.  When predicting survival
probability (u specified), high prob. means long survival time so Dxy is
positive.  You can just reverse the sign when u is not specified.

I did not see Dxy of -.54 or .6 anywhere on the output you sent.

Specify time.inc of 30 to cph in the 2nd case.  As you did not provide code
to simulate test data, please run the second model yourself without strat
but with surv=TRUE time.inc=30 (and u=30 to validate) to see if you get the
same Dxy as for the first model, before bringing strat into the picture.

rms and Design prefer for you to store the predictors as factor()s in the
dataset instead of specifying this in the model formula.

Frank

-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3318189.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-21 Thread vikkiyft


Thank you very much Prof Harrell!

Sorry that I am new to this forum, and so ain't familiar with how to post
message appropriately.

I repeated the same procedure using a dataset from the {survival} package.
This time I used the {rms} package, and 100 bootstrap samples:

> library(rms)
> library(survival)
> attach(colon)
> S<-Surv(time,status)

> f<-cph(S~factor(obstruct)+factor(perfor)+factor(adhere)+factor(differ)+factor(extent)+factor(node4),x=T,y=T,surv=T)
>   
> # no stratification
> set.seed(110221)
> validate(f,method="b",B=100,dxy=T,pr=F)
  index.orig trainingtest optimism index.corrected   n
Dxy  -0.2918  -0.2932 -0.2861  -0.0070 -0.2847 100
R20.1145   0.1191  0.1104   0.0088  0.1057 100
Slope 1.   1.  0.9626   0.0374  0.9626 100
D 0.0170   0.0178  0.0164   0.0014  0.0156 100
U-0.0002  -0.0002  0.0001  -0.0003  0.0001 100
Q 0.0172   0.0179  0.0162   0.0017  0.0155 100
g 0.5472   0.5590  0.5348   0.0242  0.5230 100

> f2<-cph(S~factor(obstruct)+factor(perfor)+factor(adhere)+factor(differ)+factor(extent)+factor(node4)+strat(rx),x=T,y=T,surv=T)
>  
> # with stratification
> set.seed(110221)
> validate(f2,method="b",B=100,dxy=T,pr=F,u=30)
  index.orig training   test optimism index.corrected   n
Dxy   0.1567   0.1966 0.1826   0.0140  0.1426 100
R20.1154   0.1191 0.   0.0081  0.1073 100
Slope 1.   1. 0.9720   0.0280  0.9720 100
D 0.0203   0.0210 0.0195   0.0015  0.0188 100
U-0.0002  -0.0002 0.0001  -0.0003  0.0001 100
Q 0.0205   0.0212 0.0193   0.0018  0.0186 100
g 0.5523   0.5591 0.5402   0.0189  0.5333 100

The same situation happened again. The Dxy's were all in opposite
directions.

In fact my case is even worse than these examples － the Dxy for
non-stratified model was -0.54 but the Dxy for stratified model was almost
+0.6; and the bootstrap validated R^2 was even negative!!

But..why does this happen??


Thanks a lot,
Vikki
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3317743.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-21 Thread Frank Harrell


Vikky,

You'll notice that the model containing sex in addition to age has a higher
apparent Dxy as you would expect [R^2 is not higher because it captures only
the age effect].  The validated Dxy's may be as they are because of the very
low number of bootstrap resamples (10) that you used.

Please following the posting guide, i.e., include a simple self-reproducing
script that I can run - one that simulates its own dataset.

Please switch to the rms package is this is the one that is being fully
supported.  See http://biostat.mc.vanderbilt.edu/Rrms.

Frank


-
Frank Harrell
Department of Biostatistics, Vanderbilt University
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3317265.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

[R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

2011-02-21 Thread vikkiyft


Dear R-help,

I am having a problem with the interpretation of result from validate.cph in
the Design package.

My purpose is to fit a cox model and validate the Somer's Dxy. I used the
hypothetical data given in the help manual with modification to the cox
model fit. My research problem is very similar to this example.

This is the model without stratification: 
> library(Design) 
> f1 <- cph(S ~ age, x=TRUE, y=TRUE)
> coef(f)
   age 
0.0440095
> set.seed(1)
> validate(f1, B=10, dxy=T)
 index.orig  training  test  optimism
index.corrected  n
Dxy   -0.3376784858 -0.3287537760 -0.3376784858  0.0089247099
-0.34660320 10
R2 0.0627722521  0.0636136044  0.0627722521  0.0008413523 
0.06193090 10
Slope  1.00  1.00  0.9896987441  0.0103012559 
0.98969874 10
D  0.0237993965  0.0239476118  0.0237993965  0.0001482153 
0.02365118 10
U -0.0008208378 -0.0008141441  0.0007104737 -0.0015246178 
0.00070378 10
Q  0.0246202342  0.0247617559  0.0230889228  0.0016728331 
0.02294740 10
 
But if I fit a stratified cox model to the same data, the result becomes: 
> f2<- cph(S ~ age + strat(sex), x=TRUE, y=TRUE, surv=TRUE, time.inc=2) 
> coef(f)
   age 
0.04271953
> set.seed(1)
> validate(f2, dxy=TRUE, u=2, B=10)
 index.orig  training test  optimism index.corrected 
n
Dxy0.3514778665  0.3259011492 0.3044982080  0.02140294120.3300749254
10
R2 0.0622369082  0.0651967502 0.0622369082  0.00295984190.0592770663
10
Slope  1.00  1.00 0.9621830568  0.03781694320.9621830568
10
D  0.0257519780  0.0267239073 0.0257519780  0.00097192930.0247800486
10
U -0.0009257142 -0.0009125388 0.0009102968 -0.00182283560.0008971213
10
Q  0.0266776922  0.0276364461 0.0248416812  0.00279476490.0238829273
10 

The coefficients are similar between the models, so I expect the results
would be somewhat similar, yet the two models give totally contrasting Dxy.
I reckon a negative Dxy value is normal in the sense that the survival time
and the prediction are concordant, but why does the result become discordant
when stratification is used? 
Is there something wrong or is there a sensible interpretation for this? 

This problem is very critical to me because the Design package is the only
one I can use for my purpose. Any advice is greatly appreciated. 


Best regards,
Vikki
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Interpreting-the-example-given-by-Prof-Frank-Harrell-in-Design-validate-cph-tp3316820p3316820.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

Re: [R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

[R] Interpreting the example given by Prof Frank Harrell in {Design} validate.cph

10 matches

Site Navigation

Mail list logo

Footer information