Re: [R] about the summary(cph.object)

2009-08-02 Thread David Winsemius
Here's what I would do. Let's assume that you are presenting the  
results of the example on the cph help page.  I agree with you that  
the results should be presented on the hazard ratio scale. The Design  
package provides appropriate plotting tools for creation of  
publication quality graphics. in your example I would issue the  
following command:

plot(f, age=NA, sex= NA, #age will be on the x axis and separate  
plots will be done for male and female
   col= c(red, blue),  # fortunately Frank constructed  
plot.Design, so that both the main effects plots and the CI plots get  
colored
  fun = exp, # this results in the log  
hazard effects on the y axis  being transformed to the hazard ratio  
scale
  ylab = Hazard Ratios)   # and this adds a meaningful label  
fo the y axis

An alternative plot would be to suppress the CI lines which might be  
more effective in getting across the message:

plot(f, age=NA, sex= NA, col= c(red, blue), fun = exp, ylab =  
Hazard Ratios, conf.int=FALSE)

Given that the model forces the male-female curves to be equidistant  
on the log hazards scale, I would also examine the possibility that  
they were in truth different by creating an interaction model and  
testing for difference between that model and the more simplistic  
model. In this case it is no surprise that the statistical test fails  
to support such a model:

f2 - cph(Srv ~ rcs(age,4)* sex, x=TRUE, y=TRUE)

anova(f)
anova*f2)

Which mean I would not get the challenge of explaining to my audience  
the notion of different forms of the age effect for men and women. You  
would explain in your paper that the model shows that males are on  
average exp(-0.6445) = 0.5249 times as likely to experience the event  
of interest. Ths is not at all typical for ordinary humans and this  
would require careful consideration and review of the experimental  
situation. The hazard ratios at each age relative to a person of  
average age are displayed on the graphic.  They show a plateau in  
mortality at young ages (which is typical of free-range humans) but  
that risk then doubles each decade between ages 45 and 60 but the rise  
tapers off somewhat at the extremes of age. Those features are typical  
for human mortality and would not require as much commentary.

-- 
David

On Aug 1, 2009, at 8:53 PM, zhu yao wrote:

 Thx for your reply.
 In this example, age was transformed with rcs. So the output was  
 different between f and summary(f).
 If I need to publicate the results, how do I explation the hazard  
 ratio of age?

 2009/8/1 David Winsemius dwinsem...@comcast.net

 On Jul 31, 2009, at 11:24 PM, zhu yao wrote:

 Could someone explain the summary(cph.object)?

 The example is in the help file of cph.

 n - 1000
 set.seed(731)
 age - 50 + 12*rnorm(n)
 label(age) - Age
 sex - factor(sample(c('Male','Female'), n,
 rep=TRUE, prob=c(.6, .4)))
 cens - 15*runif(n)
 h - .02*exp(.04*(age-50)+.8*(sex=='Female'))
 dt - -log(runif(n))/h
 label(dt) - 'Follow-up Time'
 e - ifelse(dt = cens,1,0)
 dt - pmin(dt, cens)
 units(dt) - Year
 dd - datadist(age, sex)
 options(datadist='dd')

 This is process for  setting the range for the display of effects in  
 Design regression objects. See:

 ?datadist

 q.effect
 set of two quantiles for computing the range of continuous variables  
 to use in estimating regression effects. Defaults are c(.25,.75),  
 which yields inter-quartile-range odds ratios, etc.

 ?summary.Design
 #---
  By default, inter-quartile range effects (odds ratios, hazards  
 ratios, etc.) are printed for continuous factors, ... 
 #---
 Value
 For summary.Design, a matrix of class summary.Design with rows  
 corresponding to factors in the model and columns containing the low  
 and high values for the effects, the range for the effects, the  
 effect point estimates (difference in predicted values for high and  
 low factor values), the standard error of this effect estimate, and  
 the lower and upper confidence limits.

 #---



 Srv - Surv(dt,e)

 f - cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
 summary(f)

Effects   
 Response : Srv

 FactorLowHigh   Diff.  Effect S.E. Lower 0.95 Upper  
 0.95
 age   40.872 57.385 16.513 1.21   0.21 0.80   1.62
  Hazard Ratio 40.872 57.385 16.513 3.35 NA 2.22   5.06

 In this case with a 4 df regression spline, you need to look at  the  
 effect across the range of the variable. You ought to plot the age  
 effect and examine anova(f) ). In the untransformed situation the  
 plot is on the log hazards scale for cph. So the effect for age in  
 this case should be the difference in log hazard at ages 40.872 and  
 57.385. SE is the standard error of that estimate and the Upper and  
 Lower numbers are the confidence bounds on the effect estimate. The  
 Hazard Ratio row gives you exponentiated results, so a difference in  

Re: [R] about the summary(cph.object)

2009-08-02 Thread zhu yao
Thanks for your help!

2009/8/2 David Winsemius dwinsem...@comcast.net

 Here's what I would do. Let's assume that you are presenting the results of
 the example on the cph help page.  I agree with you that the results should
 be presented on the hazard ratio scale. The Design package provides
 appropriate plotting tools for creation of publication quality graphics. in
 your example I would issue the following command:
 plot(f, age=NA, sex= NA, #age will be on the x axis and separate plots
 will be done for male and female
   col= c(red, blue),  # fortunately Frank constructed
 plot.Design, so that both the main effects plots and the CI plots get
 colored
  fun = exp, # this results in the log hazard
 effects on the y axis  being transformed to the hazard ratio scale
  ylab = Hazard Ratios)   # and this adds a meaningful label fo
 the y axis

 An alternative plot would be to suppress the CI lines which might be more
 effective in getting across the message:

 plot(f, age=NA, sex= NA, col= c(red, blue), fun = exp, ylab = Hazard
 Ratios, conf.int=FALSE)

 Given that the model forces the male-female curves to be equidistant on the
 log hazards scale, I would also examine the possibility that they were in
 truth different by creating an interaction model and testing for
 difference between that model and the more simplistic model. In this case it
 is no surprise that the statistical test fails to support such a model:

 f2 - cph(Srv ~ rcs(age,4)* sex, x=TRUE, y=TRUE)

 anova(f)
 anova*f2)

 Which mean I would not get the challenge of explaining to my audience the
 notion of different forms of the age effect for men and women. You would
 explain in your paper that the model shows that males are on
 average exp(-0.6445) = 0.5249 times as likely to experience the event of
 interest. Ths is not at all typical for ordinary humans and this would
 require careful consideration and review of the experimental situation. The
 hazard ratios at each age relative to a person of average age are displayed
 on the graphic.  They show a plateau in mortality at young ages (which is
 typical of free-range humans) but that risk then doubles each decade between
 ages 45 and 60 but the rise tapers off somewhat at the extremes of age.
 Those features are typical for human mortality and would not require as much
 commentary.

 --
 David

 On Aug 1, 2009, at 8:53 PM, zhu yao wrote:

 Thx for your reply.
 In this example, age was transformed with rcs. So the output was different
 between f and summary(f).
 If I need to publicate the results, how do I explation the hazard ratio of
 age?

 2009/8/1 David Winsemius dwinsem...@comcast.net


 On Jul 31, 2009, at 11:24 PM, zhu yao wrote:

  Could someone explain the summary(cph.object)?

 The example is in the help file of cph.

 n - 1000
 set.seed(731)
 age - 50 + 12*rnorm(n)
 label(age) - Age
 sex - factor(sample(c('Male','Female'), n,
 rep=TRUE, prob=c(.6, .4)))
 cens - 15*runif(n)
 h - .02*exp(.04*(age-50)+.8*(sex=='Female'))
 dt - -log(runif(n))/h
 label(dt) - 'Follow-up Time'
 e - ifelse(dt = cens,1,0)
 dt - pmin(dt, cens)
 units(dt) - Year
 dd - datadist(age, sex)
 options(datadist='dd')


 This is process for  setting the range for the display of effects in
 Design regression objects. See:

 ?datadist

 q.effect
 set of two quantiles for computing the range of continuous variables to
 use in estimating regression effects. Defaults are c(.25,.75), which yields
 inter-quartile-range odds ratios, etc.

 ?summary.Design
 #---
  By default, inter-quartile range effects (odds ratios, hazards ratios,
 etc.) are printed for continuous factors, ... 
 #---
 Value
 For summary.Design, a matrix of class summary.Design with rows
 corresponding to factors in the model and columns containing the low and
 high values for the effects, the range for the effects, the effect point
 estimates (difference in predicted values for high and low factor values),
 the standard error of this effect estimate, and the lower and upper
 confidence limits.

 #---


  Srv - Surv(dt,e)

 f - cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
 summary(f)

Effects  Response :
 Srv

 FactorLowHigh   Diff.  Effect S.E. Lower 0.95 Upper 0.95
 age   40.872 57.385 16.513 1.21   0.21 0.80   1.62
  Hazard Ratio 40.872 57.385 16.513 3.35 NA 2.22   5.06


 In this case with a 4 df regression spline, you need to look at  the
 effect across the range of the variable. You ought to plot the age effect
 and examine anova(f) ). In the untransformed situation the plot is on the
 log hazards scale for cph. So the effect for age in this case should be the
 difference in log hazard at ages 40.872 and 57.385. SE is the standard error
 of that estimate and the Upper and Lower numbers are the confidence bounds
 on the effect estimate. The Hazard Ratio row gives you exponentiated
 results, so a 

Re: [R] about the summary(cph.object)

2009-08-01 Thread David Winsemius


On Jul 31, 2009, at 11:24 PM, zhu yao wrote:


Could someone explain the summary(cph.object)?

The example is in the help file of cph.

n - 1000
set.seed(731)
age - 50 + 12*rnorm(n)
label(age) - Age
sex - factor(sample(c('Male','Female'), n,
 rep=TRUE, prob=c(.6, .4)))
cens - 15*runif(n)
h - .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt - -log(runif(n))/h
label(dt) - 'Follow-up Time'
e - ifelse(dt = cens,1,0)
dt - pmin(dt, cens)
units(dt) - Year
dd - datadist(age, sex)
options(datadist='dd')


This is process for  setting the range for the display of effects in  
Design regression objects. See:


?datadist

q.effect
set of two quantiles for computing the range of continuous variables  
to use in estimating regression effects. Defaults are c(.25,.75),  
which yields inter-quartile-range odds ratios, etc.


?summary.Design
#---
 By default, inter-quartile range effects (odds ratios, hazards  
ratios, etc.) are printed for continuous factors, ... 

#---
Value
For summary.Design, a matrix of class summary.Design with rows  
corresponding to factors in the model and columns containing the low  
and high values for the effects, the range for the effects, the effect  
point estimates (difference in predicted values for high and low  
factor values), the standard error of this effect estimate, and the  
lower and upper confidence limits.


#---



Srv - Surv(dt,e)

f - cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
summary(f)

Effects   
Response : Srv


FactorLowHigh   Diff.  Effect S.E. Lower 0.95 Upper  
0.95

age   40.872 57.385 16.513 1.21   0.21 0.80   1.62
 Hazard Ratio 40.872 57.385 16.513 3.35 NA 2.22   5.06


In this case with a 4 df regression spline, you need to look at  the  
effect across the range of the variable. You ought to plot the age  
effect and examine anova(f) ). In the untransformed situation the plot  
is on the log hazards scale for cph. So the effect for age in this  
case should be the difference in log hazard at ages 40.872 and 57.385.  
SE is the standard error of that estimate and the Upper and Lower  
numbers are the confidence bounds on the effect estimate. The Hazard  
Ratio row gives you exponentiated results, so a difference in log  
hazards becomes a hazard ratio. {exp(1.21) = 3.35}



sex - Female:Male  2.000  1.000 NA 0.64   0.15 0.35   0.94
 Hazard Ratio  2.000  1.000 NA 1.91 NA 1.42   2.55


Wat's the meaning of Effect, S.E. Lower, Upper?


You probably ought to read a bit more basic material. If you are  
asking this question, Harrell's Regression Modeling Strategies might  
be over you head, but it would probably be a good investment anyway.  
Venables and Ripley's Modern Applied Statistics has a chapter on  
survival analysis. Also consider Kalbfliesch and Prentice Statistical  
Analysis of Failure Time Data. I'm sure there are others;  those are  
the ones I have on my shelf.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] about the summary(cph.object)

2009-08-01 Thread zhu yao
Thx for your reply.
In this example, age was transformed with rcs. So the output was different
between f and summary(f).
If I need to publicate the results, how do I explation the hazard ratio of
age?

2009/8/1 David Winsemius dwinsem...@comcast.net


 On Jul 31, 2009, at 11:24 PM, zhu yao wrote:

  Could someone explain the summary(cph.object)?

 The example is in the help file of cph.

 n - 1000
 set.seed(731)
 age - 50 + 12*rnorm(n)
 label(age) - Age
 sex - factor(sample(c('Male','Female'), n,
 rep=TRUE, prob=c(.6, .4)))
 cens - 15*runif(n)
 h - .02*exp(.04*(age-50)+.8*(sex=='Female'))
 dt - -log(runif(n))/h
 label(dt) - 'Follow-up Time'
 e - ifelse(dt = cens,1,0)
 dt - pmin(dt, cens)
 units(dt) - Year
 dd - datadist(age, sex)
 options(datadist='dd')


 This is process for  setting the range for the display of effects in Design
 regression objects. See:

 ?datadist

 q.effect
 set of two quantiles for computing the range of continuous variables to use
 in estimating regression effects. Defaults are c(.25,.75), which yields
 inter-quartile-range odds ratios, etc.

 ?summary.Design
 #---
  By default, inter-quartile range effects (odds ratios, hazards ratios,
 etc.) are printed for continuous factors, ... 
 #---
 Value
 For summary.Design, a matrix of class summary.Design with rows
 corresponding to factors in the model and columns containing the low and
 high values for the effects, the range for the effects, the effect point
 estimates (difference in predicted values for high and low factor values),
 the standard error of this effect estimate, and the lower and upper
 confidence limits.

 #---


  Srv - Surv(dt,e)

 f - cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
 summary(f)

Effects  Response : Srv

 FactorLowHigh   Diff.  Effect S.E. Lower 0.95 Upper 0.95
 age   40.872 57.385 16.513 1.21   0.21 0.80   1.62
  Hazard Ratio 40.872 57.385 16.513 3.35 NA 2.22   5.06


 In this case with a 4 df regression spline, you need to look at  the
 effect across the range of the variable. You ought to plot the age effect
 and examine anova(f) ). In the untransformed situation the plot is on the
 log hazards scale for cph. So the effect for age in this case should be the
 difference in log hazard at ages 40.872 and 57.385. SE is the standard error
 of that estimate and the Upper and Lower numbers are the confidence bounds
 on the effect estimate. The Hazard Ratio row gives you exponentiated
 results, so a difference in log hazards becomes a hazard ratio. {exp(1.21) =
 3.35}

  sex - Female:Male  2.000  1.000 NA 0.64   0.15 0.35   0.94
  Hazard Ratio  2.000  1.000 NA 1.91 NA 1.42   2.55


 Wat's the meaning of Effect, S.E. Lower, Upper?


 You probably ought to read a bit more basic material. If you are asking
 this question, Harrell's Regression Modeling Strategies might be over you
 head, but it would probably be a good investment anyway. Venables and
 Ripley's Modern Applied Statistics has a chapter on survival analysis.
 Also consider Kalbfliesch and Prentice Statistical Analysis of Failure Time
 Data. I'm sure there are others;  those are the ones I have on my shelf.

 David Winsemius, MD
 Heritage Laboratories
 West Hartford, CT



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] about the summary(cph.object)

2009-08-01 Thread Frank E Harrell Jr

zhu yao wrote:

Thx for your reply.
In this example, age was transformed with rcs. So the output was different
between f and summary(f).
If I need to publicate the results, how do I explation the hazard ratio of
age?


David explained this.  Nonlinearity in age does not complicate the 
explanation.  The estimate is the estimate of the ratio of hazard rate 
at the upper quartile of age compared to the hazard ratio at the lower 
quartile, with the ages corresponding to these 2 points shown in the output.


The output of f is not very useful for publication.  The output of 
summary, Function, and latex are.


Frank



2009/8/1 David Winsemius dwinsem...@comcast.net


On Jul 31, 2009, at 11:24 PM, zhu yao wrote:

 Could someone explain the summary(cph.object)?

The example is in the help file of cph.

n - 1000
set.seed(731)
age - 50 + 12*rnorm(n)
label(age) - Age
sex - factor(sample(c('Male','Female'), n,
rep=TRUE, prob=c(.6, .4)))
cens - 15*runif(n)
h - .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt - -log(runif(n))/h
label(dt) - 'Follow-up Time'
e - ifelse(dt = cens,1,0)
dt - pmin(dt, cens)
units(dt) - Year
dd - datadist(age, sex)
options(datadist='dd')


This is process for  setting the range for the display of effects in Design
regression objects. See:

?datadist

q.effect
set of two quantiles for computing the range of continuous variables to use
in estimating regression effects. Defaults are c(.25,.75), which yields
inter-quartile-range odds ratios, etc.

?summary.Design
#---
 By default, inter-quartile range effects (odds ratios, hazards ratios,
etc.) are printed for continuous factors, ... 
#---
Value
For summary.Design, a matrix of class summary.Design with rows
corresponding to factors in the model and columns containing the low and
high values for the effects, the range for the effects, the effect point
estimates (difference in predicted values for high and low factor values),
the standard error of this effect estimate, and the lower and upper
confidence limits.

#---


 Srv - Surv(dt,e)

f - cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
summary(f)

   Effects  Response : Srv

FactorLowHigh   Diff.  Effect S.E. Lower 0.95 Upper 0.95
age   40.872 57.385 16.513 1.21   0.21 0.80   1.62
 Hazard Ratio 40.872 57.385 16.513 3.35 NA 2.22   5.06


In this case with a 4 df regression spline, you need to look at  the
effect across the range of the variable. You ought to plot the age effect
and examine anova(f) ). In the untransformed situation the plot is on the
log hazards scale for cph. So the effect for age in this case should be the
difference in log hazard at ages 40.872 and 57.385. SE is the standard error
of that estimate and the Upper and Lower numbers are the confidence bounds
on the effect estimate. The Hazard Ratio row gives you exponentiated
results, so a difference in log hazards becomes a hazard ratio. {exp(1.21) =
3.35}

 sex - Female:Male  2.000  1.000 NA 0.64   0.15 0.35   0.94

 Hazard Ratio  2.000  1.000 NA 1.91 NA 1.42   2.55


Wat's the meaning of Effect, S.E. Lower, Upper?


You probably ought to read a bit more basic material. If you are asking
this question, Harrell's Regression Modeling Strategies might be over you
head, but it would probably be a good investment anyway. Venables and
Ripley's Modern Applied Statistics has a chapter on survival analysis.
Also consider Kalbfliesch and Prentice Statistical Analysis of Failure Time
Data. I'm sure there are others;  those are the ones I have on my shelf.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT




[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.




--
Frank E Harrell Jr   Professor and Chair   School of Medicine
 Department of Biostatistics   Vanderbilt University

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] about the summary(cph.object)

2009-07-31 Thread zhu yao
Could someone explain the summary(cph.object)?

The example is in the help file of cph.

n - 1000
set.seed(731)
age - 50 + 12*rnorm(n)
label(age) - Age
sex - factor(sample(c('Male','Female'), n,
  rep=TRUE, prob=c(.6, .4)))
cens - 15*runif(n)
h - .02*exp(.04*(age-50)+.8*(sex=='Female'))
dt - -log(runif(n))/h
label(dt) - 'Follow-up Time'
e - ifelse(dt = cens,1,0)
dt - pmin(dt, cens)
units(dt) - Year
dd - datadist(age, sex)
options(datadist='dd')
Srv - Surv(dt,e)

f - cph(Srv ~ rcs(age,4) + sex, x=TRUE, y=TRUE)
summary(f)

 Effects  Response : Srv

 FactorLowHigh   Diff.  Effect S.E. Lower 0.95 Upper 0.95
 age   40.872 57.385 16.513 1.21   0.21 0.80   1.62
  Hazard Ratio 40.872 57.385 16.513 3.35 NA 2.22   5.06
 sex - Female:Male  2.000  1.000 NA 0.64   0.15 0.35   0.94
  Hazard Ratio  2.000  1.000 NA 1.91 NA 1.42   2.55


Wat's the meaning of Effect, S.E. Lower, Upper?

Thanks

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.