Re: [R] differences between survival models between STATA and R

jthetzel Mon, 09 Jul 2012 14:53:03 -0700

Be sure you reply or forward your message to me to the r-help listhost.  I
might not have time to review it tonight, while others might.


Jeremy

On Mon, Jul 9, 2012 at 2:38 PM, JPF [via R] <
ml-node+s789695n4635888...@n4.nabble.com> wrote:

> Please, find an example with a data set here.
>
> The data sets are prepared to be read with R (dataR) and STATA (dataSTATA)
> directly. The only difference is that STATA treats blanks as NAs.
>
> *in R, by using aftreg:*
>
> dataR<-read.table("clipboard",h=T)
> > su<-aftreg(Surv(sta,sto,S) ~
>  a+b+c+d+g+h+j+k+l+m+factor(pcont),dist="weibull",
> data=data.frame(dataR),id=ID)
> > summary(su)
>
> Call:
> aftreg(formula = Surv(sta, sto, S) ~ a + b + c + d + g + h +
>     j + k + l + m + factor(pcont), data = data.frame(thern),
>     dist = "weibull", id = ID)
>
> Covariate          W.mean      Coef Exp(Coef)  se(Coef)    Wald p
> a                  39.416     0.001     1.001     0.002     0.598
> b                  55.879     0.002     1.002     0.001     0.002
> c                  49.554    -0.001     0.999     0.001     0.251
> d                  51.266     0.000     1.000     0.000     0.758
> g                  14.701    -0.006     0.994     0.002     0.011
> h                  32.358     0.005     1.005     0.001     0.000
> j                  51.768    -0.022     0.978     0.001     0.000
> k                  53.832    -0.004     0.996     0.001     0.000
> l                  18.851     0.016     1.017     0.001     0.000
> m                   0.809     0.907     2.478     0.132     0.000
> factor(pcont)
>                1    0.007     0         1           (reference)
>                2    0.272    -0.051     0.950     0.158     0.745
>                3    0.201     0.653     1.922     0.155     0.000
>                4    0.253     0.181     1.198     0.156     0.246
>                5    0.267     0.691     1.996     0.162     0.000
>
> log(scale)                    3.369    29.038     0.225     0.000
> log(shape)                    1.925     6.858     0.055     0.000
>
> Events                    190
> Total time at risk          3129
> Max. log. likelihood      -298.47
> LR test statistic         681
> Degrees of freedom        14
> Overall p-value           0
>
>
>
> *in STATA, by using streg:*
>
>  insheet using "dataSTATA.txt"
> (18 vars, 4862 obs)
>
> .
> . stset ntime, failure(s) id(id)
>                 id:  id
>      failure event:  s != 0 & s < .
> obs. time interval:  (ntime[_n-1], ntime]
>  exit on or before:  failure
> ------------------------------------------------------------------------------
>
>      4862  total obs.
>         0  exclusions
> ------------------------------------------------------------------------------
>
>      4862  obs. remaining, representing
>       351  subjects
>       283  failures in single failure-per-subject data
>      4862  total analysis time at risk, at risk from t =         0
>                              earliest observed entry t =         0
>                                   last observed exit t =        30
>
> .
> . gen f1 = 0
> . replace f1 = 1 if pcont==1
> (520 real changes made)
>
> .
> . gen f2 = 0
> . replace f2= 1 if pcont==2
> (1267 real changes made)
>
>
> . gen f3 = 0
> . replace f3= 1 if pcont==3
> (771 real changes made)
>
>
> . gen f4 = 0
> . replace f4= 1 if pcont==4
> (960 real changes made)
>
>
> . gen f5 = 0
> . replace f5= 1 if pcont==5
> (1344 real changes made)
>
>
> . streg  f1 f2 f3 f4 f5 a b c d g h j k l m, dist(weibull)  time nolog
>
>          failure _d:  s
>    analysis time _t:  ntime
>                  id:  id
> note: f5 dropped due to collinearity
>
> Weibull regression -- accelerated failure-time form
>
> No. of subjects =          228                     Number of obs   =
>  3129
> No. of failures =          190
> Time at risk    =         3129
>                                                    LR chi2(14)     =
>  226.94
> Log likelihood  =   -47.541886                     Prob > chi2     =
>  0.0000
>
> ------------------------------------------------------------------------------
>
>           _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf.
> Interval]
> -------------+----------------------------------------------------------------
>
>           f1 |   .7194758   .2271213     3.17   0.002     .2743262
>  1.164625
>           f2 |   .4011278   .0604573     6.63   0.000     .2826336
> .519622
>           f3 |   .0142088   .0676573     0.21   0.834    -.1183972
>  .1468148
>           f4 |   .2984225   .0814102     3.67   0.000     .1388614
>  .4579835
>            a |   .0030444   .0024399     1.25   0.212    -.0017377
>  .0078265
>            b |  -.0008451   .0009987    -0.85   0.397    -.0028026
>  .0011124
>            c |   .0015207   .0009827     1.55   0.122    -.0004055
>  .0034468
>            d |   .0005143   .0007139     0.72   0.471    -.0008848
>  .0019135
>            g |   .0024349   .0025024     0.97   0.331    -.0024698
>  .0073395
>            h |  -.0024076   .0012125    -1.99   0.047    -.0047842
> -.0000311
>            j |    .013318   .0012565    10.60   0.000     .0108553
>  .0157806
>            k |    .002104   .0010438     2.02   0.044     .0000581
>  .0041498
>            l |  -.0017612   .0010071    -1.75   0.080     -.003735
>  .0002127
>            m |  -.0694492    .090727    -0.77   0.444    -.2472708
>  .1083724
>        _cons |   1.812124   .2000892     9.06   0.000     1.419956
>  2.204291
> -------------+----------------------------------------------------------------
>
>        /ln_p |   1.536971   .0756138    20.33   0.000      1.38877
>  1.685171
> -------------+----------------------------------------------------------------
>
>            p |   4.650481   .3516405                      4.009916
>  5.393373
>          1/p |   .2150315   .0162593                      .1854127
>  .2493818
> ------------------------------------------------------------------------------
>
>
> I do an accelerated failure time model with both programs. The problem of
> "survreg" is that it cannot handle time-dependent covariates:
> https://stat.ethz.ch/pipermail/r-help/2010-July/247216.html
>
>
>
> Thanks,
>
>
>
> On Mon, Jul 9, 2012 at 10:19 AM, Terry Therneau <[hidden 
> email]<http://user/SendEmail.jtp?type=node&node=4635888&i=0>>
> wrote:
>
> >  No input data, no output listed......   my crytal ball is too cloudy to
> > answer this.
> >
> > Terry T
> >
> >
> > On 07/09/2012 09:17 AM, Javier Palacios Fenech wrote:
> >
> > Please.
> >
> >  find an example here. With exactly the same data set, I run two hazard
> > models following the instructions for each function.
> >
> >  aftreg(formula = Surv(sta, sto, S) ~ a + b + c + d +   e + f + g
> > , factor(F), data = data.frame(SURV),
> >     dist = "weibull", id = ID)
> >
> >  streg  f1 f2 f3 f4 f5 a b c d g f g,  dist(weibull)   time nolog
> >  (note: F= f1, f2,f3,f4,f5)
> >
> >  Results are different. Really different. With aftreg some estimates are
> > significant, and with STATA they are not. Many estimates do not even
> have
> > the same sign, therefore predicting contrary effects. Which model should
> I
> > trust?
> >
> >  Best,
> >
> >  J
> >
> >
> >
> >
> >
> > On Mon, Jul 9, 2012 at 9:59 AM, Terry Therneau <[hidden 
> > email]<http://user/SendEmail.jtp?type=node&node=4635888&i=1>>
> wrote:
> >
> >> Without more information, we can only guess what you did, or what you
> are
> >> seeing on the page that is "different".
> >>
> >> I'll make a random guess though.  There are about 5 ways to
> paramaterize
> >> the Weibull distribution.  The standard packages that I know, however,
> tend
> >> to use the one found in the Kalbfleisch and Prentice book The
> Statistical
> >> Analysis of Failure time Data.  This includes the survreg funciton in R
> and
> >> lifereg in SAS, and likely stata tthought I don't know that package.
>  The
> >> aftreg function in the eha package uses something different.
> >>
> >> About 1/2 the weibull questions I see are due to a change in
> parameters.
> >>
> >> Terry T.
> >>
> >> ---- begin included message -----
> >>
> >>
> >>
> >>
> >> Dear Community,
> >>
> >> I have been using two types of survival programs to analyse a data set.
> >>
> >> The first one is an R function called aftreg. The second one an STATA
> >> function called streg.
> >>
> >> Both of them include the same analyisis with a weibull distribution.
> Yet,
> >> results are very different.
> >>
> >> Shouldn't the results be the same?
> >>
> >> Kind regards,
> >> J
> >>
> >>
> >
> >
> >  --
> > *Javier Palacios Fenech*
> > PhD in Management
> > Research Area: Marketing
> >  www.javierpalacios.com
> >
> >
> >
> >
>
> --
> *Javier Palacios Fenech*
> PhD in Management
> Research Area: Marketing
> www.javierpalacios.com
>
> ______________________________________________
> [hidden email] <http://user/SendEmail.jtp?type=node&node=4635888&i=2>mailing 
> list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
> *dataR.txt* (447K) Download 
> Attachment<http://r.789695.n4.nabble.com/attachment/4635888/0/dataR.txt>
> *dataSTATA.txt* (434K) Download 
> Attachment<http://r.789695.n4.nabble.com/attachment/4635888/1/dataSTATA.txt>
>
>
> ------------------------------
>  If you reply to this email, your message will be added to the discussion
> below:
>
> http://r.789695.n4.nabble.com/differences-between-survival-models-between-STATA-and-R-tp4635670p4635888.html
>  To unsubscribe from differences between survival models between STATA and
> R, click 
> here<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=4635670&code=anRoZXR6ZWxAZ21haWwuY29tfDQ2MzU2NzB8MjA1MzkzNjc1OQ==>
> .
> NAML<http://r.789695.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>


-----
Jeremy T. Hetzel
Boston University
--
View this message in context: 
http://r.789695.n4.nabble.com/differences-between-survival-models-between-STATA-and-R-tp4635670p4635917.html
Sent from the R help mailing list archive at Nabble.com.
        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] differences between survival models between STATA and R

Reply via email to