On Tue, May 11, 2004 at 07:10:36AM -0700, Thomas Lumley wrote: > On Mon, 10 May 2004, Paul Johnson wrote: > > > Dear Everybody: > > > > I'm doing my usual "how does that work in R" thing with some Stata > > projects. I find a gross gap between the Stata and R in Cox PH models, > > and I hope you can give me some pointers about what goes wrong. I'm > > getting signals from R/Survival that the model just can't be estimated, > > but Stata spits out numbers just fine. > > > > I wonder if I should specify initial values for coxph? > > It's worth a try. The other question is whether Stata has in fact > converged -- if the range of rqe is not small then its coefficient may > actually be infinite. > > > I got a dataset from a student who uses Stata and try to replicate in R. > > I will share data to you in case you want to see for yourself. Let me > > know if you want text or Stata data file. > > I'd like to look at the data. We should be able to get coxph to converge > when there is a finite mle -- the log partial likelihood is concave.
Paul and Thomas, 'coxph' or the data (I got it from Paul) indeed has a problem: ------------------------------------------------------------ Call: coxph(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat) coef exp(coef) se(coef) z p haz.wst 8.53e-08 1 9.47e-08 0.901 0.37 pol.free NA NA 0.00e+00 NA NA Likelihood ratio test=0.76 on 1 df, p=0.385 n= 21 Warning message: X matrix deemed to be singular; variable 2 in: coxph(Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat) --------------------------------------------------------------- Is it the data? Let's try 'coxreg' (eha): --------------------------------------------------------------- Call: coxreg(formula = Surv(yrs2, ratify) ~ haz.wst + pol.free, data = dat) Covariate Mean Coef RR Wald p haz.wst 2054901 0.000 1.000 0.372 pol.free 2.090 0.009 1.009 0.958 Events 21 Total time at risk 78 Max. log. likelihood -45.001 LR test statistic 0.76 Degrees of freedom 2 Overall p-value 0.684583 ---------------------------------------------------------------- This worked just fine (Paul, same results as in Stata?). But, we seem to have a scaling problem; lok at the means of the covariates! Some rescaling gives: ---------------------------------------------------------------- Call: coxph(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, data = dat) coef exp(coef) se(coef) z p I(haz.wst * 1e-06) 0.08479 1.09 0.095 0.8920 0.37 pol.free 0.00896 1.01 0.170 0.0526 0.96 Likelihood ratio test=0.76 on 2 df, p=0.685 n= 21 ---------------------------------------------------------------- and now 'coxph' gets the same results as 'coxreg'. I don't know about coxph for sure, but I do know that coxreg centers all covariates before the NR procedure starts. Maybe we also should rescale to unit variance? And of course scale back the coefficients and se:s at the end? BTW, Paul's data is heavily tied. Could be an idea to use a discrete time version of the PH model: you can do that with 'mlreg' (eha): --------------------------------------------------------------- Call: mlreg(formula = Surv(yrs2, ratify) ~ I(haz.wst * 1e-06) + pol.free, data = dat) Covariate Mean Coef RR Wald p I(haz.wst * 1e-06) 2.055 0.097 1.102 0.320 pol.free 2.090 0.003 1.003 0.985 Events 21 Total time at risk 78 Max. log. likelihood -36.324 LR test statistic 0.93 Degrees of freedom 2 Overall p-value 0.627056 ---------------------------------------------------------------- Doesn't make much of a difference, though. Efron's method is quite robust. (You might check if there are any differences with 'breslow') Best, Göran [...] -- Göran Broström tel: +46 90 786 5223 Department of Statistics fax: +46 90 786 6614 Umeå University http://www.stat.umu.se/egna/gb/ SE-90187 Umeå, Sweden e-mail: [EMAIL PROTECTED] ______________________________________________ [EMAIL PROTECTED] mailing list https://www.stat.math.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html