Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

Frank Harrell Sun, 14 Jun 2015 19:27:33 -0700

Terry - your example didn't demonstrate the problem because the variablethat interacted with strata (zed) was not a factor variable.

But I had stated the problem incorrectly. It's not that there are toomany strata terms; there are too many non-strata terms when the variableinteracting with the stratification factor is a factor variable. Hereis a simple example, where I have attached no packages other than thebasic startup packages.


strat <- function(x) x
d <- expand.grid(a=c('a1','a2'), b=c('b1','b2'))
d$y <- c(1,3,2,4)
f <- y ~ a * strat(b)
m <- model.frame(f, data=d)
Terms <- terms(f, specials='strat', data=d)
specials <- attr(Terms, 'specials')
temp <- survival:::untangle.specials(Terms, 'strat', 1)
Terms <- Terms[- temp$terms]
model.matrix(Terms, m)

  (Intercept) aa2 aa1:strat(b)b2 aa2:strat(b)b2
1           1   0              0              0
2           1   1              0              0
3           1   0              1              0
4           1   1              0              1
. . .

The column corresponding to a='a1' b='b2' should not be there(aa1:strat(b)b2).


This does seem to be a change in R.  Any help appreciated.

Note that after subsetting out strat terms using Terms[ - temp$terms],Terms attributes factor and term.labels are:


attr(,"factors")
         a a:strat(b)
y        0          0
a        1          2
strat(b) 0          1
attr(,"term.labels")
[1] "a"          "a:strat(b)"


Frank


On 06/11/2015 08:44 AM, Therneau, Terry M., Ph.D. wrote:

Frank,
   I'm not sure what is going on.  The following test function works for
me in both 3.1.1 and 3.2, i.e, the second model matrix has fewer
columns.  As I indicated to you earlier, the coxph code removes the
strata() columns after creating X because I found it easier to correctly
create the assign attribute.

   Can you create a worked example?

require(survival)
testfun <- function(formula, data) {
     tform <- terms(formula, specials="strata")
     mf <- model.frame(tform, data)

     terms2 <- terms(mf)
     strat <- untangle.specials(terms2, "strata")
     if (length(strat$terms)) terms2 <- terms2[-strat$terms]
     X <- model.matrix(terms2, mf)
     X
}

tdata <- data.frame(y= 1:10, zed = 1:10, grp =
factor(c(1,1,1,2,2,2,1,1,3,3)))

testfun(y ~ zed*grp, tdata)

testfun(y ~ strata(grp)*zed, tdata)


Terry T.

----- original message --

For building design matrices for Cox proportional hazards models in the
cph function in the rms package I have always used this construct:

Terms <- terms(formula, specials=c("strat", "cluster", "strata"),
data=data)
specials <- attr(Terms, 'specials')
stra    <- specials$strat
Terms.ns     <- Terms
      if(length(stra)) {
        temp <- untangle.specials(Terms.ns, "strat", 1)
        Terms.ns <- Terms.ns[- temp$terms]    #uses [.terms function
      }
X <- model.matrix(Terms.ns, X)[, -1, drop=FALSE]

The Terms.ns logic removes stratification factor "main effects" so that
if a stratification factor interacts with a non-stratification factor,
only the interaction terms are included, not the strat. factor main
effects. [In a Cox PH model stratification goes into the nonparametric
survival curve part of the model].

Lately this logic quit working; model.matrix keeps the unneeded main
effects in the design matrix.  Does anyone know what changed in R that
could have caused this, and possibly a workaround?


-------


--
------------------------------------------------------------------------
Frank E Harrell Jr              Professor and Chairman          School of 
Medicine
        Department of *Biostatistics*           *Vanderbilt University*

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

Reply via email to