Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

2015-06-15 Thread Therneau, Terry M., Ph.D.

Frank,
  I don't think there is any way to fix your problem except the way that I 
did it.

library(survival)
tdata - data.frame(y=c(1,3,3,5, 5,7, 7,9, 9,13),
x1=factor(letters[c(1,1,1,1,1,2,2,2,2,2)]),
x2= c(1,2,1,2,1,2,1,2,1,2))

fit1 - lm( y ~ x1 * strata(x2) - strata(x2), tdata)
coef(fit1)
   (Intercept)x1b x1a:strata(x2)x2=2 x1b:strata(x2)x2=2
  3.00   5.00   1.00   1.67

Your code is calling model.matrix with the same model frame and terms structure as the lm 
call above (I checked).  In your case you know that the underlying model has 2 intercepts 
(strata), one for the group with x2=1 and another for the group with x2=2, but how is the 
model.matrix routine supposed to guess that?  It can't, so model.matrix returns the proper 
result for the lm call.  As seen above the result is not singular, while for the Cox model 
it is singular due to the extra intercept.


This is simply an extension of leaving the intercept term in the model and then removing 
that column from the returned X matrix, which is necessary to have the correct coding for 
ordinary factor variables, something we've both done since day 1.  In order for 
model.matrix to do the right thing with interactions, it has to know how many intercepts 
there actually are.


I've come to the conclusion that the entire thrust of 'contrasts' in S was wrong headed, 
i.e., the remove redundant columns from the X matrix ahead of time logic.  It is simply 
not possible for the model.matrix routine to guess correctly for all y and x combinations, 
something that been acknowledged in R by changing the default for singular.ok to TRUE. 
Dealing with this after the fact via a good contrast function (a la SAS -- heresy!) would 
have been a much better design choice.  But as long as I'm in R the coxph routine tries to 
be a good citizen.


Terry T.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

2015-06-15 Thread Frank Harrell
Thank you very much Terry.  I'm still puzzled at why this worked a year 
ago.  What changed?  I'd very much like to reverse the change by setting 
an argument somewhere or manipulating the terms object.

I echo your sentiments about the general approach.

Frank


On 06/15/2015 09:05 AM, Therneau, Terry M., Ph.D. wrote:
 Frank,
   I don't think there is any way to fix your problem except the way 
 that I did it.

 library(survival)
 tdata - data.frame(y=c(1,3,3,5, 5,7, 7,9, 9,13),
 x1=factor(letters[c(1,1,1,1,1,2,2,2,2,2)]),
 x2= c(1,2,1,2,1,2,1,2,1,2))

 fit1 - lm( y ~ x1 * strata(x2) - strata(x2), tdata)
 coef(fit1)
(Intercept)x1b x1a:strata(x2)x2=2 
 x1b:strata(x2)x2=2
   3.00   5.00   1.00 1.67

 Your code is calling model.matrix with the same model frame and terms 
 structure as the lm call above (I checked).  In your case you know 
 that the underlying model has 2 intercepts (strata), one for the group 
 with x2=1 and another for the group with x2=2, but how is the 
 model.matrix routine supposed to guess that?  It can't, so 
 model.matrix returns the proper result for the lm call.  As seen above 
 the result is not singular, while for the Cox model it is singular due 
 to the extra intercept.

 This is simply an extension of leaving the intercept term in the 
 model and then removing that column from the returned X matrix, which 
 is necessary to have the correct coding for ordinary factor variables, 
 something we've both done since day 1.  In order for model.matrix to 
 do the right thing with interactions, it has to know how many 
 intercepts there actually are.

 I've come to the conclusion that the entire thrust of 'contrasts' in S 
 was wrong headed, i.e., the remove redundant columns from the X 
 matrix ahead of time logic.  It is simply not possible for the 
 model.matrix routine to guess correctly for all y and x combinations, 
 something that been acknowledged in R by changing the default for 
 singular.ok to TRUE. Dealing with this after the fact via a good 
 contrast function (a la SAS -- heresy!) would have been a much better 
 design choice.  But as long as I'm in R the coxph routine tries to be 
 a good citizen.

 Terry T.

-- 

Frank E Harrell Jr  Professor and Chairman  School of Medicine

Department of *Biostatistics*   *Vanderbilt University*


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

2015-06-14 Thread Frank Harrell
Terry - your example didn't demonstrate the problem because the variable 
that interacted with strata (zed) was not a factor variable.


But I had stated the problem incorrectly.  It's not that there are too 
many strata terms; there are too many non-strata terms when the variable 
interacting with the stratification factor is a factor variable.  Here 
is a simple example, where I have attached no packages other than the 
basic startup packages.


strat - function(x) x
d - expand.grid(a=c('a1','a2'), b=c('b1','b2'))
d$y - c(1,3,2,4)
f - y ~ a * strat(b)
m - model.frame(f, data=d)
Terms - terms(f, specials='strat', data=d)
specials - attr(Terms, 'specials')
temp - survival:::untangle.specials(Terms, 'strat', 1)
Terms - Terms[- temp$terms]
model.matrix(Terms, m)

  (Intercept) aa2 aa1:strat(b)b2 aa2:strat(b)b2
1   1   0  0  0
2   1   1  0  0
3   1   0  1  0
4   1   1  0  1
. . .

The column corresponding to a='a1' b='b2' should not be there 
(aa1:strat(b)b2).


This does seem to be a change in R.  Any help appreciated.

Note that after subsetting out strat terms using Terms[ - temp$terms], 
Terms attributes factor and term.labels are:


attr(,factors)
 a a:strat(b)
y0  0
a1  2
strat(b) 0  1
attr(,term.labels)
[1] a  a:strat(b)


Frank


On 06/11/2015 08:44 AM, Therneau, Terry M., Ph.D. wrote:

Frank,
   I'm not sure what is going on.  The following test function works for
me in both 3.1.1 and 3.2, i.e, the second model matrix has fewer
columns.  As I indicated to you earlier, the coxph code removes the
strata() columns after creating X because I found it easier to correctly
create the assign attribute.

   Can you create a worked example?

require(survival)
testfun - function(formula, data) {
 tform - terms(formula, specials=strata)
 mf - model.frame(tform, data)

 terms2 - terms(mf)
 strat - untangle.specials(terms2, strata)
 if (length(strat$terms)) terms2 - terms2[-strat$terms]
 X - model.matrix(terms2, mf)
 X
}

tdata - data.frame(y= 1:10, zed = 1:10, grp =
factor(c(1,1,1,2,2,2,1,1,3,3)))

testfun(y ~ zed*grp, tdata)

testfun(y ~ strata(grp)*zed, tdata)


Terry T.

- original message --

For building design matrices for Cox proportional hazards models in the
cph function in the rms package I have always used this construct:

Terms - terms(formula, specials=c(strat, cluster, strata),
data=data)
specials - attr(Terms, 'specials')
stra- specials$strat
Terms.ns - Terms
  if(length(stra)) {
temp - untangle.specials(Terms.ns, strat, 1)
Terms.ns - Terms.ns[- temp$terms]#uses [.terms function
  }
X - model.matrix(Terms.ns, X)[, -1, drop=FALSE]

The Terms.ns logic removes stratification factor main effects so that
if a stratification factor interacts with a non-stratification factor,
only the interaction terms are included, not the strat. factor main
effects. [In a Cox PH model stratification goes into the nonparametric
survival curve part of the model].

Lately this logic quit working; model.matrix keeps the unneeded main
effects in the design matrix.  Does anyone know what changed in R that
could have caused this, and possibly a workaround?


---




--

Frank E Harrell Jr  Professor and Chairman  School of 
Medicine
Department of *Biostatistics*   *Vanderbilt University*

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Different behavior of model.matrix between R 3.2 and R3.1.1

2015-06-11 Thread Therneau, Terry M., Ph.D.

Frank,
  I'm not sure what is going on.  The following test function works for me in both 3.1.1 
and 3.2, i.e, the second model matrix has fewer columns.  As I indicated to you earlier, 
the coxph code removes the strata() columns after creating X because I found it easier to 
correctly create the assign attribute.


  Can you create a worked example?

require(survival)
testfun - function(formula, data) {
tform - terms(formula, specials=strata)
mf - model.frame(tform, data)

terms2 - terms(mf)
strat - untangle.specials(terms2, strata)
if (length(strat$terms)) terms2 - terms2[-strat$terms]
X - model.matrix(terms2, mf)
X
}

tdata - data.frame(y= 1:10, zed = 1:10, grp = factor(c(1,1,1,2,2,2,1,1,3,3)))

testfun(y ~ zed*grp, tdata)

testfun(y ~ strata(grp)*zed, tdata)


Terry T.

- original message --

For building design matrices for Cox proportional hazards models in the
cph function in the rms package I have always used this construct:

Terms - terms(formula, specials=c(strat, cluster, strata), data=data)
specials - attr(Terms, 'specials')
stra- specials$strat
Terms.ns - Terms
 if(length(stra)) {
   temp - untangle.specials(Terms.ns, strat, 1)
   Terms.ns - Terms.ns[- temp$terms]#uses [.terms function
 }
X - model.matrix(Terms.ns, X)[, -1, drop=FALSE]

The Terms.ns logic removes stratification factor main effects so that
if a stratification factor interacts with a non-stratification factor,
only the interaction terms are included, not the strat. factor main
effects. [In a Cox PH model stratification goes into the nonparametric
survival curve part of the model].

Lately this logic quit working; model.matrix keeps the unneeded main
effects in the design matrix.  Does anyone know what changed in R that
could have caused this, and possibly a workaround?


---

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.