Re: [R] Why does the order of terms in a formula translate into different models/ model matrices?

peter dalgaard Tue, 01 May 2012 06:04:44 -0700

I know this is three months old, but I had put it on a mental TODO list to see 
whether it was something that could be fixed, and I finally got a little spare 
time for that sort of thing. It turns out that it can NOT be fixed (without 
fundamental design changes), so I thought I'd better record the conclusion for 
the archives.


The reason is pretty simple: It is terms.formula that decides which terms 
require contrast coding and which need dummy coding. It does this based on the 
formula structure _only_; i.e., it has no information on whether variables are 
factors or numerical. E.g., you can do

> terms(~ a:x + a:b)
~a:x + a:b
attr(,"variables")
list(a, x, b)
attr(,"factors")
  a:x a:b
a   2   2
x   2   0
b   0   1
attr(,"term.labels")
[1] "a:x" "a:b"
attr(,"order")
[1] 2 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 0
attr(,".Environment")
<environment: R_GlobalEnv>

in which there is no indication to terms() what kind of variable "x" is. So if 
you want the coding to depend on whether x is a factor or continuous, you're 
out of luck...
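
For the archives, here is a minimal sketch of that point. It reuses the data 
frame "d" from my earlier note quoted below; the factor-valued copy "d_fac" and 
the names f1, f2, m1, m2 and same_coding are introduced only for illustration. 
It should show that terms() returns the same coding table whether x is numeric 
or a factor, while the two term orders give model matrices with different 
columns.

```r
# Data frame from the quoted message below; expand.grid() turns the
# character vectors into factors, and x is numeric.
d <- cbind(expand.grid(a = c("a", "b"), b = c("X", "Y"), c = c("U", "W"),
                       stringsAsFactors = TRUE),
           x = 1:8)

# Hypothetical variant for illustration: the same data with x as a factor.
d_fac <- transform(d, x = factor(x))

f1 <- ~ a:x + a:b
f2 <- ~ a:b + a:x

# terms() only looks at the formula structure, so the "factors" table
# (1 = contrast coding, 2 = dummy coding) does not depend on the type of x.
same_coding <- identical(attr(terms(f1, data = d), "factors"),
                         attr(terms(f1, data = d_fac), "factors"))
print(same_coding)

# The same two terms in a different order yield different model matrices.
m1 <- model.matrix(f1, d)
m2 <- model.matrix(f2, d)
print(colnames(m1))
print(colnames(m2))

# Compare the ranks to see whether the two matrices can span the same space.
print(c(qr(m1)$rank, qr(m2)$rank))
```

If the ranks differ, the two formulas describe genuinely different models and 
not just different parametrizations of the same one.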

Not quite sure the last three lines of my note below make sense, though.

-pd

On Jan 29, 2012, at 11:24 , peter dalgaard wrote:

> 
> On Jan 29, 2012, at 02:42 , Ben Bolker wrote:
>>> 
>> 
>> (quoting removed to make Gmane happy)
>> 
>> AFAICS, this is a bug.
>> 
>> 
>> I think so too, although I haven't got my head around it yet.
>> 
>> Chuck, are you willing to post a summary of this to r-devel
>> for discussion ... and/or post a bug report?
> 
> You have to be very specific about what the bug is, if any... I.e., which 
> precisely are the rules that are broken by the current behavior? 
> 
> Preferably also suggest a fix; the corner cases of model.matrix and 
> friends are among the more impenetrable code in the R sources.
> 
> Notice that order-dependent parametrization of terms is not a bug per se, nor 
> is the automatic switch to dummy coding of factors. Consider these cases:
> 
> d <- cbind(expand.grid(a=c("a","b"),b=c("X","Y"),c=c("U","W")),x=1:8)
> model.matrix(~ a:b + a:c, d)
> model.matrix(~ a:c + a:b, d)
> model.matrix(~ a:b + a:x, d)
> model.matrix(~ a:x + a:b, d)
> 
> and notice that the logic applying to "x" is the same as that applying to 
> "c". 
> 
> The crux seems to be that the model ~a:c contains the model ~a whereas ~a:x 
> does not, and hence the rationale for _not_ expanding a subsequent a:b term 
> to dummies (namely that ~a is "already in" the model) fails. 
> 
> -- 
> Peter Dalgaard, Professor,
> Center for Statistics, Copenhagen Business School
> Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> Phone: (+45)38153501
> Email: pd....@cbs.dk  Priv: pda...@gmail.com

