Hello,

I've noticed that all contrast functions, like contr.treatment,
contr.poly, etc., take a logical argument called 'contrasts'. The
default is TRUE, in which case they do their normal thing of returning
an n x (n-1) matrix whose columns are linearly independent of the
intercept.

If contrasts=FALSE, they instead return an n x n matrix with full rank
(usually the identity matrix, corresponding to "dummy" coding, but
contr.poly returns orthogonal polynomials that include the
zeroth-order constant term, instead of starting with the linear term
as it normally would).
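
For concreteness, here is a small sketch of the two modes for n = 3,
using only the contrast functions from the stats package. The variable
names (ct, cp, ct_full, cp_full) are just mine, for illustration:

```r
# Default (contrasts = TRUE): n x (n-1) contrast matrices.
ct <- contr.treatment(3)
cp <- contr.poly(3)
dim(ct)
dim(cp)

# contrasts = FALSE: n x n matrices with full rank.
ct_full <- contr.treatment(3, contrasts = FALSE)
cp_full <- contr.poly(3, contrasts = FALSE)

ct_full           # identity matrix, i.e. "dummy" coding
cp_full           # constant column, then the linear and quadratic terms
qr(cp_full)$rank  # rank of the n x n polynomial coding
```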

Why does this argument exist?

My initial theory was that this was added to support the smart
handling of redundancy in model matrix construction -- depending on
what other terms exist in a formula, sometimes R will choose to
contrast code a factor in n-1 columns, and sometimes it will choose to
dummy code it in n columns. So it would make sense to call the
contrast function with contrasts=TRUE in the former case and
contrasts=FALSE in the latter case, and that way if the contrast
function for some reason wanted a full-rank coding *besides* dummy
coding then it could do that (like contr.poly).

But in fact, when R decides it wants dummy coding, it doesn't call the
contrast function, it just dummy codes unconditionally:

> a <- factor(c("a", "b", "c"))
> trace(contr.treatment)
> invisible(model.matrix(~ a))  # contrast coded
trace: ctrfn(levels(x), contrasts = contrasts)
> invisible(model.matrix(~ 0 + a)) # dummy coded
>
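
The same thing can be seen without trace() by looking at the model
matrices directly. This is only a sketch of what I observe, not a
statement about how model.matrix is meant to work. The names mm_int,
mm_noint and mm_poly are mine, and I ask for contr.poly on an unordered
factor purely to make any non-dummy full-rank coding visible:

```r
a <- factor(c("a", "b", "c"))

# With an intercept, the factor is contrast coded in n-1 columns.
mm_int <- model.matrix(~ a)
colnames(mm_int)

# Without an intercept, the factor gets n indicator columns.
mm_noint <- model.matrix(~ 0 + a)
colnames(mm_noint)

# Ask for contr.poly and drop the intercept. If the contrast function
# were called with contrasts = FALSE, a constant column plus polynomial
# terms would show up here.
mm_poly <- model.matrix(~ 0 + a, contrasts.arg = list(a = "contr.poly"))
mm_poly
```

If mm_poly comes out as plain 0/1 indicator columns, the n-column
coding never went through contr.poly at all.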

Indeed, I can't find any code anywhere in R that ever calls a contrast
function with contrasts=FALSE.

So what's going on? Is this a bug and R *should* be using
contrasts=FALSE to "dummy code" factors?

Confusedly yours,
-- Nathaniel

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
