[R] passing contrasts=FALSE to contrast functions -- why does this exist?

2010-06-11 Thread Nathaniel Smith
Hello,

I've noticed that all contrast functions, like contr.treatment,
contr.poly, etc., take a logical argument called 'contrasts'. The
default is TRUE, in which case they do their normal thing of returning
an n x (n-1) matrix whose columns are linearly independent of the
intercept.

If contrasts=FALSE, they instead return an n x n matrix with full rank
(usually the identity matrix, corresponding to dummy coding, but
contr.poly returns orthogonal polynomials that include the zeroth
order constant term, instead of starting with the linear term as it
normally would).
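For concreteness, here is a minimal sketch of the two behaviours at the R prompt. It uses only the standard contr.treatment and contr.poly functions, and the choice of three levels is arbitrary:

```r
# Default (contrasts = TRUE): an n x (n-1) coding, independent of the intercept
ct <- contr.treatment(3)
dim(ct)                                # 3 2
qr(cbind(1, ct))$rank                  # 3: intercept plus both columns is full rank

# contrasts = FALSE: an n x n full-rank coding
contr.treatment(3, contrasts = FALSE)  # the 3 x 3 identity, i.e. dummy coding
cp <- contr.poly(3, contrasts = FALSE)
cp[, 1]                                # the constant zeroth-order column (all 1s)
```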

Why does this argument exist?

My initial theory was that this was added to support the smart
handling of redundancy in model matrix construction -- depending on
what other terms exist in a formula, sometimes R will choose to
contrast code a factor in n-1 columns, and sometimes it will choose to
dummy code it in n columns. So it would make sense to call the
contrast function with contrasts=TRUE in the former case and
contrasts=FALSE in the latter case, and that way if the contrast
function for some reason wanted a full-rank coding *besides* dummy
coding then it could do that (like contr.poly).
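A sketch of what that kind of dispatch might look like. code_factor() is invented here for illustration and is not part of R. It simply forwards contrasts = !full_rank to whichever contrast function it is handed, so that contr.poly would get to supply its own full-rank basis:

```r
# Hypothetical helper (not part of R): build the coding columns for factor `f`
# by asking the contrast function `ctrfn` for either a reduced (n - 1 column)
# or a full-rank (n column) coding.
code_factor <- function(f, ctrfn = contr.poly, full_rank = FALSE) {
  cm <- ctrfn(levels(f), contrasts = !full_rank)  # levels x coding-columns matrix
  cm[as.integer(f), , drop = FALSE]               # one row per observation
}

f <- factor(c("lo", "mid", "hi", "mid"), levels = c("lo", "mid", "hi"))
code_factor(f)                    # 4 x 2: linear and quadratic columns
code_factor(f, full_rank = TRUE)  # 4 x 3: the constant column is added
```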

But in fact, when R decides it wants dummy coding, it doesn't call the
contrast function, it just dummy codes unconditionally:

> a <- factor(c("a", "b", "c"))
> trace(contr.treatment)
> invisible(model.matrix(~ a))  # contrast coded
trace: ctrfn(levels(x), contrasts = contrasts)
> invisible(model.matrix(~ 0 + a)) # dummy coded


In fact, I can't find any code anywhere in R that ever uses contrasts=FALSE.
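Here is a related check. It is offered as a sketch against a current R rather than the 2010 release, on the assumption that this behaviour has not changed. Attach contr.poly to a factor and ask the contrasts() accessor for the full-rank coding. If that request were forwarded to contr.poly(..., contrasts = FALSE), the result would have a constant column plus polynomial terms. What comes back instead is the plain identity, which matches the indicator columns from model.matrix(~ 0 + a):

```r
a <- factor(c("a", "b", "c"))
contrasts(a) <- "contr.poly"     # attach a non-default contrast function

ncol(contrasts(a))               # 2: contr.poly supplies the reduced coding
contrasts(a, contrasts = FALSE)  # 3 x 3 identity, not contr.poly's full-rank basis
model.matrix(~ 0 + a)            # indicator columns aa, ab, ac
```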

So what's going on? Is this a bug and R *should* be using
contrasts=FALSE to dummy code factors?

Confusedly yours,
-- Nathaniel

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] The curious special case of ~ (a + b)/c

2010-06-05 Thread Nathaniel Smith
This isn't at all an urgent practical question, but recently while
exploring the details of how R formulas are interpreted, I learned of
this funny special case for how / interacts with +. In all of the
following cases, the multiplication-like operator simply distributes
over addition:
  (a + b):c = a:c + b:c
  a:(b + c) = a:b + a:c
  (a + b)*c = a*c + b*c
  a*(b + c) = a*b + a*c
  a/(b + c) = a/b + a/c

But:
  (a + b)/c = a + b + a:b:c, not a/c + b/c = a + a:c + b + b:c
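These expansions can be checked with terms(), whose "term.labels" attribute lists the expanded terms. Below is a minimal sketch. term_labels() is a local helper written for this check. It sorts the variable names inside each label because terms() orders them by first appearance in the formula, so a:c + b:c may come back as "a:c" "c:b":

```r
# Local helper: expanded term labels, with variables sorted inside each label
term_labels <- function(f) {
  tl <- attr(terms(f), "term.labels")
  vapply(strsplit(tl, ":", fixed = TRUE),
         function(v) paste(sort(v), collapse = ":"), character(1))
}

# The distributive cases: each pair should expand to the same set of terms
pairs <- list(
  list(~ (a + b):c, ~ a:c + b:c),
  list(~ a:(b + c), ~ a:b + a:c),
  list(~ (a + b)*c, ~ a*c + b*c),
  list(~ a*(b + c), ~ a*b + a*c),
  list(~ a/(b + c), ~ a/b + a/c)
)
sapply(pairs, function(p) setequal(term_labels(p[[1]]), term_labels(p[[2]])))

# The special case
term_labels(~ (a + b)/c)                                     # "a" "b" "a:b:c"
setequal(term_labels(~ (a + b)/c), term_labels(~ a/c + b/c)) # FALSE
```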

Chambers and Hastie mention this but give no explanation (pages 29-30,
"Slightly more subtle is...").

So for my own edification, does anyone know/care to speculate about
why (a + b)/c works this way?

-- Nathaniel


Re: [R] The curious special case of ~ (a + b)/c

2010-06-05 Thread Nathaniel Smith
On Sat, Jun 5, 2010 at 2:01 PM, RICHARD M. HEIBERGER <r...@temple.edu> wrote:
> The / is used for nesting and is defined by
> A/B == A + (B %in% A)
>
> thus
> (a+b)/c == (a+b) + c %in% (a+b) == a + b + a:b:c

...I guess I could then ask why %in% is defined that way, but actually
this rephrasing somehow helped me figure it out :-). In case anyone
else with the same confusion finds this thread: the point in either
case is that a variable can't be nested in two other variables
separately, so the user must have meant it was nested in both
together.
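Heiberger's identity can be checked the same way. This short sketch compares the expanded term labels of both sides; term_labels() is just a local helper. The last call shows the ordinary single-factor case, where b nested in a alone becomes a:b:

```r
# Local helper: the expanded term labels of a one-sided formula
term_labels <- function(f) attr(terms(f), "term.labels")

lhs <- term_labels(~ (a + b)/c)
rhs <- term_labels(~ (a + b) + c %in% (a + b))
identical(lhs, rhs)          # TRUE: both are "a" "b" "a:b:c"

# Nesting in a single factor: b %in% a is just the interaction a:b
term_labels(~ a + b %in% a)  # "a" "a:b"
```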

-- Nathaniel
