On Fri, Jan 23, 2009 at 4:58 PM, Gang Chen <gangch...@gmail.com> wrote:
> With the following example using contr.sum for both factors,
>
>> dd <- data.frame(a = gl(3,4), b = gl(4,1,12)) # balanced 2-way
>> model.matrix(~ a * b, dd, contrasts = list(a="contr.sum", b="contr.sum"))
>
>    (Intercept) a1 a2 b1 b2 b3 a1:b1 a2:b1 a1:b2 a2:b2 a1:b3 a2:b3
> 1            1  1  0  1  0  0     1     0     0     0     0     0
> 2            1  1  0  0  1  0     0     0     1     0     0     0
> 3            1  1  0  0  0  1     0     0     0     0     1     0
> 4            1  1  0 -1 -1 -1    -1     0    -1     0    -1     0
> 5            1  0  1  1  0  0     0     1     0     0     0     0
> 6            1  0  1  0  1  0     0     0     0     1     0     0
> 7            1  0  1  0  0  1     0     0     0     0     0     1
> 8            1  0  1 -1 -1 -1     0    -1     0    -1     0    -1
> 9            1 -1 -1  1  0  0    -1    -1     0     0     0     0
> 10           1 -1 -1  0  1  0     0     0    -1    -1     0     0
> 11           1 -1 -1  0  0  1     0     0     0     0    -1    -1
> 12           1 -1 -1 -1 -1 -1     1     1     1     1     1     1
> ...
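For the quoted balanced layout, one can check numerically how the contr.sum coefficients relate to the level means. This is an illustrative sketch, not something from the thread: it assumes a simulated response `y` (hypothetical; any values would do), fits the saturated `a * b` model with sum contrasts, and compares the main-effect coefficients with the level means multiplied by `solve(cbind(1, contr.sum(k)))`.

```r
# Quoted balanced layout: 3 levels of a x 4 levels of b, one row per cell.
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))

# Hypothetical simulated response; the comparison does not depend on the values.
set.seed(1)
dd$y <- rnorm(nrow(dd))

# Saturated two-way fit with sum-to-zero contrasts on both factors.
fit <- lm(y ~ a * b, data = dd,
          contrasts = list(a = "contr.sum", b = "contr.sum"))
est <- coef(fit)

# Rows of solve(cbind(1, contr.sum(k))) are weights applied to the k level means.
Wa <- solve(cbind(1, contr.sum(3)))
Wb <- solve(cbind(1, contr.sum(4)))
from_a <- drop(Wa %*% tapply(dd$y, dd$a, mean))
from_b <- drop(Wb %*% tapply(dd$y, dd$b, mean))

# Side by side: first column from lm(), second from the level means.
cbind(lm = est[c("(Intercept)", "a1", "a2")], means = from_a)
cbind(lm = est[c("(Intercept)", "b1", "b2", "b3")], means = from_b)
```

With one observation per cell the fit has zero residual degrees of freedom; that does not matter here because only the coefficients are compared, and each pair of columns should agree to rounding error in this balanced case.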
> I have two questions:
> (1) I assume the 1st column (under intercept) is the overall mean, the
> 2nd column (under a1) is the difference between the 1st level of
> factor a and the overall mean, the 4th column (under b1) is the
> difference between the 1st level of factor b and the overall mean.
> Is this interpretation correct?

I don't think so, and furthermore I don't see why the contrasts should
have an interpretation. The contrasts are simply a parameterization of
the space spanned by the indicator columns of the levels of the
factors. Interpretations as overall means, etc. are mostly a holdover
from antiquated concepts of how analysis of variance tables should be
evaluated.

If you want to determine the interpretation of particular coefficients
for the special case of a balanced design (which doesn't always produce
a balanced data set; I remind my students that expecting a balanced
design to produce balanced data is contrary to Murphy's Law), the
easiest way of doing so is (I think this is right, but I can manage to
confuse myself on this with great ease) to calculate

> contr.sum(3)
  [,1] [,2]
1    1    0
2    0    1
3   -1   -1
> solve(cbind(1, contr.sum(3)))
              1          2          3
[1,]  0.3333333  0.3333333  0.3333333
[2,]  0.6666667 -0.3333333 -0.3333333
[3,] -0.3333333  0.6666667 -0.3333333
> solve(cbind(1, contr.sum(4)))
         1     2     3     4
[1,]  0.25  0.25  0.25  0.25
[2,]  0.75 -0.25 -0.25 -0.25
[3,] -0.25  0.75 -0.25 -0.25
[4,] -0.25 -0.25  0.75 -0.25

That is, the first coefficient is the "overall mean" (but only for a
balanced data set), the second is a contrast of the first level with
the others, the third is a contrast of the second level with the
others, and so on.

> (2) I'm not so sure about those interaction columns. For example, what
> is a1:b1? Is it the 1st level of factor a at the 1st level of factor b
> versus the overall mean, or something more complicated?

Well, at the risk of sounding trivial, a1:b1 is the product of the a1
and b1 columns.
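The statement that a1:b1 is the product of the a1 and b1 columns, and that the sum-contrast columns span the same space as the indicator columns of the cells, can be checked directly. A minimal sketch: the product check uses the quoted layout `dd`, while the span check uses `dd2`, a hypothetical replicated copy with two rows per cell, so that the 12 cell indicators span a proper subspace of R^24 rather than all of it.

```r
# Quoted balanced 2-way layout and its sum-contrast model matrix.
dd <- data.frame(a = gl(3, 4), b = gl(4, 1, 12))
X <- model.matrix(~ a * b, dd,
                  contrasts.arg = list(a = "contr.sum", b = "contr.sum"))

# Every interaction column should be the elementwise product of the two
# main-effect columns named in it, e.g. a1:b1 == a1 * b1.
int_cols <- grep(":", colnames(X), value = TRUE)
is_product <- vapply(int_cols, function(nm) {
  parts <- strsplit(nm, ":", fixed = TRUE)[[1]]
  all(X[, nm] == X[, parts[1]] * X[, parts[2]])
}, logical(1))
print(is_product)

# Span check on a hypothetical replicated copy (two rows per cell).
dd2 <- dd[rep(seq_len(nrow(dd)), 2), ]
X2 <- model.matrix(~ a * b, dd2,
                   contrasts.arg = list(a = "contr.sum", b = "contr.sum"))
cells <- model.matrix(~ 0 + a:b, dd2)   # 24 x 12 cell-indicator matrix

# If X2 and the cell indicators span the same space, stacking them side by
# side does not increase the rank.
c(X2 = qr(X2)$rank, cells = qr(cells)$rank, both = qr(cbind(X2, cells))$rank)
```

All six entries of `is_product` should be TRUE and all three ranks should be 12: the sum-contrast columns are just another basis for the cell-means space.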
You need a basis for a certain subspace and this provides one. I don't
see why there must be interpretations of the coefficients.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.