Hi Arie, The book out of which this behavior is based does not use factor (in this section) to refer to categorical factor. I will again point to this sentence, from page 40, in the same section and referring to the behavior under question, that shows F_j is not limited to categorical factors: "Numeric variables appear in the computations as themselves, uncoded. Therefore, the rule does not do anything special for them, and it remains valid, in a trivial sense, whenever any of the F_j is numeric rather than categorical."
Note the "... whenever any of the F_j is numeric rather than categorical." Factor here is used in the more general sense of the word, not referring to the R type "factor." The behavior of R does not match the heuristic that it's citing. Best regards, Tyler On Thu, Nov 2, 2017 at 2:51 AM, Arie ten Cate <arietenc...@gmail.com> wrote: > Hello Tyler, > > Thank you for searching for, and finding, the basic description of the > behavior of R in this matter. > > I think your example is in agreement with the book. > > But let me first note the following. You write: "F_j refers to a > factor (variable) in a model and not a categorical factor". However: > "a factor is a vector object used to specify a discrete > classification" (start of chapter 4 of "An Introduction to R".) You > might also see the description of the R function factor(). > > You note that the book says about a factor F_j: > "... F_j is coded by contrasts if T_{i(j)} has appeared in the > formula and by dummy variables if it has not" > > You find: > "However, the example I gave demonstrated that this dummy variable > encoding only occurs for the model where the missing term is the > numeric-numeric interaction, ~(X1+X2+X3)^3-X1:X2." > > We have here T_i = X1:X2:X3. Also: F_j = X3 (the only factor). Then > T_{i(j)} = X1:X2, which is dropped from the model. Hence the X3 in T_i > must be encoded by dummy variables, as indeed it is. > > Arie > > On Tue, Oct 31, 2017 at 4:01 PM, Tyler <tyle...@gmail.com> wrote: > > Hi Arie, > > > > Thank you for your further research into the issue. > > > > Regarding Stata: On the other hand, JMP gives model matrices that use the > > main effects contrasts in computing the higher order interactions, > without > > the dummy variable encoding. I verified this both by analyzing the linear > > model given in my first example and noting that JMP has one more degree > of > > freedom than R for the same model, as well as looking at the generated > model > > matrices. It's easy to find a design where JMP will allow us fit our > model > > with goodness-of-fit estimates and R will not due to the extra degree(s) > of > > freedom required. Let's keep the conversation limited to R. > > > > I want to refocus back onto my original bug report, which was not for a > > missing main effects term, but rather for a missing lower-order > interaction > > term. The behavior of model.matrix.default() for a missing main effects > term > > is a nice example to demonstrate how model.matrix encodes with dummy > > variables instead of contrasts, but doesn't demonstrate the inconsistent > > behavior my bug report highlighted. > > > > I went looking for documentation on this behavior, and the issue stems > not > > from model.matrix.default(), but rather the terms() function in > interpreting > > the formula. This "clever" replacement of contrasts by dummy variables to > > maintain marginality (presuming that's the reason) is not described > anywhere > > in the documentation for either the model.matrix() or the terms() > function. > > In order to find a description for the behavior, I had to look in the > > underlying C code, buried above the "TermCode" function of the "model.c" > > file, which says: > > > > "TermCode decides on the encoding of a model term. Returns 1 if variable > > ``whichBit'' in ``thisTerm'' is to be encoded by contrasts and 2 if it > is to > > be encoded by dummy variables. This is decided using the heuristic > > described in Statistical Models in S, page 38." > > > > I do not have a copy of this book, and I suspect most R users do not as > > well. Thankfully, however, some of the pages describing this behavior > were > > available as part of Amazon's "Look Inside" feature--but if not for > that, I > > would have no idea what heuristic R was using. Since those pages could > made > > unavailable by Amazon at any time, at the very least we have an problem > with > > a lack of documentation. > > > > However, I still believe there is a bug when comparing R's > implementation to > > the heuristic described in the book. From Statistical Models in S, page > > 38-39: > > > > "Suppose F_j is any factor included in term T_i. Let T_{i(j)} denote the > > margin of T_i for factor F_j--that is, the term obtained by dropping F_j > > from T_i. We say that T_{i(j)} has appeared in the formula if there is > some > > term T_i' for i' < i such that T_i' contains all the factors appearing in > > T_{i(j)}. The usual case is that T_{i(j)} itself is one of the preceding > > terms. Then F_j is coded by contrasts if T_{i(j)} has appeared in the > > formula and by dummy variables if it has not" > > > > Here, F_j refers to a factor (variable) in a model and not a categorical > > factor, as specified later in that section (page 40): "Numeric variables > > appear in the computations as themselves, uncoded. Therefore, the rule > does > > not do anything special for them, and it remains valid, in a trivial > sense, > > whenever any of the F_j is numeric rather than categorical." > > > > Going back to my original example with three variables: X1 (numeric), X2 > > (numeric), X3 (categorical). This heuristic prescribes encoding X1:X2:X3 > > with contrasts as long as X1:X2, X1:X3, and X2:X3 exist in the formula. > When > > any of the preceding terms do not exist, this heuristic tells us to use > > dummy variables to encode the interaction (e.g. "F_j [the interaction > term] > > is coded ... by dummy variables if it [any of the marginal terms > obtained by > > dropping a single factor in the interaction] has not [appeared in the > > formula]"). However, the example I gave demonstrated that this dummy > > variable encoding only occurs for the model where the missing term is the > > numeric-numeric interaction, "~(X1+X2+X3)^3-X1:X2". Otherwise, the > > interaction term X1:X2:X3 is encoded by contrasts, not dummy variables. > This > > is inconsistent with the description of the intended behavior given in > the > > book. > > > > Best regards, > > Tyler > > > > > > On Fri, Oct 27, 2017 at 2:18 PM, Arie ten Cate <arietenc...@gmail.com> > > wrote: > >> > >> Hello Tyler, > >> > >> I want to bring to your attention the following document: "What > >> happens if you omit the main effect in a regression model with an > >> interaction?" > >> (https://stats.idre.ucla.edu/stata/faq/what-happens-if-you- > omit-the-main-effect-in-a-regression-model-with-an-interaction). > >> This gives a useful review of the problem. Your example is Case 2: a > >> continuous and a categorical regressor. > >> > >> The numerical examples are coded in Stata, and they give the same > >> result as in R. Hence, if this is a bug in R then it is also a bug in > >> Stata. That seems very unlikely. > >> > >> Here is a simulation in R of the above mentioned Case 2 in Stata: > >> > >> df <- expand.grid(socst=c(-1:1),grp=c("1","2","3","4")) > >> print("Full model") > >> print(model.matrix(~(socst+grp)^2 ,data=df)) > >> print("Example 2.1: drop socst") > >> print(model.matrix(~(socst+grp)^2 -socst ,data=df)) > >> print("Example 2.2: drop grp") > >> print(model.matrix(~(socst+grp)^2 -grp ,data=df)) > >> > >> This gives indeed the following regressors: > >> > >> "Full model" > >> (Intercept) socst grp2 grp3 grp4 socst:grp2 socst:grp3 socst:grp4 > >> "Example 2.1: drop socst" > >> (Intercept) grp2 grp3 grp4 socst:grp1 socst:grp2 socst:grp3 socst:grp4 > >> "Example 2.2: drop grp" > >> (Intercept) socst socst:grp2 socst:grp3 socst:grp4 > >> > >> There is a little bit of R documentation about this, based on the > >> concept of marginality, which typically forbids a model having an > >> interaction but not the corresponding main effects. (You might see the > >> references in https://en.wikipedia.org/wiki/Principle_of_marginality ) > >> See "An Introduction to R", by Venables and Smith and the R Core > >> Team. At the bottom of page 52 (PDF: 57) it says: "Although the > >> details are complicated, model formulae in R will normally generate > >> the models that an expert statistician would expect, provided that > >> marginality is preserved. Fitting, for [a contrary] example, a model > >> with an interaction but not the corresponding main effects will in > >> general lead to surprising results ....". > >> The Reference Manual states that the R functions dropterm() and > >> addterm() resp. drop or add only terms such that marginality is > >> preserved. > >> > >> Finally, about your singular matrix t(mm)%*%mm. This is in fact > >> Example 2.1 in Case 2 discussed above. As discussed there, in Stata > >> and in R the drop of the continuous variable has no effect on the > >> degrees of freedom here: it is just a reparameterisation of the full > >> model, protecting you against losing marginality... Hence the > >> model.matrix 'mm' is still square and nonsingular after the drop of > >> X1, unless of course when a row is removed from the matrix 'design' > >> when before creating 'mm'. > >> > >> Arie > >> > >> On Sun, Oct 15, 2017 at 7:05 PM, Tyler <tyle...@gmail.com> wrote: > >> > You could possibly try to explain away the behavior for a missing main > >> > effects term, since without the main effects term we don't have main > >> > effect > >> > columns in the model matrix used to compute the interaction columns > (At > >> > best this is undocumented behavior--I still think it's a bug, as we > know > >> > how we would encode the categorical factors if they were in fact > >> > present. > >> > It's either specified in contrasts.arg or using the default set in > >> > options). However, when all the main effects are present, why would > the > >> > three-factor interaction column not simply be the product of the main > >> > effect columns? In my example: we know X1, we know X2, and we know X3. > >> > Why > >> > does the encoding of X1:X2:X3 depend on whether we specified a > >> > two-factor > >> > interaction, AND only changes for specific missing interactions? > >> > > >> > In addition, I can use a two-term example similar to yours to show how > >> > this > >> > behavior results in a singular covariance matrix when, given the > desired > >> > factor encoding, it should not be singular. > >> > > >> > We start with a full factorial design for a two-level continuous > factor > >> > and > >> > a three-level categorical factor, and remove a single row. This design > >> > matrix does not leave enough degrees of freedom to determine > >> > goodness-of-fit, but should allow us to obtain parameter estimates. > >> > > >> >> design = expand.grid(X1=c(1,-1),X2=c("A","B","C")) > >> >> design = design[-1,] > >> >> design > >> > X1 X2 > >> > 2 -1 A > >> > 3 1 B > >> > 4 -1 B > >> > 5 1 C > >> > 6 -1 C > >> > > >> > Here, we first calculate the model matrix for the full model, and then > >> > manually remove the X1 column from the model matrix. This gives us the > >> > model matrix one would expect if X1 were removed from the model. We > then > >> > successfully calculate the covariance matrix. > >> > > >> >> mm = model.matrix(~(X1+X2)^2,data=design) > >> >> mm > >> > (Intercept) X1 X2B X2C X1:X2B X1:X2C > >> > 2 1 -1 0 0 0 0 > >> > 3 1 1 1 0 1 0 > >> > 4 1 -1 1 0 -1 0 > >> > 5 1 1 0 1 0 1 > >> > 6 1 -1 0 1 0 -1 > >> > > >> >> mm = mm[,-2] > >> >> solve(t(mm) %*% mm) > >> > (Intercept) X2B X2C X1:X2B X1:X2C > >> > (Intercept) 1 -1.0 -1.0 0.0 0.0 > >> > X2B -1 1.5 1.0 0.0 0.0 > >> > X2C -1 1.0 1.5 0.0 0.0 > >> > X1:X2B 0 0.0 0.0 0.5 0.0 > >> > X1:X2C 0 0.0 0.0 0.0 0.5 > >> > > >> > Here, we see the actual behavior for model.matrix. The undesired > >> > re-coding > >> > of the model matrix interaction term makes the information matrix > >> > singular. > >> > > >> >> mm = model.matrix(~(X1+X2)^2-X1,data=design) > >> >> mm > >> > (Intercept) X2B X2C X1:X2A X1:X2B X1:X2C > >> > 2 1 0 0 -1 0 0 > >> > 3 1 1 0 0 1 0 > >> > 4 1 1 0 0 -1 0 > >> > 5 1 0 1 0 0 1 > >> > 6 1 0 1 0 0 -1 > >> > > >> >> solve(t(mm) %*% mm) > >> > Error in solve.default(t(mm) %*% mm) : system is computationally > >> > singular: > >> > reciprocal condition number = 5.55112e-18 > >> > > >> > I still believe this is a bug. > >> > > >> > Best regards, > >> > Tyler Morgan-Wall > >> > > >> > On Sun, Oct 15, 2017 at 1:49 AM, Arie ten Cate <arietenc...@gmail.com > > > >> > wrote: > >> > > >> >> I think it is not a bug. It is a general property of interactions. > >> >> This property is best observed if all variables are factors > >> >> (qualitative). > >> >> > >> >> For example, you have three variables (factors). You ask for as many > >> >> interactions as possible, except an interaction term between two > >> >> particular variables. When this interaction is not a constant, it is > >> >> different for different values of the remaining variable. More > >> >> precisely: for all values of that variable. In other words: you have > a > >> >> three-way interaction, with all values of that variable. > >> >> > >> >> An even smaller example is the following script with only two > >> >> variables, each being a factor: > >> >> > >> >> df <- expand.grid(X1=c("p","q"), X2=c("A","B","C")) > >> >> print(model.matrix(~(X1+X2)^2 ,data=df)) > >> >> print(model.matrix(~(X1+X2)^2 -X1,data=df)) > >> >> print(model.matrix(~(X1+X2)^2 -X2,data=df)) > >> >> > >> >> The result is: > >> >> > >> >> (Intercept) X1q X2B X2C X1q:X2B X1q:X2C > >> >> 1 1 0 0 0 0 0 > >> >> 2 1 1 0 0 0 0 > >> >> 3 1 0 1 0 0 0 > >> >> 4 1 1 1 0 1 0 > >> >> 5 1 0 0 1 0 0 > >> >> 6 1 1 0 1 0 1 > >> >> > >> >> (Intercept) X2B X2C X1q:X2A X1q:X2B X1q:X2C > >> >> 1 1 0 0 0 0 0 > >> >> 2 1 0 0 1 0 0 > >> >> 3 1 1 0 0 0 0 > >> >> 4 1 1 0 0 1 0 > >> >> 5 1 0 1 0 0 0 > >> >> 6 1 0 1 0 0 1 > >> >> > >> >> (Intercept) X1q X1p:X2B X1q:X2B X1p:X2C X1q:X2C > >> >> 1 1 0 0 0 0 0 > >> >> 2 1 1 0 0 0 0 > >> >> 3 1 0 1 0 0 0 > >> >> 4 1 1 0 1 0 0 > >> >> 5 1 0 0 0 1 0 > >> >> 6 1 1 0 0 0 1 > >> >> > >> >> Thus, in the second result, we have no main effect of X1. Instead, > the > >> >> effect of X1 depends on the value of X2; either A or B or C. In fact, > >> >> this is a two-way interaction, including all three values of X2. In > >> >> the third result, we have no main effect of X2, The effect of X2 > >> >> depends on the value of X1; either p or q. > >> >> > >> >> A complicating element with your example seems to be that your X1 and > >> >> X2 are not factors. > >> >> > >> >> Arie > >> >> > >> >> On Thu, Oct 12, 2017 at 7:12 PM, Tyler <tyle...@gmail.com> wrote: > >> >> > Hi, > >> >> > > >> >> > I recently ran into an inconsistency in the way > model.matrix.default > >> >> > handles factor encoding for higher level interactions with > >> >> > categorical > >> >> > variables when the full hierarchy of effects is not present. > >> >> > Depending on > >> >> > which lower level interactions are specified, the factor encoding > >> >> > changes > >> >> > for a higher level interaction. Consider the following minimal > >> >> reproducible > >> >> > example: > >> >> > > >> >> > -------------- > >> >> > > >> >> >> runmatrix = expand.grid(X1=c(1,-1),X2=c(1,-1),X3=c("A","B","C"))> > >> >> model.matrix(~(X1+X2+X3)^3,data=runmatrix) (Intercept) X1 X2 X3B > X3C > >> >> X1:X2 X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 1 0 0 0 0 > >> >> > 0 0 > >> >> > 2 1 -1 1 0 0 -1 0 0 0 0 > >> >> > 0 0 > >> >> > 3 1 1 -1 0 0 -1 0 0 0 0 > >> >> > 0 0 > >> >> > 4 1 -1 -1 0 0 1 0 0 0 0 > >> >> > 0 0 > >> >> > 5 1 1 1 1 0 1 1 0 1 0 > >> >> > 1 0 > >> >> > 6 1 -1 1 1 0 -1 -1 0 1 0 > >> >> > -1 0 > >> >> > 7 1 1 -1 1 0 -1 1 0 -1 0 > >> >> > -1 0 > >> >> > 8 1 -1 -1 1 0 1 -1 0 -1 0 > >> >> > 1 0 > >> >> > 9 1 1 1 0 1 1 0 1 0 1 > >> >> > 0 1 > >> >> > 10 1 -1 1 0 1 -1 0 -1 0 1 > >> >> > 0 -1 > >> >> > 11 1 1 -1 0 1 -1 0 1 0 -1 > >> >> > 0 -1 > >> >> > 12 1 -1 -1 0 1 1 0 -1 0 -1 > >> >> > 0 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 5 5 6 6 7 7 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > -------------- > >> >> > > >> >> > Specifying the full hierarchy gives us what we expect: the > >> >> > interaction > >> >> > columns are simply calculated from the product of the main effect > >> >> columns. > >> >> > The interaction term X1:X2:X3 gives us two columns in the model > >> >> > matrix, > >> >> > X1:X2:X3B and X1:X2:X3C, matching the products of the main effects. > >> >> > > >> >> > If we remove either the X2:X3 interaction or the X1:X3 interaction, > >> >> > we > >> >> get > >> >> > what we would expect for the X1:X2:X3 interaction, but when we > remove > >> >> > the > >> >> > X1:X2 interaction the encoding for X1:X2:X3 changes completely: > >> >> > > >> >> > -------------- > >> >> > > >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X3,data=runmatrix) (Intercept) > X1 X2 > >> >> X3B X3C X1:X2 X2:X3B X2:X3C X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 1 0 0 0 > 0 > >> >> > 2 1 -1 1 0 0 -1 0 0 0 > 0 > >> >> > 3 1 1 -1 0 0 -1 0 0 0 > 0 > >> >> > 4 1 -1 -1 0 0 1 0 0 0 > 0 > >> >> > 5 1 1 1 1 0 1 1 0 1 > 0 > >> >> > 6 1 -1 1 1 0 -1 1 0 -1 > 0 > >> >> > 7 1 1 -1 1 0 -1 -1 0 -1 > 0 > >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > 0 > >> >> > 9 1 1 1 0 1 1 0 1 0 > 1 > >> >> > 10 1 -1 1 0 1 -1 0 1 0 > -1 > >> >> > 11 1 1 -1 0 1 -1 0 -1 0 > -1 > >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > > >> >> > > >> >> >> model.matrix(~(X1+X2+X3)^3-X2:X3,data=runmatrix) (Intercept) > X1 X2 > >> >> X3B X3C X1:X2 X1:X3B X1:X3C X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 1 0 0 0 > 0 > >> >> > 2 1 -1 1 0 0 -1 0 0 0 > 0 > >> >> > 3 1 1 -1 0 0 -1 0 0 0 > 0 > >> >> > 4 1 -1 -1 0 0 1 0 0 0 > 0 > >> >> > 5 1 1 1 1 0 1 1 0 1 > 0 > >> >> > 6 1 -1 1 1 0 -1 -1 0 -1 > 0 > >> >> > 7 1 1 -1 1 0 -1 1 0 -1 > 0 > >> >> > 8 1 -1 -1 1 0 1 -1 0 1 > 0 > >> >> > 9 1 1 1 0 1 1 0 1 0 > 1 > >> >> > 10 1 -1 1 0 1 -1 0 -1 0 > -1 > >> >> > 11 1 1 -1 0 1 -1 0 1 0 > -1 > >> >> > 12 1 -1 -1 0 1 1 0 -1 0 > 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 5 5 6 6 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > > >> >> >> model.matrix(~(X1+X2+X3)^3-X1:X2,data=runmatrix) (Intercept) > X1 X2 > >> >> X3B X3C X1:X3B X1:X3C X2:X3B X2:X3C X1:X2:X3A X1:X2:X3B X1:X2:X3C > >> >> > 1 1 1 1 0 0 0 0 0 0 1 > >> >> > 0 0 > >> >> > 2 1 -1 1 0 0 0 0 0 0 -1 > >> >> > 0 0 > >> >> > 3 1 1 -1 0 0 0 0 0 0 -1 > >> >> > 0 0 > >> >> > 4 1 -1 -1 0 0 0 0 0 0 1 > >> >> > 0 0 > >> >> > 5 1 1 1 1 0 1 0 1 0 0 > >> >> > 1 0 > >> >> > 6 1 -1 1 1 0 -1 0 1 0 0 > >> >> > -1 0 > >> >> > 7 1 1 -1 1 0 1 0 -1 0 0 > >> >> > -1 0 > >> >> > 8 1 -1 -1 1 0 -1 0 -1 0 0 > >> >> > 1 0 > >> >> > 9 1 1 1 0 1 0 1 0 1 0 > >> >> > 0 1 > >> >> > 10 1 -1 1 0 1 0 -1 0 1 0 > >> >> > 0 -1 > >> >> > 11 1 1 -1 0 1 0 1 0 -1 0 > >> >> > 0 -1 > >> >> > 12 1 -1 -1 0 1 0 -1 0 -1 0 > >> >> > 0 1 > >> >> > attr(,"assign") > >> >> > [1] 0 1 2 3 3 4 4 5 5 6 6 6 > >> >> > attr(,"contrasts") > >> >> > attr(,"contrasts")$X3 > >> >> > [1] "contr.treatment" > >> >> > > >> >> > -------------- > >> >> > > >> >> > Here, we now see the encoding for the interaction X1:X2:X3 is now > the > >> >> > interaction of X1 and X2 with a new encoding for X3 where each > factor > >> >> level > >> >> > is represented by its own column. I would expect, given the two > >> >> > column > >> >> > dummy variable encoding for X3, that the X1:X2:X3 column would also > >> >> > be > >> >> two > >> >> > columns regardless of what two-factor interactions we also > specified, > >> >> > but > >> >> > in this case it switches to three. If other two factor interactions > >> >> > are > >> >> > missing in addition to X1:X2, this issue still occurs. This also > >> >> > happens > >> >> > regardless of the contrast specified in contrasts.arg for X3. I > don't > >> >> > see > >> >> > any reasoning for this behavior given in the documentation, so I > >> >> > suspect > >> >> it > >> >> > is a bug. > >> >> > > >> >> > Best regards, > >> >> > Tyler Morgan-Wall > >> >> > > >> >> > [[alternative HTML version deleted]] > >> >> > > >> >> > ______________________________________________ > >> >> > R-devel@r-project.org mailing list > >> >> > https://stat.ethz.ch/mailman/listinfo/r-devel > >> >> > >> > > >> > [[alternative HTML version deleted]] > >> > > >> > ______________________________________________ > >> > R-devel@r-project.org mailing list > >> > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > > [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel