My paper "On modelling and interpreting interactions in multiple regression" is available as a White Paper on the Minitab web site. (Sorry, I don't recall the detailed URL; go to minitab.com and look around a bit.) The example displayed there has four predictors, three of them binary and one quasi-continuous, and interactions up to the 4-way interaction. As you will see, once the model was reduced to those effects that were, let's say, detectable, it was convenient to express the binary variables with (1,0) coding (rather than, say, (1,-1) coding or (1,2) coding). In that form the final model contained only two or three two-way interactions but did *not* contain one of the main effects; the model was nonetheless perfectly interpretable, since each predictor (except the continuous one) represented "people like this, or not". (It was unnecessary to have male/female as a predictor, because the entire effect of sex could be represented as "females who ran".)
A basic idea illustrated there is that when there are high-order interactions to be modelled, it is often useful in one's preliminary inquiries to model the interactions as "pure interaction" terms. The frequent practice of simply multiplying together the original variables (to obtain products that carry the interaction information) in general results in interaction variables that are correlated, often VERY strongly, with lower-order interactions and "main effects". If you're only modelling as far as second-order interactions (quadratic effects), centering of a sort can reduce and sometimes eliminate these correlations, but no centering or rescaling can possibly achieve this for models including 3-way and higher interactions. To obtain "pure interaction" terms, orthogonalize each interaction variable with respect to its main effects and all lower-order interactions. (This is a version of Gram-Schmidt orthogonalizing, which is discussed in Draper & Smith, "Applied Regression Analysis", inter alia: Wiley 1966 originally, but I have somewhere a 2nd edition of about 1980 and I think there may be one or more later editions.)

If you analyze a "full model" containing all interactions of interest, where the interactions have been orthogonalized, you can be sure at least that the significance (or lack thereof!) of any interaction is NOT an artifact of its more-or-less-accidental correlation with lower-order terms. Having identified what formal interactions exist, one can then wrestle with questions like how most effectively to represent the original variables (if that should be a matter of choice) and their interactions (which needn't conform to the way the raw variables are expressed, but ought at least to be in a form that makes some sense to the reader).

I hope this will have been helpful.

On Mon, 17 Feb 2003, Mountain Bikn' Guy wrote (in part), seeking
> information & opinions on rescaling / centering predictor variables.
>
> Part 1 of my question:
> ...
> Can anyone recommend a discussion on standardizing, normalizing,
> rescaling, centering, etc. in data mining type problems? (including
> regression.)
>
> Part 2:
> I expect my regression models to have high order interaction terms
> that are significant when the main effects are not significant. If
> this were *not* the case, I would center my predictor variables at
> zero. With this "unorthodox" model form, my intuition tells me I
> will get better results if I use a range that does not include zero.
> With high order interactions and zero-centered predictors, if one
> variable value is equal to the mean, the whole interaction term
> would be zero. This doesn't model the problem correctly. Any
> thoughts?
>
> FYI, the regression models with high order interaction terms that
> are significant when the main effects are not significant are
> expected to be an intermediate model, not a final deployed model.
> The final model is expected to be more traditional.
>
> David

-----------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
56 Sebbins Pond Drive, Bedford, NH 03110 (603) 626-0816
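
A minimal sketch of the orthogonalizing step described above, in Python with NumPy. Neither the language nor the code comes from the paper or from Minitab; the function name pure_interactions and the simulated data are hypothetical, chosen only for illustration. Each product term is replaced by its least-squares residual on the constant and on every term of lower order, which should give the same column as a Gram-Schmidt step up to rounding. Taking "lower order" to mean all lower-order terms in the model, rather than only those built from the same variables, is an assumption.

```python
import itertools
import numpy as np


def pure_interactions(X):
    """Build raw product terms and orthogonalize each against lower-order terms.

    X is an (n, p) array of predictors. Returns a dict keyed by tuples of
    column indices: () is the constant, (j,) a main effect (left unchanged),
    and longer tuples are "pure interaction" columns.
    """
    n, p = X.shape
    terms = {(): np.ones(n)}
    for j in range(p):
        terms[(j,)] = X[:, j]
    for order in range(2, p + 1):
        for combo in itertools.combinations(range(p), order):
            raw = np.prod(X[:, list(combo)], axis=1)
            # Assumption: project off ALL terms of strictly lower order.
            basis = np.column_stack(
                [col for key, col in terms.items() if len(key) < order]
            )
            coef, *_ = np.linalg.lstsq(basis, raw, rcond=None)
            terms[combo] = raw - basis @ coef  # residual = pure interaction
    return terms


# Simulated predictors whose range does not include zero (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=1.0, size=(200, 3))
terms = pure_interactions(X)

raw_3way = X[:, 0] * X[:, 1] * X[:, 2]
print("raw 3-way product vs x1: ", np.corrcoef(raw_3way, X[:, 0])[0, 1])
print("pure 3-way term vs x1:   ", np.corrcoef(terms[(0, 1, 2)], X[:, 0])[0, 1])
```

The columns in the returned dict can be passed to any regression routine in place of the raw products. The fitted values of the full model are unchanged, since the orthogonalized columns span the same space as the raw ones; what changes is that each interaction's coefficient and test no longer reflect overlap with lower-order terms.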
