Re: rescaling, centering ... and why is everything so hard with my

Donald Burrill Tue, 18 Feb 2003 13:01:48 -0800

My paper "On modelling and interpreting interactions in multiple
regression" is available as a White Paper on the Minitab web site.
(Sorry, I don't recall the detailed URL;  go to minitab.com and look
around a bit.)  The example there displayed has four predictors, three
of them binary and one quasi-continuous, and interactions up to the
4-way interaction.  As you will see, once the model was reduced to those
effects that were, let's say, detectable, it was convenient to express
the binary variables with (1,0) coding (rather than, say, (1, -1)
coding or (1,2) coding) and in that form the final model contained only
two or three two-way interactions but did *not* contain one of the main
effects;  notwithstanding which the model was perfectly well
interpretable, since each predictor (except the continuous one)
represented "people like this, or not".  (It was unnecessary to have
male/female as a predictor, because the entire effect of different sexes
could be represented as "females who ran".)


A basic idea there illustrated is that when there are high order
interactions to be modelled, in one's preliminary inquiries it is often
useful to model the interactions as "pure interaction" terms.  The
frequent practice of simply multiplying together the original variables
(to obtain products that carry the interaction information) in general
results in interaction variables that are correlated, often VERY
strongly, with lower order interactions and "main effects".  If you're
only modelling as far as second-order interactions (quadratic effects),
centering of a sort can reduce and sometimes eliminate these
correlations, but no centering or rescaling can possibly accomplish this for
models including 3-way and higher interactions.
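
To make the point about product terms concrete, here is a minimal sketch
in Python (standard library only).  The data are simulated, and every
name in it (corr, centered, x1, x2, x3) is an illustrative assumption,
not something taken from the paper.  It assumes two independent
predictors whose ranges exclude zero, plus a third predictor that is
deliberately correlated with the second, and compares a raw two-way
product, a centered two-way product, and a centered three-way product.

```python
import random

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def centered(u):
    """Subtract the sample mean from every value."""
    m = sum(u) / len(u)
    return [a - m for a in u]

random.seed(0)
n = 2000
x1 = [random.uniform(10, 20) for _ in range(n)]
x2 = [random.uniform(10, 20) for _ in range(n)]
# Assumption for illustration: x3 is strongly correlated with x2.
x3 = [b + random.uniform(-1, 1) for b in x2]

c1, c2, c3 = centered(x1), centered(x2), centered(x3)

raw_12 = [a * b for a, b in zip(x1, x2)]                # raw two-way product
cen_12 = [a * b for a, b in zip(c1, c2)]                # centered two-way product
cen_123 = [a * b * c for a, b, c in zip(c1, c2, c3)]    # centered three-way product

r_raw_12 = corr(x1, raw_12)     # raw product vs. its main effect
r_cen_12 = corr(c1, cen_12)     # after centering, two-way case
r_cen_123 = corr(c1, cen_123)   # after centering, three-way case

print("raw x1*x2 vs x1:          ", round(r_raw_12, 3))
print("centered x1*x2 vs x1:     ", round(r_cen_12, 3))
print("centered x1*x2*x3 vs x1:  ", round(r_cen_123, 3))
```

Under these assumptions the raw product is strongly correlated with x1,
centering brings the two-way correlation close to zero, and the centered
three-way product remains clearly correlated with x1 because x2 and x3
are themselves correlated.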

To obtain "pure interaction" terms, orthogonalize each interaction
variable with respect to its main effects and all lower-order
interactions.  (This is a version of Gram-Schmidt orthogonalizing, which
is discussed in Draper & Smith, "Applied Regression Analysis" (Wiley
1966 originally, but I have somewhere a 2nd edition of about 1980 and I
think there may be one or more later editions), inter alia.)
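
The orthogonalizing step can be sketched as follows, again in Python
with the standard library only.  This is one possible reading, not the
procedure from the paper: it builds an orthogonal basis (Gram-Schmidt)
for the constant, the main effects and the lower-order products, then
keeps what is left of the interaction after projecting onto that basis.
Including the constant column is an assumption, made so that
"orthogonal" also means "uncorrelated".  All names (residualize,
pure_interaction, x1, x2, x3) and the simulated data are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(target, basis):
    """Subtract from target its projection on each vector of an orthogonal basis."""
    r = list(target)
    for q in basis:
        coef = dot(r, q) / dot(q, q)
        r = [ri - coef * qi for ri, qi in zip(r, q)]
    return r

def pure_interaction(product, lower_terms):
    """Residual of product after removing the constant and all lower-order terms."""
    basis = []
    for col in [[1.0] * len(product)] + lower_terms:
        q = residualize(col, basis)      # Gram-Schmidt step
        if dot(q, q) > 1e-12:            # skip numerically redundant columns
            basis.append(q)
    return residualize(product, basis)

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

random.seed(1)
n = 200
x1 = [random.uniform(1, 5) for _ in range(n)]
x2 = [random.uniform(1, 5) for _ in range(n)]
x3 = [random.uniform(1, 5) for _ in range(n)]

# Raw products, formed by simple multiplication.
x12 = [a * b for a, b in zip(x1, x2)]
x13 = [a * c for a, c in zip(x1, x3)]
x23 = [b * c for b, c in zip(x2, x3)]
x123 = [a * b * c for a, b, c in zip(x1, x2, x3)]

# "Pure" interactions: orthogonal to main effects and lower-order terms.
p12 = pure_interaction(x12, [x1, x2])
p123 = pure_interaction(x123, [x1, x2, x3, x12, x13, x23])

print("raw  x1*x2*x3 vs x1:   ", round(corr(x123, x1), 3))
print("pure x1*x2*x3 vs x1:   ", round(corr(p123, x1), 6))
print("pure x1*x2*x3 vs x1*x2:", round(corr(p123, x12), 6))
```

The main effects are left in their original form here; only the
interaction columns are replaced by their residuals.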

If you analyze a "full model" containing all interactions of interest,
where the interactions have been orthogonalized, you can be sure at
least that the significance (or lack thereof!) of any interaction is NOT
an artifact of its more-or-less-accidental correlation with lower order
terms.  Having identified what formal interactions exist, one can then
wrestle with questions like how most effectively to represent the
original variables (if that should be a matter of choice) and their
interactions (which needn't conform to the way the raw variables are
expressed, but ought at least to be in a form that makes some sense to
the reader).

I hope this will have been helpful.

On Mon, 17 Feb 2003, Mountain Bikn' Guy wrote (in part), seeking

> information & opinions on rescaling / centering predictor variables.
>
> Part 1 of my question:
> ... Can anyone recommend a discussion on standardizing, normalizing,
> rescaling, centering, etc. in data mining type problems?  (including
> regression.)
>
> Part 2:
> I expect my regression models to have high order interaction terms
> that are significant when the main effects are not significant. If
> this were *not* the case, I would center my predictor variables at
> zero. With this "unorthodox" model form, my intuition tells me I
> will get better results if I use a range that does not include zero.
> With high order interactions and zero-centered predictors, if one
> variable value is equal to the mean, the whole interaction term
> would be zero. This doesn't model the problem correctly. Any
> thoughts?
>
> FYI, the regression models with high order interaction terms that
> are significant when the main effects are not significant are
> expected to be an intermediate model, not a final deployed model.
> The final model is expected to be more traditional.
>
> David

 -----------------------------------------------------------------------
 Donald F. Burrill                                            [EMAIL PROTECTED]
 56 Sebbins Pond Drive, Bedford, NH 03110                 (603) 626-0816

