Re: Prediction Model Question

Frank E Harrell Jr Wed, 29 Dec 1999 07:48:49 -0800
Well put, Donald.  The only additional points I wish to make are that in my
career I've never seen balanced factorial data with normal errors.  Effects
are orthogonal only when the study was done in a balanced way (i.e.,
experimental study, no missing data, etc.) AND the model is a regression
model with normal errors [even with perfect balance, nonlinear models such
as logistic models do not yield orthogonal estimates].  Even then, I'm not
often interested in "main effect" tests, which are averages of stratified
estimates (stratified by the other factor).

Even when I want to average the stratified effects, I do it by getting
differences in predicted values, so coding is of no concern to me.  In
S-Plus I have a contrast function that uses this method, e.g.

  contrast(fit.result,
           list(age=65, sex=c('male','female')),
           list(age=21, sex=c('male','female')),
           type='average', weights=table(sex))

This does a Type II contrast where weights are the marginal frequencies of
male and female.  If I want a Type III contrast (seldom sensible) I would
use weights='equal'.  The contrast is for age 65 vs. age 21, no matter how
nonlinear the age effect is in the model.
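
As a rough illustration of the idea (in Python rather than S-Plus, and not
the contrast function itself), the sketch below averages sex-specific
differences in predicted values.  The predict() function, its coefficients,
and the 30/70 sample are all made up for the example; only the weighting
scheme follows the description above.

```python
def predict(age, sex):
    """Hypothetical fitted model: nonlinear in age, with an age-by-sex interaction."""
    female = 1.0 if sex == "female" else 0.0
    return 2.0 + 0.03 * age + 0.0004 * age ** 2 + 0.5 * female - 0.01 * age * female


def average_contrast(predict, age_a, age_b, weights):
    """Weighted average over sex of predict(age_a, sex) - predict(age_b, sex)."""
    total = sum(weights.values())
    diffs = {s: predict(age_a, s) - predict(age_b, s) for s in weights}
    return sum(w * diffs[s] for s, w in weights.items()) / total


# Made-up sample of 30 males and 70 females; real weights come from the data.
sex = ["male"] * 30 + ["female"] * 70
marginal = {s: sex.count(s) for s in ("male", "female")}  # like table(sex)
equal = {"male": 1, "female": 1}                          # like weights='equal'

type2 = average_contrast(predict, 65, 21, marginal)  # marginal-frequency weights
type3 = average_contrast(predict, 65, 21, equal)     # equal weights
print(type2, type3)
```

Because only predicted values enter the calculation, recoding sex or
re-expressing age inside predict() would leave both contrasts unchanged as
long as the fitted values are the same.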

-Frank

"Donald F. Burrill" wrote:

> In response to a comment of mine:
>
> > Incidentally, I'd strongly recommend constructing interaction variables
> > that are orthogonal at least to their own main effects (and lower-order
> > interactions, when there are any), and possibly orthogonal to some or all
> > of the apparently irrelevant other predictors.  Else correlations between
> > the interaction variables and other variables can, sometimes, be horribly
> > confusing;  especially with the "quantitative" (non-categorical)
> > variables, whose products with other such variables are likely to be
> > strongly (positively) correlated with the original variables merely
> > because the original variables tend to be always positive and sometimes
> > far from zero -- thus inducing what I've elsewhere called "spurious
> > multicollinearity".
>
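A small numeric sketch of the "spurious multicollinearity" described in the
quoted paragraph, in Python with made-up data: x and z are uncorrelated by
construction (a full grid), both are positive and far from zero, and their
raw product is still strongly correlated with x.  Centering before
multiplying is used here as one simple remedy; it is an assumption of this
example, not necessarily the construction intended in the quote.

```python
from itertools import product
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / sqrt(va * vb)


# Full 5x5 grid: x in 10..14 and z in 20..24, so x and z are uncorrelated.
pairs = list(product(range(10, 15), range(20, 25)))
x = [float(p[0]) for p in pairs]
z = [float(p[1]) for p in pairs]

raw_product = [a * b for a, b in zip(x, z)]

mx, mz = sum(x) / len(x), sum(z) / len(z)
centered_product = [(a - mx) * (b - mz) for a, b in zip(x, z)]

raw_corr = pearson(x, raw_product)            # large, purely from the offsets
centered_corr = pearson(x, centered_product)  # zero on this balanced grid
print(raw_corr, centered_corr)
```
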
> Frank E Harrell Jr wrote:
>
> > This I do not understand.  I don't see the point in testing main
> > effects in the presence of interaction effects (unlike the pooled main
> > effect + interaction effect tests which are completely invariant to
> > coding).  So I don't see why coding matters.  -Frank Harrell
>
> Sorry if I have confused two issues.  The remark quoted is not related to
> the coding of variables;  it applies generally.  As to "testing main
> effects in the presence of interactions", in a factorial analysis of
> variance one tests main effects and all possible interactions in the
> presence of each other;  and it is standard advice not to attempt to
> interpret main effects (or for that matter lower-order interactions) in
> the presence of significant interaction(s), at least until one has made
> some sense out of the interaction(s) (or, better, out of the pattern of
> main effects & interactions).
>         But in a balanced factorial ANOVA things are unambiguous in two
> ways:  (1) the apparent significance of individual sources of variation
> does not depend on the order of their entry into the model;  (2) the
> significance of any particular source does not depend on the presence or
> absence of other sources.  Both of these are due to the orthogonality
> inherent in a balanced design.  When the predictors are correlated, as is
> usual in regression and in unbalanced ANOVAs, neither of these is true.
> Constructing interactions to be orthogonal to their main effects and to
> lower-order interactions, as recommended above, means at least that one's
> ability to detect main effects is not bollixed up by including the
> interaction terms in the analysis.  It also means that if any interaction
> term is significant, one can believe that one is indeed looking at an
> interaction effect, and not at an artifact arising from inadvertent
> correlation between the interaction variable and its main effects.
>         I take it that one first looks for the patterns of main effects
> and interactions that must be taken into account in the eventual
> restricted model;  then one attempts to interpret the model.  At this
> point coding matters, because the meaning one can attribute to any
> particular coefficient will depend on the coding of the variable.  It
> follows that one may choose to revise the coding, to facilitate or
> simplify the interpretation.
>
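One way to build an interaction variable that is orthogonal to its own main
effects when the predictors are correlated is to keep only the part of the
product left over after projecting out the intercept and the main effects.
The Python sketch below does this with Gram-Schmidt on made-up data; the
helper names are hypothetical, and this is only one possible reading of the
recommendation quoted above.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def residualize(v, basis):
    """Return v minus its projection onto the span of `basis`.

    `basis` must be linearly independent; it is orthogonalized first
    (Gram-Schmidt) so that the projections can be removed one at a time.
    """
    ortho = []
    for b in basis:
        for q in ortho:
            b = [bi - dot(b, q) / dot(q, q) * qi for bi, qi in zip(b, q)]
        ortho.append(b)
    for q in ortho:
        v = [vi - dot(v, q) / dot(q, q) * qi for vi, qi in zip(v, q)]
    return v


# Made-up, unbalanced data in which x and z are correlated.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
z = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
ones = [1.0] * len(x)

raw_interaction = [a * b for a, b in zip(x, z)]
orth_interaction = residualize(raw_interaction, [ones, x, z])

# The residualized interaction has (numerically) zero inner product with
# the intercept and with each main effect.
print([round(dot(orth_interaction, col), 8) for col in (ones, x, z)])
```
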
>         There is one other sense in which "coding matters", although this
> may be a bit off-topic from the original thread.  Consider an experiment
> in which the subjects are of two sexes, and the experimental treatments
> are mediated by experimenters, who also are of two sexes.  Whatever else
> is going on, there is a 2x2 subdesign representing (sex of Subject) by
> (sex of Experimenter).  One may code both variables, for example, so that
> 0 = male and 1 = female.  Then if the data show a difference between
> cases where subject and experimenter are of the same sex and cases where
> they are of opposite sexes, that's an interaction effect.  But if one had
> coded (0 = male and 1 = female) for Subjects, and (0 = same sex as
> Subject and 1 = opposite sex from Subject) for Experimenters, then the
> effect just described is a main effect of the Experimenter sex variable.
>
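The 2x2 example can be checked with a few lines of Python.  The cell means
below are invented (10 for same-sex pairs, 14 for opposite-sex pairs), and
effects() is a hypothetical helper that computes the main effect of the
second factor and the interaction directly from the four cell means.

```python
def effects(cell):
    """cell[(a, b)] holds a cell mean; returns (main effect of b, a-by-b interaction)."""
    d0 = cell[(0, 1)] - cell[(0, 0)]  # effect of b when a = 0
    d1 = cell[(1, 1)] - cell[(1, 0)]  # effect of b when a = 1
    return (d0 + d1) / 2, d1 - d0


# Invented cell means keyed by (subject sex, experimenter sex).
means_by_sex = {
    ("m", "m"): 10.0, ("m", "f"): 14.0,
    ("f", "m"): 14.0, ("f", "f"): 10.0,
}
code = {"m": 0, "f": 1}

# Coding A: both factors coded 0 = male, 1 = female.
coding_a = {(code[s], code[e]): y for (s, e), y in means_by_sex.items()}
# Coding B: experimenter coded 0 = same sex as subject, 1 = opposite sex.
coding_b = {(code[s], int(s != e)): y for (s, e), y in means_by_sex.items()}

main_a, inter_a = effects(coding_a)
main_b, inter_b = effects(coding_b)
print(main_a, inter_a)  # no main effect, all interaction
print(main_b, inter_b)  # all main effect, no interaction
```

The same four cell means give a pure interaction under the first coding and
a pure main effect under the second.
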
>  ------------------------------------------------------------------------
>  Donald F. Burrill                                 [EMAIL PROTECTED]
>  348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
>  MSC #29, Plymouth, NH 03264                                 603-535-2597
>  184 Nashua Road, Bedford, NH 03110                          603-471-7128

--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat