For openers, I quote from Pedhazur (2nd edition), p 329 (summary for 
Chapter 9), so that we're all on the same wavelength, more or less:
        "... Regardless of the coding method used, the results of the 
        overall analysis are the same. ..."
   (This is the point that other respondents and I had in mind when we 
   were questioning your interpretation of Pedhazur.) 
   Continuing a few sentences later:
        "... The coding methods do differ in the properties of their
        regression equations.  A brief summary ... follows. ..."
   After the summaries of each method, the final paragraph:
        "Which method of coding one chooses depends on one's purpose and 
        interest.  [For one purpose], dummy coding is the preferred 
        method.  Orthogonal coding is most efficient [for another 
        purpose].  It was shown, however, that the different types of 
        multiple comparisons ... can be easily performed [with] effect 
        coding.  Consequently, effect coding is generally the preferred 
        method of coding categorical variables."
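        (Pedhazur's first point -- that the overall analysis is invariant
to the coding method -- is easy to verify numerically.  The sketch below,
in Python with numpy and made-up data, fits the same one-way design under
dummy (indicator) coding and under effect coding; the overall R-squared
comes out identical either way, because the two design matrices span the
same column space.)

```python
import numpy as np

# Made-up data: three groups of three observations each.
y = np.array([4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1.0, 2.0, 3.0])
group = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

def r_squared(X, y):
    """R^2 from an OLS fit of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

n = len(y)
ones = np.ones(n)

# Dummy (indicator) coding: group 2 is the reference group (all zeros).
d1 = (group == 0).astype(float)
d2 = (group == 1).astype(float)
X_dummy = np.column_stack([ones, d1, d2])

# Effect coding: group 2 gets -1 on both coded variables.
e1 = np.where(group == 0, 1.0, np.where(group == 2, -1.0, 0.0))
e2 = np.where(group == 1, 1.0, np.where(group == 2, -1.0, 0.0))
X_effect = np.column_stack([ones, e1, e2])

print(r_squared(X_dummy, y))   # identical ...
print(r_squared(X_effect, y))  # ... to this
```

The individual regression coefficients, of course, differ between the two
fits -- that is Pedhazur's second point about the "properties of their
regression equations".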

Burke Johnson had written:

<< 1.  I agree with Joe that the term "dummy" in dummy coding is a rather 
dumb term to use for indicator variables.  The term is widely used in
political science, sociology, and business/econometrics (e.g., see
Mendenhall and McClave's A second course in business statistics:
regression analysis).  I'll start using the term indicator coding if 
that's okay. >>
        Yes, that's what I usually use (and Joe will be pleased!).

<< 2.  Okay, I'll check out some interactions (probably two way based on
substantive concerns).  We will have about 15 predictor variables; hence,
I don't think we will include all possible interaction terms!  ... >>

Right.  And how many you can afford will depend heavily on how many 
categories there are in each of your categorical variables.  There might 
be some utility in seeing whether those variables with more than two 
categories, when the categories are ordered, can be approximated well 
enough by a linear component [(-1,0,1), (-3,-1,1,3), (-2,-1,0,1,2), ...].  
Interactions would then cost fewer degrees of freedom.  Still, where you 
can, I'd recommend considering some 3-way and perhaps even some 4-way 
interactions (and, as you say, ordinarily based on substantive ideas).  
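        (To make the degrees-of-freedom saving concrete: a small sketch,
in Python with numpy and hypothetical data, comparing the full categorical
coding of a 4-level ordered factor (three columns) against a single
linear-contrast column with the equally spaced scores (-3,-1,1,3).  An
interaction with a two-level factor then costs one column instead of
three.)

```python
import numpy as np

levels = np.array([0, 1, 2, 3, 0, 1, 2, 3])  # a 4-level ordered factor

# Full categorical coding: 3 indicator columns (4 levels - 1).
full = np.column_stack([(levels == k).astype(float) for k in (1, 2, 3)])

# Linear-contrast coding: one column of equally spaced scores.
linear = np.array([-3.0, -1.0, 1.0, 3.0])[levels].reshape(-1, 1)

# Interaction with a two-level factor z: 3 columns vs. 1 column.
z = np.array([1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])
inter_full = full * z[:, None]      # 3 interaction columns (3 d.f.)
inter_linear = linear * z[:, None]  # 1 interaction column  (1 d.f.)
print(inter_full.shape[1], inter_linear.shape[1])  # 3 1
```

The price, of course, is the assumption that the factor's effect is well
enough approximated by its linear component.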
        (Some years ago David Andrews, Paul Corey, and I prepared an ASA
workshop on data analysis for the Toronto chapter, using the byssinosis
data in Andrews & Herzberg (or Herzberg & Andrews, I forget which).  Dave
and Paul used logistic and loglinear analysis, but modelled only 2-way
interactions;  I prepared visual displays for "eyeballing" (modelling the
data as an incidence per mille in each 5-dimensional cell -- after
converting the numbers of cases to a rate, there remained 5 categorical
variables, two of 3 levels each and three of 2 levels), and in the
displays there emerged some interesting-looking 3- and 4-way interactions. 
Unfortunately, none of us had had time to consult with each other before 
the workshop, so none of the provocative interactions got formally 
analyzed;  to this day, I don't know how many of them would have been 
found formally significant.)

Incidentally, I'd strongly recommend constructing interaction variables 
that are orthogonal at least to their own main effects (and lower-order 
interactions, when there are any), and possibly orthogonal to some or all 
of the apparently irrelevant other predictors.  Else correlations between 
the interaction variables and other variables can, sometimes, be horribly 
confusing;  especially with the "quantitative" (non-categorical) 
variables, whose products with other such variables are likely to be 
strongly (positively) correlated with the original variables merely 
because the original variables tend to be always positive and sometimes 
far from zero -- thus inducing what I've elsewhere called "spurious 
multicollinearity".
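        (A quick numerical illustration of that "spurious
multicollinearity", in Python with numpy and simulated data: the raw
product of two independent, always-positive predictors is strongly
correlated with each of them, but replacing the product by its residuals
from a regression on the intercept and both main effects -- one way of
orthogonalizing -- removes that correlation entirely.)

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
# Two independent "quantitative" predictors, always positive
# and far from zero.
x1 = rng.uniform(50, 100, n)
x2 = rng.uniform(50, 100, n)

raw_prod = x1 * x2
corr_raw = np.corrcoef(x1, raw_prod)[0, 1]  # substantial, yet spurious

# Orthogonalize the product against the intercept and both main
# effects by taking OLS residuals.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, raw_prod, rcond=None)
ortho_prod = raw_prod - X @ beta
corr_ortho = np.corrcoef(x1, ortho_prod)[0, 1]  # zero by construction

print(corr_raw, corr_ortho)
```

Simply centering x1 and x2 before forming the product accomplishes much
the same thing in this simple case, though full orthogonalization is the
more general remedy.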

<< BTW, the reason I used the term prediction rather than explanation is 
because my objective was primarily predictive (we are trying to predict 
prices of corporate training events)...this use is consistent with 
Pedhazur's [usage]. ... >>

I don't have any problem with that.

<< 3.  In effects coding as I was using the term, consistent with 
Pedhazur, one group always gets -1 (like the group always getting 0 in 
dummy/indicator coding). >>

Yes;  that's what I would have expected it to mean, but I took your 
"(1,0,-1)", unaccompanied by other complementary codings like "(0,1,-1)", 
along with the assertion of "different results", to mean that maybe you 
were not representing all the d.f. adequately.

<< 4.  I will post an example where dummy/indicator and effects coding 
provide different results when I get back from my trip.  Pedhazur 
specifically recommends not using dummy coding for this reason in the 2nd 
and 3rd editions of his text on multiple regression. ... >>

Well, no, I don't think he does.  He says quite specifically that 'dummy'
coding is the preferred method in some circumstances (perhaps those don't
apply to your situation;  you haven't said).  And he quite clearly writes
(see the quote above) that "effect coding is generally [NOT always, note!]
the preferred method" because "different types of multiple comparisons ... 
can be easily performed [using] effect coding".  NOT because, as you 
write, "dummy/indicator and effects coding provide different results".

        I'll remark in passing what seems to me obvious, but seems often 
to be overlooked by others.  One need not choose the same coding method 
for all one's categorical variables;  one could use indicator variables 
here and effect coding there, and orthogonal coding somewhere else.  
Further, because interactions are conceptually independent of their own 
main effects, one could quite reasonably use, say, effect coding for the 
"main effect" of a categorical variable, while using "dummy coding" in 
constructing that variable's interaction(s) with (an)other variable(s). 
For me, the question at issue is always
  First, is there a significant effect (of an interaction, say)?
        [This is why I prefer to orthogonalize the interaction variables 
         in the first instance:  their effect is then not confounded with 
         contemporaneous main effects (&c.).]
        Addressing this question (and others of its ilk) permits one to 
        arrive fairly efficiently at a suitable reduced model.
  Second, what form of coding, for each variable remaining in the reduced 
        model, permits the clearest style of interpretation?
        [This might lead to different codings for different variables, 
         depending in part on substantive issues, in part on the shape of 
         the statistical results, in part on the prejudices of the 
         investigator.  (This is, at bottom, a question of aesthetics;
         and as we all know, beauty lies in the eye of the beholder.)]
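        (By way of illustration of mixing coding schemes, a sketch in
Python with numpy and hypothetical data: effect coding for a 3-level
variable's main effect, but dummy (indicator) coding of that same variable
inside its interaction with a two-level variable, so the interaction
columns read as "effect of z within level k of g".  The resulting design
matrix is still of full rank.)

```python
import numpy as np

g = np.array([0, 1, 2, 0, 1, 2])              # 3-level categorical variable
z = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # 2-level variable, dummy-coded

# Effect coding for g's main effect (level 2 gets -1 on both columns) ...
e1 = np.where(g == 0, 1.0, np.where(g == 2, -1.0, 0.0))
e2 = np.where(g == 1, 1.0, np.where(g == 2, -1.0, 0.0))

# ... but dummy coding of g in the interaction with z.
d1 = (g == 0).astype(float)
d2 = (g == 1).astype(float)
X = np.column_stack([np.ones(6), e1, e2, z, d1 * z, d2 * z])

print(X.shape, np.linalg.matrix_rank(X))  # (6, 6) 6 -- full rank
```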

                                                -- Don.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-471-7128  
