----- Original Message -----
From: Burke Johnson <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Thursday, December 16, 1999 9:13 AM
Subject: Prediction Model Question
| Hi,
|
| A student of mine is getting ready to develop a GLM prediction model that will
|include a mixture of categorical and quantitative predictor variables. We will
|probably not include interaction terms in the model (i.e., it will be a main effects
|only model).
|
| Here's my question: Do you suggest using dummy coding (0,1) or effects coding
|(1,0,-1) for the categorical variables included in the model?
|
| The reason I'm asking is because dummy coding does not always give the same result
|for a factorial design as does ANOVA and effects coding, and, hence, Pedhazur
|recommends using effects coding rather than dummy coding in the factorial case. Do
|you know if the choice of dummy or effects coding matters for a main effects only
|model with multiple categorical and quantitatively scaled predictor variables?
|
| Thanks in advance,
| Burke Johnson
|
--------------------------------------------------
Hi, Burke --
First, I use the words BINARY (or INDICATOR) predictors -- and NOT "DUMMY" predictors.
In the beginning ALL PREDICTOR INFORMATION IS BINARY!
It is unfortunate that the word DUMMY has become popular. Students might get the idea that
there is something wrong with using DUMMIES!! I think that the BINARIES are really the most
BRILLIANT!!
Now to your concern --
Your last paragraph
"The reason I'm asking is because dummy coding does not always give the same result
for a factorial design as does ANOVA and effects coding, and, hence, Pedhazur
recommends using effects coding rather than dummy coding in the factorial case. Do you
know if the choice of dummy or effects coding matters for a main effects only model
with multiple categorical and quantitatively scaled predictor variables?"
is a very good example of the situation that arises in the use of "packaged"
algorithms. The user of the "package" may have no idea what questions are being
answered by the "package".
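Whether the coding choice matters for a main-effects-only model can be checked directly instead of taking a package's word for it. Below is a minimal sketch in Python with NumPy, assuming one 3-level categorical predictor, one quantitative predictor x, and made-up illustrative data. The helper names `design` and `fit` are hypothetical, not from any package. The sketch fits the same main-effects model once with binary (0,1) coding and once with effects (1,0,-1) coding and compares the Error Sum of Squares.

```python
import numpy as np

def design(groups, x, coding):
    """Main-effects design matrix: intercept, k-1 coded category columns, covariate x.

    groups holds integer labels 0..k-1. With coding="binary" the last level
    is the reference (all zeros); with coding="effects" it is coded -1.
    """
    groups = np.asarray(groups)
    k = int(groups.max()) + 1
    cols = [np.ones(groups.size)]
    for level in range(k - 1):
        col = (groups == level).astype(float)  # 1 for this level, 0 otherwise
        if coding == "effects":
            col[groups == k - 1] = -1.0         # last level becomes -1
        cols.append(col)
    cols.append(np.asarray(x, dtype=float))
    return np.column_stack(cols)

def fit(X, y):
    """Ordinary least squares: return coefficients and Error Sum of Squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return beta, float(resid @ resid)

# Hypothetical data: 3 groups of 10 and one quantitative predictor x.
rng = np.random.default_rng(0)
groups = np.tile([0, 1, 2], 10)
x = rng.normal(size=groups.size)
y = 1.0 + np.array([0.0, 0.5, -0.5])[groups] + 2.0 * x + rng.normal(scale=0.3, size=groups.size)

beta_b, sse_b = fit(design(groups, x, "binary"), y)
beta_e, sse_e = fit(design(groups, x, "effects"), y)
print("SSE, binary coding: ", sse_b)
print("SSE, effects coding:", sse_e)
print("slope on x:", beta_b[-1], beta_e[-1])
```

With an intercept in the model, the two codings span the same column space, so the fitted values, the Error Sum of Squares, and the slope on x come out identical. Only the meaning of the individual category coefficients changes: differences from the reference group under binary coding, deviations from the unweighted mean of the group intercepts under effects coding.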
I always suggest that researchers create their own models! That is the only SAFE WAY!
If a "packaged" procedure is verified to produce the results desired by the researcher,
then it certainly should be used.
The researcher should:
1. State the research questions in "natural language" -- avoid terms such as
   "MAIN EFFECTS" and "EFFECTS CODING", since those expressions may mean different
   things to different people. In some instances the person using those terms may not
   know what is meant when they utter them. Ask someone what they mean when they say
   something about MAIN EFFECTS in a 3-factor ANOVA with unequal numbers of
   observations in the cells.
2. Create an ASSUMED MODEL that allows the researcher to investigate the research
   questions of interest.
3. Impose restrictions on the parameters of the ASSUMED MODEL that are implied by the
   research questions of interest. This results in a RESTRICTED MODEL.
4. Compare the Error Sum of Squares of the ASSUMED and RESTRICTED MODELS using an
   F-test, and obtain confidence intervals if appropriate.
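Steps 2 to 4 can be sketched in Python with NumPy, assuming the models are fitted by ordinary least squares and that the RESTRICTED MODEL's columns lie in the space spanned by the ASSUMED MODEL's columns. The statistic is F = [(SSE_R - SSE_A) / (df_R - df_A)] / (SSE_A / df_A), where each df is n minus the rank of that model's design matrix. The research question ("after allowing for x, do the three groups have the same expected score?"), the data, and the helper names `sse` and `compare_models` are hypothetical illustrations.

```python
import numpy as np

def sse(X, y):
    """Error Sum of Squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def compare_models(X_assumed, X_restricted, y):
    """F statistic for the restrictions that turn the ASSUMED into the RESTRICTED model.

    Returns (F, numerator df, denominator df). Assumes the restricted model's
    columns span a subspace of the assumed model's columns.
    """
    n = len(y)
    df_a = n - np.linalg.matrix_rank(X_assumed)
    df_r = n - np.linalg.matrix_rank(X_restricted)
    sse_a = sse(X_assumed, y)
    sse_r = sse(X_restricted, y)
    f = ((sse_r - sse_a) / (df_r - df_a)) / (sse_a / df_a)
    return f, int(df_r - df_a), int(df_a)

# Hypothetical data: 3 groups of 10 and one quantitative predictor x.
rng = np.random.default_rng(0)
groups = np.tile([0, 1, 2], 10)
x = rng.normal(size=groups.size)
y = 1.0 + np.array([0.0, 0.5, -0.5])[groups] + 2.0 * x + rng.normal(scale=0.3, size=groups.size)

# ASSUMED MODEL: intercept, binary predictors for groups 0 and 1 (group 2 is the reference), x.
binary = (groups[:, None] == np.array([0, 1])).astype(float)
X_assumed = np.column_stack([np.ones(groups.size), binary, x])
# RESTRICTED MODEL: "all three groups have the same expected score at a given x",
# i.e. both binary coefficients set to zero.
X_restricted = np.column_stack([np.ones(groups.size), x])

f, df1, df2 = compare_models(X_assumed, X_restricted, y)
print(f"F = {f:.2f} on ({df1}, {df2}) df")
```

If SciPy is available, `scipy.stats.f.sf(f, df1, df2)` gives the corresponding p-value. The same `compare_models` call works for any restriction the research questions imply, as long as it can be written as a smaller design matrix.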
I assume there is a reason for assuming that there is NO INTERACTION among the
predictors. Many researchers would test for NO INTERACTION first and then, if
appropriate, switch to the NO INTERACTION MODEL.
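Testing for NO INTERACTION first fits the same ASSUMED/RESTRICTED pattern. A minimal sketch, again in Python with NumPy and made-up data: the ASSUMED MODEL gives each of three groups its own intercept and its own slope on a quantitative predictor x (one binary predictor per group and no separate intercept column, a choice made only for this illustration), and the RESTRICTED MODEL forces the three slopes to be equal.

```python
import numpy as np

def sse(X, y):
    """Error Sum of Squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

# Hypothetical data: 3 groups of 10 and one quantitative predictor x.
rng = np.random.default_rng(1)
groups = np.tile([0, 1, 2], 10)
x = rng.normal(size=groups.size)
y = 1.0 + np.array([0.0, 0.5, -0.5])[groups] + 2.0 * x + rng.normal(scale=0.3, size=groups.size)

# One binary predictor per group, no separate intercept column.
binary = (groups[:, None] == np.arange(3)).astype(float)

# ASSUMED MODEL: each group has its own intercept and its own slope on x.
X_assumed = np.column_stack([binary, binary * x[:, None]])
# RESTRICTED MODEL (NO INTERACTION): the three slopes are forced to be equal.
X_restricted = np.column_stack([binary, x])

n = groups.size
df_a = n - np.linalg.matrix_rank(X_assumed)     # 30 - 6 = 24
df_r = n - np.linalg.matrix_rank(X_restricted)  # 30 - 4 = 26
sse_a, sse_r = sse(X_assumed, y), sse(X_restricted, y)
f = ((sse_r - sse_a) / (df_r - df_a)) / (sse_a / df_a)
print(f"F for NO INTERACTION = {f:.2f} on ({df_r - df_a}, {df_a}) df")
```

If this F is small enough that equal slopes are plausible, the common-slope (NO INTERACTION) model becomes the new ASSUMED MODEL for the remaining research questions.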
I would be interested in seeing the models that your student develops to investigate
his/her OWN QUESTIONS OF INTEREST!!
:-)
-- Joe
**************************************************************************
* Joe Ward Health Careers High School
* 167 East Arrowhead Dr 4646 Hamilton Wolfe
* San Antonio, TX 78228-2402 San Antonio, TX 78229
* Phone: 210-433-6575 Phone: 210-617-5400
* Fax: 210-433-2828 Fax: 210-617-5423
* [EMAIL PROTECTED]
* http://www.ijoa.org/joeward/wardindex.html
************************************************************************