The clearest (and withal concise) exposition of ANCOVA that I ever
encountered is at the beginning of the third chapter of Tatsuoka's book
on multivariate analysis. If you can find a copy, it would both explain
what ANCOVA is all about, and illuminate the more cryptic responses
you've already had.

As has already been agreed, you are correct that the "second analysis
option" is identical to the first, the only difference being whether one
uses a program labelled "ANCOVA" or one labelled "regression" to carry
out the work.

FWIW, I'd be inclined to use a regression program, for a couple of
reasons: (1) one can investigate directly whether A and X interact,
which is to say whether the slope of the regression of Y upon X differs
in the groups represented by A, and some ANCOVA programs do not permit
this even to be thought of; (2) one can control the order in which the
predictors are considered, which (in a system like MINITAB that reports
sequential sums of squares) can be informative.
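
Point (2) can be illustrated with a minimal sketch in Python rather than MINITAB. The function names and the calling convention are hypothetical, and A is assumed to be coded 0/1. It uses the closed-form sums of squares that apply when the only predictors are X and a two-level factor, and it returns the sequential sums of squares for both orders of entry.

```python
def sums(x, y):
    """Corrected sums of squares and cross-products: (Sxx, Sxy, Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxx, sxy, syy


def sequential_ss(x, y, a):
    """Sequential sums of squares for Y on X and a 0/1 factor A, both orders."""
    sxx, sxy, syy = sums(x, y)                      # ignoring the groups
    wxx = wxy = wyy = 0.0                           # pooled within-group sums
    for level in (0, 1):
        xs = [xi for xi, ai in zip(x, a) if ai == level]
        ys = [yi for yi, ai in zip(y, a) if ai == level]
        gxx, gxy, gyy = sums(xs, ys)
        wxx, wxy, wyy = wxx + gxx, wxy + gxy, wyy + gyy
    sse_both = wyy - wxy ** 2 / wxx                 # residual SS, X and A both in
    ss_x_first = sxy ** 2 / sxx                     # SS(X)
    ss_a_after_x = (syy - ss_x_first) - sse_both    # SS(A | X)
    ss_a_first = syy - wyy                          # SS(A), between-group SS of Y
    ss_x_after_a = wxy ** 2 / wxx                   # SS(X | A)
    return {"X then A": (ss_x_first, ss_a_after_x),
            "A then X": (ss_a_first, ss_x_after_a)}


# Invented illustrative data, not the poster's: X = item memory,
# Y = order memory, A = 0 for one age group and 1 for the other.
X = [12, 14, 15, 17, 18, 20, 8, 9, 11, 12, 14, 15]
Y = [10, 11, 13, 14, 16, 17, 5, 7, 7, 9, 10, 12]
A = [0] * 6 + [1] * 6

result = sequential_ss(X, Y, A)
for order, (first, second) in result.items():
    print(order, round(first, 3), round(second, 3))
```

Both orders account for the same total, but when A and X are correlated the split between the two terms depends on which is entered first.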
I would agree with the respondent who urged you to plot Y vs X using
different symbols for the two levels of A. In MINITAB's character
graphics, for instance,

    LPLOT 'Y' 'X' 'A'
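
For readers without MINITAB, a rough stand-in for such a letter plot can be sketched in plain Python. This is not a reproduction of LPLOT's output: the function name, grid size, and data are all hypothetical, and the only idea carried over is one letter per level of A on a character grid.

```python
def letter_plot(x, y, group, width=40, height=12):
    """Return a character scatterplot of y vs x, one letter per group level."""
    levels = sorted(set(group))
    letters = {lev: chr(ord("A") + i) for i, lev in enumerate(levels)}
    xmin, xmax = min(x), max(x)
    ymin, ymax = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi, gi in zip(x, y, group):
        col = round((xi - xmin) / (xmax - xmin) * (width - 1))
        row = round((ymax - yi) / (ymax - ymin) * (height - 1))
        cell = grid[row][col]
        # Points from different groups landing on the same cell show as '*'.
        grid[row][col] = letters[gi] if cell in (" ", letters[gi]) else "*"
    body = "\n".join("|" + "".join(r) for r in grid)
    return body + "\n+" + "-" * width


# Invented illustrative data: group 0 plots as 'A', group 1 as 'B'.
X = [12, 14, 15, 17, 18, 20, 8, 9, 11, 12, 14, 15]
Y = [10, 11, 13, 14, 16, 17, 5, 7, 7, 9, 10, 12]
A = [0] * 6 + [1] * 6

plot = letter_plot(X, Y, A)
print(plot)
```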
If I were doing it, I'd begin with a "full model" (or "augmented model",
in Judd & McClelland's terms) containing three predictors:

    Y = b0 + b1*X + b2*A + b3*(AX) + error                       [1]

where A has been recoded to (0,1) and (AX) = A*X.
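
As a hedged sketch of what fitting model [1] amounts to: with A coded (0,1), the full model gives each group its own intercept and slope, so its least-squares coefficients can be recovered from two separate simple regressions. The Python below assumes exactly two groups; the function names and data are hypothetical.

```python
def simple_fit(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def full_model(x, y, a):
    """Coefficients (b0, b1, b2, b3) of Y = b0 + b1*X + b2*A + b3*(A*X)."""
    g0 = [(xi, yi) for xi, yi, ai in zip(x, y, a) if ai == 0]
    g1 = [(xi, yi) for xi, yi, ai in zip(x, y, a) if ai == 1]
    int0, slope0 = simple_fit(*zip(*g0))
    int1, slope1 = simple_fit(*zip(*g1))
    # b0, b1 describe group 0; b2, b3 are group 1's departures from group 0.
    return int0, slope0, int1 - int0, slope1 - slope0


# Invented illustrative data, not the poster's.
X = [12, 14, 15, 17, 18, 20, 8, 9, 11, 12, 14, 15]
Y = [10, 11, 13, 14, 16, 17, 5, 7, 7, 9, 10, 12]
A = [0] * 6 + [1] * 6

b0, b1, b2, b3 = full_model(X, Y, A)
print("b0, b1, b2, b3 =", b0, b1, b2, b3)
```

A general regression program would give the same coefficients from the three predictors X, A, and A*X, and would report standard errors as well.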
If b3 is close enough to 0 to disregard, one would be interested in a
"reduced" (or "compact") model

    Y = b0 + b1*X + b2*A + error                                 [2]

which fits a common slope to the regression of Y upon X in both groups.
Otherwise, b3 represents the difference between the slope in group "1"
and the slope in group "0", and the differences in values of Y between
groups depend on the value of X. Whether this is much of a complication
or not depends on such things as whether the _direction_ of that
difference differs within the range of X observed ... which pretty well
requires that one examine the letter-plot described above.
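
The dependence on X can be made explicit: under model [1] the fitted difference between group "1" and group "0" at a given X is b2 + b3*X, which changes sign at X = -b2/b3. A small Python sketch follows, with hypothetical function names and purely illustrative coefficient values. It is a numerical companion to the plot, not a substitute for it.

```python
def group_difference(b2, b3, x):
    """Fitted Y for group 1 minus fitted Y for group 0 at a given X."""
    return b2 + b3 * x


def crossover(b2, b3):
    """X at which the two fitted lines cross, or None if they are parallel."""
    return None if b3 == 0 else -b2 / b3


def direction_reverses(b2, b3, x_values):
    """True if the lines cross strictly inside the observed range of X."""
    xc = crossover(b2, b3)
    return xc is not None and min(x_values) < xc < max(x_values)


# Illustrative values only: b2 = -3.0, b3 = 0.25, X observed from 8 to 20.
print(crossover(-3.0, 0.25))
print(group_difference(-3.0, 0.25, 8), group_difference(-3.0, 0.25, 20))
print(direction_reverses(-3.0, 0.25, [8, 20]))
```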
In model [2], one is most likely to be interested in whether b2 is close
enough to 0 to disregard: that is, whether the data really require two
parallel lines in the model, or whether one line suffices, in which case
one wants to fit the model

    Y = b0 + b1*X + error.                                       [3]
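
One conventional way to judge "close enough to 0" is to compare residual sums of squares of nested models with an F ratio. That choice of criterion is an assumption here, not something the discussion above prescribes. The Python sketch below uses hypothetical names and invented data, and assumes A is coded 0/1. It computes the residual SS for models [1], [2], and [3] from closed-form expressions and forms the two F ratios, which would then be referred to an F table with 1 and the stated residual degrees of freedom.

```python
def sums(x, y):
    """Corrected sums of squares and cross-products: (Sxx, Sxy, Syy)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxx, sxy, syy


def residual_ss(x, y, a):
    """Residual SS for models [1], [2], [3], in that order."""
    parts = []
    for level in (0, 1):
        xs = [xi for xi, ai in zip(x, a) if ai == level]
        ys = [yi for yi, ai in zip(y, a) if ai == level]
        parts.append(sums(xs, ys))
    sse1 = sum(gyy - gxy ** 2 / gxx for gxx, gxy, gyy in parts)  # separate lines
    wxx, wxy, wyy = (sum(p[i] for p in parts) for i in range(3))
    sse2 = wyy - wxy ** 2 / wxx                                  # parallel lines
    sxx, sxy, syy = sums(x, y)
    sse3 = syy - sxy ** 2 / sxx                                  # a single line
    return sse1, sse2, sse3


def f_ratio(sse_reduced, sse_full, df_full, df_diff=1):
    """F ratio for dropping df_diff parameters from the fuller model."""
    return ((sse_reduced - sse_full) / df_diff) / (sse_full / df_full)


# Invented illustrative data, not the poster's.
X = [12, 14, 15, 17, 18, 20, 8, 9, 11, 12, 14, 15]
Y = [10, 11, 13, 14, 16, 17, 5, 7, 7, 9, 10, 12]
A = [0] * 6 + [1] * 6

n = len(Y)
sse1, sse2, sse3 = residual_ss(X, Y, A)
f_b3 = f_ratio(sse2, sse1, n - 4)   # [2] vs [1]: is b3 needed?
f_b2 = f_ratio(sse3, sse2, n - 3)   # [3] vs [2]: is b2 needed?
print("F for b3:", f_b3, "on 1 and", n - 4, "df")
print("F for b2:", f_b2, "on 1 and", n - 3, "df")
```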
(Of course, the models [1] [2] and [3] above are not exhaustive. But
discussing others would require speculating even more egregiously than
I've already done about possible shapes of the data...)

On Fri, 20 Apr 2001, William Levine wrote:
> Here is a statistical issue that I have been pondering for a few weeks
> now, and I am hoping someone can help set me straight.
>
> A study was conducted to assess whether there were age differences in
> memory for order independent of memory for items. Two preexisting
> groups (younger and older adults - let's call this variable A) were
> tested for memory for order information (Y). These groups were also
> tested for item memory (X).

One respondent complained that the two groups appeared not to be randomly
selected. I can't tell from this description whether that be true or
not; but the first question, I should think, is whether in these data
there appears to be any effect of Age at all. If there is, one can then
worry about whether the effect is properly _attributable_ to Age, or to
any of the variables with which Age is doubtless confounded -- history,
for one obvious example -- and try to devise a research design that will
help these potential sources of variability to be disentangled in future
research.

Also, if there is an "Age" effect, it may be worthwhile (depending partly
on how much data one has) fitting a model in which Age is allowed to vary
on a [quasi-]continuum. (In model [1] above, use something like raw Age
rather than the dichotomy A, or possibly Age expressed as a deviation
from some middling value; in that case, I'd want to express (AX) as the
part of the product Age*X that is orthogonal both to Age and to X.)
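
One way to obtain "the part of the product Age*X that is orthogonal both to Age and to X" is to take the residual of the product after projecting out the constant, Age, and X. The Python sketch below does this by Gram-Schmidt steps. The function names and data are hypothetical, and it assumes Age and X are not perfectly collinear.

```python
def center(v):
    """Subtract the mean, i.e. remove the component along the constant."""
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def remove(v, basis):
    """Subtract from v its projection on basis."""
    k = dot(v, basis) / dot(basis, basis)
    return [vi - k * bi for vi, bi in zip(v, basis)]


def orthogonal_product(age, x):
    """Part of age*x orthogonal to the constant, to age, and to x."""
    age_c = center(age)
    x_perp = remove(center(x), age_c)        # X with its Age component removed
    prod = center([ai * xi for ai, xi in zip(age, x)])
    # age_c and x_perp are mutually orthogonal, so removing each in turn
    # removes the whole projection on the space spanned by Age and X.
    return remove(remove(prod, age_c), x_perp)


# Invented illustrative data, not the poster's.
age = [22, 25, 31, 38, 45, 52, 60, 67, 71, 74]
x = [18, 20, 17, 16, 15, 14, 12, 13, 10, 9]

ax = orthogonal_product(age, x)
print(dot(ax, age), dot(ax, x), sum(ax))   # each should be numerically near zero
```

Used in place of the raw product in model [1], such a term leaves the coefficients of Age and X the same as they would be without the interaction term in the model.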
> Two ways of analyzing these data came to mind. One was to perform an
> ANCOVA treating X as a covariate. But the two groups differ with
> respect to X, which would make interpretation of the ANCOVA difficult.
That might depend on the form of the ANCOVA program output; another
reason I prefer using a regression program. But even in ANCOVA, there
are only the two predictors, and if Y and X are correlated (as I gather
they must be, reading between the lines), the only question is whether
there are different regression lines for each group, or whether one line
suffices for both: presuming, of course, that the regression slopes ARE
parallel, which may not be possible to examine except via a regression
program.
> Thus, an ANCOVA did not seem like the correct analysis.
>
> A second analysis option (suggested by a friend) is to perform a
> sequential regression, entering X first and A second to
> test if there is significant leftover variance explained by A.
>
> This second option sounds to me like the same thing as the first ...
Right.
< snip >
> ... Finally, does anyone have any suggestions?
Mainly, start out with model [1] (or one like it), so you can tell
explicitly whether the slopes in the different groups can be treated
as parallel, or whether that would be a gross and perhaps misleading
oversimplification.
It may also be interesting to ask whether Y, X, and/or Age might usefully
be represented by non-linear functions of some kind.
-- Don.
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-472-3742