Re: ANCOVA vs. sequential regression

Donald Burrill Sun, 22 Apr 2001 18:50:14 -0700
The clearest (and withal concise) exposition of ANCOVA that I ever 
encountered is at the beginning of the third chapter of Tatsuoka's book 
on multivariate analysis.  If you can find a copy, it would both explain 
what ANCOVA is all about, and illuminate the more cryptic responses 
you've already had.

As has already been agreed, you are correct that the "second analysis 
option" is identical to the first, the only difference being whether one 
uses a program labelled "ANCOVA" or one labelled "regression" to carry 
out the work.

FWIW, I'd be inclined to use a regression program, for a couple of
reasons:  (1) one can investigate directly whether A and X interact, 
which is to say whether the slope of the regression of Y upon X differs 
in the groups represented by A, and some ANCOVA programs do not permit 
this even to be thought of;  (2) one can control the order in which the 
predictors are considered, which (in a system like MINITAB that reports 
sequential sums of squares) can be informative. 
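
For anyone without MINITAB at hand, here is a rough sketch in Python of 
what "sequential sums of squares" means and why the order of entry 
matters.  The data are invented purely for illustration, and the helper 
names (dot, residualize, sequential_ss) are hypothetical, not taken from 
any package.  Each predictor is credited only with the part of Y it 
explains beyond the terms entered before it.

```python
# Invented illustration data (NOT from the study under discussion):
# X = item memory, Y = order memory, A = group (0 = younger, 1 = older).
X = [12, 14, 15, 17, 18, 20, 8, 10, 11, 13, 14, 16]
Y = [8, 9, 11, 12, 13, 15, 3, 5, 5, 7, 8, 9]
A = [0] * 6 + [1] * 6

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(v, basis):
    """Subtract from v its projection on each mutually orthogonal basis vector."""
    out = [float(t) for t in v]
    for q in basis:
        coef = dot(out, q) / dot(q, q)
        out = [o - coef * qi for o, qi in zip(out, q)]
    return out

def sequential_ss(y, named_predictors):
    """Sequential (order-dependent) sums of squares; the intercept enters first."""
    basis = [[1.0] * len(y)]
    table = []
    for name, column in named_predictors:
        q = residualize(column, basis)   # the part of this predictor that is new
        table.append((name, dot(y, q) ** 2 / dot(q, q)))
        basis.append(q)
    residual = residualize(y, basis)
    return table, dot(residual, residual)

# Same two predictors, entered in both possible orders.
for order in ([("X", X), ("A", A)], [("A", A), ("X", X)]):
    table, rss = sequential_ss(Y, order)
    print(", ".join(f"SS({name}) = {ss:.2f}" for name, ss in table),
          f"| residual SS = {rss:.2f}")
```

Both orderings leave the same residual sum of squares, but they divide 
the explained variation differently between X and A.  With X entered 
first, the sum of squares for A is the "leftover" variation attributable 
to A after X has been allowed for.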

I would agree with the respondent that urged you to plot Y vs X using 
different symbols for the two levels of A.  In MINITAB's character 
graphics, for instance, 
        LPLOT 'Y' 'X' 'A'
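
If MINITAB is not available, much the same picture can be had from a few 
lines of Python.  This is only a sketch:  the data are invented, the 
function name letter_plot is hypothetical, and the symbols ("o" for 
A = 0, "x" for A = 1, "*" where the two groups overprint) are my own 
choice rather than anything MINITAB does.

```python
# Invented illustration data (NOT from the study under discussion):
# X = item memory, Y = order memory, A = group (0 = younger, 1 = older).
X = [12, 14, 15, 17, 18, 20, 8, 10, 11, 13, 14, 16]
Y = [8, 9, 11, 12, 13, 15, 3, 5, 5, 7, 8, 9]
A = [0] * 6 + [1] * 6

def letter_plot(x, y, group, symbols=None, width=40, height=12):
    """Return a character scatter plot of y against x, one symbol per group."""
    symbols = symbols or {0: "o", 1: "x"}
    xmin, xmax = min(x), max(x)
    ymin, ymax = min(y), max(y)
    grid = [[" "] * width for _ in range(height)]
    for xi, yi, gi in zip(x, y, group):
        col = round((xi - xmin) / (xmax - xmin) * (width - 1))
        row = round((ymax - yi) / (ymax - ymin) * (height - 1))  # top row = largest y
        cell = grid[row][col]
        # Keep the group's symbol unless a different group already sits here.
        grid[row][col] = symbols[gi] if cell in (" ", symbols[gi]) else "*"
    return "\n".join("".join(line).rstrip() for line in grid)

print(letter_plot(X, Y, A))
```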

If I were doing it, I'd begin with a "full model" (or "augmented model", 
in Judd & McClelland's terms) containing three predictors:
        y  =  b0 + b1*X + b2*A + b3*(AX) + error      [1]
 where A had been recoded to (0,1) and (AX) = A*X.

If b3 is close enough to 0 to disregard, one would be interested in a 
"reduced" (or "compact") model
        y  =  b0 + b1*X + b2*A + error                [2]
 which fits a common slope to the regression of Y upon X in both groups. 
Otherwise, b3 represents the difference between the slope in group "1" 
and the slope in group "0", and the difference in Y between groups 
depends on the value of X.  How much of a complication this is depends 
on such things as whether the _direction_ of that difference reverses 
within the observed range of X (that is, whether the two lines cross 
there) ... which pretty well requires that one examine the letter-plot 
described above.
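
The reading of b3 as a difference in slopes can be checked numerically. 
The Python sketch below uses invented data and hypothetical function 
names (simple_fit, full_model_coefficients).  It relies on the fact 
that, with A coded (0,1), fitting model [1] by least squares gives the 
same two lines as fitting a separate straight line within each group, so 
the four coefficients can be read off from the two within-group fits.

```python
# Invented illustration data (NOT from the study under discussion):
# X = item memory, Y = order memory, A = group (0 = younger, 1 = older).
X = [12, 14, 15, 17, 18, 20, 8, 10, 11, 13, 14, 16]
Y = [8, 9, 11, 12, 13, 15, 3, 5, 5, 7, 8, 9]
A = [0] * 6 + [1] * 6

def simple_fit(x, y):
    """Least-squares (intercept, slope) for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def full_model_coefficients(x, y, a):
    """Coefficients of model [1], built from the two within-group lines."""
    g0 = [(xi, yi) for xi, yi, ai in zip(x, y, a) if ai == 0]
    g1 = [(xi, yi) for xi, yi, ai in zip(x, y, a) if ai == 1]
    int0, slope0 = simple_fit(*zip(*g0))
    int1, slope1 = simple_fit(*zip(*g1))
    return {
        "b0": int0,            # intercept in group 0
        "b1": slope0,          # slope in group 0
        "b2": int1 - int0,     # difference in intercepts (at X = 0)
        "b3": slope1 - slope0, # difference in slopes
    }

for name, value in full_model_coefficients(X, Y, A).items():
    print(f"{name} = {value:.3f}")
```

Note that b2 here is the group difference at X = 0, which may lie well 
outside the observed range of X when b3 is not negligible.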

In model [2], one is most likely to be interested in whether b2 is close 
enough to 0 to disregard:  that is, whether the data really require two 
parallel lines in the model, or whether one line suffices, in which case 
one wants to fit the model
        y  =  b0 + b1*X + error.           [3]

(Of course, the models [1] [2] and [3] above are not exhaustive.  But 
discussing others would require speculating even more egregiously than 
I've already done about possible shapes of the data...)

On Fri, 20 Apr 2001, William Levine wrote:

> Here is a statistical issue that I have been pondering for a few weeks 
> now, and I am hoping someone can help set me straight.
> 
> A study was conducted to assess whether there were age differences in 
> memory for order independent of memory for items.  Two preexisting 
> groups (younger and older adults - let's call this variable A) were 
> tested for memory for order information (Y).  These groups were also 
> tested for item memory (X). 

One respondent complained that the two groups appeared not to be randomly 
selected.  I can't tell from this description whether that be true or 
not;  but the first question, I should think, is whether in these data 
there appears to be any effect of Age at all.  If there is, one can then 
worry about whether the effect is properly _attributable_ to Age, or to 
any of the variables with which Age is doubtless confounded -- history, 
for one obvious example -- and try to devise a research design that will 
help these potential sources of variability to be disentangled in future 
research.

Also, if there is an "Age" effect, it may be worthwhile (depending partly 
on how much data one has) to fit a model in which Age is allowed to vary 
on a [quasi-]continuum.  (In model [1] above, use something like raw Age 
rather than the dichotomy A, or possibly Age expressed as a deviation 
from some middling value;  in that case, I'd want to define (AX) as the 
part of the product Age*X that is orthogonal both to Age and to X.)
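
One way to construct such an orthogonalized product is to take the 
residuals from regressing Age*X on Age and X.  The Python sketch below 
does this by Gram-Schmidt projection.  The ages and scores are invented, 
the names (dot, residualize, orthogonal_product) are hypothetical, and I 
have assumed that the product should be made orthogonal to the constant 
term as well, which the text above does not say explicitly.

```python
# Invented illustration data (NOT from the study under discussion):
# AGE in years, X = item memory score.
AGE = [21, 23, 25, 22, 24, 26, 68, 70, 72, 69, 71, 74]
X = [12, 14, 15, 17, 18, 20, 8, 10, 11, 13, 14, 16]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def residualize(v, basis):
    """Subtract from v its projection on each mutually orthogonal basis vector."""
    out = [float(t) for t in v]
    for q in basis:
        coef = dot(out, q) / dot(q, q)
        out = [o - coef * qi for o, qi in zip(out, q)]
    return out

def orthogonal_product(age, x):
    """Part of age*x orthogonal to the constant (an assumption), age, and x."""
    basis = [[1.0] * len(age)]
    basis.append(residualize(age, basis))  # Age with its mean removed
    basis.append(residualize(x, basis))    # X with its mean and Age removed
    product = [a * b for a, b in zip(age, x)]
    return residualize(product, basis)

AX = orthogonal_product(AGE, X)
print("AX . Age =", round(dot(AX, AGE), 6))
print("AX . X   =", round(dot(AX, X), 6))
```

Used in place of the raw product in model [1], a predictor built this 
way carries only what the interaction adds beyond Age and X themselves.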

> Two ways of analyzing these data came to mind. One was to perform an 
> ANCOVA treating X as a covariate. But the two groups differ with 
> respect to X, which would make interpretation of the ANCOVA difficult. 

That might depend on the form of the ANCOVA program's output, which is 
another reason I prefer using a regression program.  But even in ANCOVA 
there are only the two predictors, and if Y and X are correlated (as I 
gather they must be, reading between the lines), the only question is 
whether the groups need different regression lines or whether one line 
suffices for both.  That presumes, of course, that the regression slopes 
ARE parallel, which it may not be possible to check except via a 
regression program.

> Thus, an ANCOVA did not seem like the correct analysis.
> 
> A second analysis option (suggested by a friend) is to perform a 
> sequential regression, entering X first and A second to
> test if there is significant leftover variance explained by A.
> 
> This second option sounds to me like the same thing as the first ...

        Right.

        < snip > 

> ... Finally, does anyone have any suggestions?

Mainly, start out with model [1] (or one like it), so you can tell 
explicitly whether the slopes in the different groups can be treated 
as parallel, or whether that would be a gross and perhaps misleading 
oversimplification.

It may also be interesting to ask whether Y, X, and/or Age might usefully 
be represented by non-linear functions of some kind.
                                                        -- Don.
 ------------------------------------------------------------------------
 Donald F. Burrill                                 [EMAIL PROTECTED]
 348 Hyde Hall, Plymouth State College,          [EMAIL PROTECTED]
 MSC #29, Plymouth, NH 03264                                 603-535-2597
 184 Nashua Road, Bedford, NH 03110                          603-472-3742  

