I don't have an answer to Alan's question, but I can offer a description
of another technique for trying to perceive what a contingency table
might be telling one. Like Alan, I have not seen this mentioned in
textbooks; OTOH, I wouldn't expect to, because I think it is analogous to post-hoc
tests in ANOVA, which are hardly ever dealt with earlier than a second
course, and contingency-table chi-square is hardly ever dealt with
except in a first course.
If the overall chi-square is significant, it is interesting to ask where,
particularly, the relationship is localized (or indeed whether it can be
localized) as a cell whose frequency is notably high (or low) compared to
the frequency expected under the null hypothesis of independence between
the two classification schemes. For each cell, the contribution to the
overall chi-square is the square of Alan's standardised residual SR (see
below), and can be conceptualized as a chi-square variate (if that's an
appropriate locution) with 1 degree of freedom.
(One could argue for a fractional number of degrees of freedom equal to
(r-1)(c-1)/(rc), but chi-square tables with fractional df are hard to
find, and using 1 df is conservative.)
(On this point, I observe that the SE for Alan's SR gets closer to 1 as
the numbers of rows and columns in the table increase, so it appears to
be a way of dealing with this fractional df; for if the cell contribution
(the squared SR) really were distributed as chi-square with 1 df, its
square root would be N(0,1).)
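As a concrete sketch of the quantities discussed above (Python with
NumPy; the function name and the example table are mine, not anything
from Alan's post), here is how the expected frequencies, Alan's SR and
SE, and the per-cell contributions to chi-square can be computed:

```python
import numpy as np

def cell_contributions(table):
    """Standardised residuals, their SEs, and per-cell chi-square contributions."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    rows = t.sum(axis=1, keepdims=True)        # row sums
    cols = t.sum(axis=0, keepdims=True)        # column sums
    fe = rows * cols / n                       # expected frequencies under independence
    sr = (t - fe) / np.sqrt(fe)                # Alan's standardised residual
    se = np.sqrt((1 - rows / n) * (1 - cols / n))  # Alan's SE for each cell
    return sr, se, sr ** 2                     # sr**2 is the contribution to chi-square

# Illustrative table: the contributions sum to the usual Pearson chi-square.
table = [[20, 30], [40, 10]]
sr, se, contrib = cell_contributions(table)
```

Note that summing the contributions over all cells recovers the ordinary
Pearson chi-square statistic, which is what makes inspecting them
cell-by-cell sensible in the first place.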
Now suppose you find one or more cells with impressively large
contributions (> 3.84, say) to the total chi-square. The fact that cell
(i,j) has a perceptibly greater-than-expected frequency implies that the
rest of the cells in row i, on the average, have smaller frequencies than
expected (since the expected frequencies have the same row sum as the
observed frequencies); and similarly for the rest of the cells in column
j. This in turn implies that one's ability to detect interesting effects
in those other cells (in row i and column j) is somewhat distorted by the
presence of cell (i,j): that sensitivity is enhanced or suppressed,
depending on the direction in which any contemplated cell departs from
its expected frequency.
To reduce the effect of cell (i,j) to zero, so that one may more clearly
see what other effects may have been affected by its presence, substitute
an artificial frequency in cell (i,j), so chosen that the contribution of
cell (i,j) to the total chi-square is now zero. (In general,
substituting the original expected value is a start, but it turns out not
to go far enough -- one has to over-correct, as it were. I've found 3 or
4 iterations to suffice most of the time.) The new total chi-square has
one fewer d.f., of course (or perhaps several fewer d.f., if several
cells' frequencies were being adjusted at the same time); and if
significant, one again seeks cells whose contribution to the new total
is impressively large. One may continue until the overall (adjusted)
chi-square value is no longer large enough to be interesting, or one may
stop sooner if (notwithstanding an interestingly large total chi-square)
there are no cells whose contribution is large enough to compel one's
attention.
I sometimes think of this process as getting the foreground phenomena out
of the way so one can see what's in the middle distance. Or, looking for
the fine structure after adjusting for gross effects.
Like Alan, I'd be interested in pointers to references or derivations for
this procedure...
-- Don.
On Thu, 6 Jul 2000, Alan McLean wrote:
> For some years I have been teaching a technique which I know as testing
> the components of chi square in a standard contingency table problem.
> If you calculate the standardised residual
>
> SR = (fo - fe)/sqrt(fe)
>
> for each cell, these residuals are approximately normally distributed
> with mean zero and standard error given by
>
> SE = sqrt((1 - rowsum/overallsum)*(1-columnsum/overallsum))
>
> provided the expected frequencies are large enough (as for the use of
> chi square itself).
>
> My problem is that I have no source for this technique. I have never
> seen it in a textbook. (I have no doubt about its validity, and frankly
> don't understand why textbooks do not refer to it.)
>
> Can anyone give me a reference to it? Ideally, a reference to its
> original publication.
>
> Alan McLean ([EMAIL PROTECTED])
> Department of Econometrics and Business Statistics
> Monash University, Caulfield Campus, Melbourne
> Tel: +61 03 9903 2102 Fax: +61 03 9903 2007
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128