I don't have an answer to Alan's question, but I can offer a description
of another technique for trying to perceive what a contingency table
might be telling one. Like Alan, I have not seen this mentioned in
textbooks; OTOH, I wouldn't expect to, because I think it is analogous to post-hoc
tests in ANOVA, which are hardly ever dealt with earlier than a second
course, and contingency-table chi-square is hardly ever dealt with
except in a first course.
If the overall chi-square is significant, it is interesting to ask where,
particularly, the relationship is localized (or indeed whether it can be
localized) as a cell whose frequency is notably high (or low) compared to
the frequency expected under the null hypothesis of independence between
the two classification schemes. For each cell, the contribution to the
overall chi-square is the square of Alan's standardised residual SR (see
below), and can be conceptualized as a chi-square variate (if that's an
appropriate locution) with 1 degree of freedom.
(One could argue for a fractional number of degrees of freedom equal to
(r-1)(c-1)/(rc), but chi-square tables with fractional df are hard to
find, and using 1 df is conservative.)
(On this point, I observe that the SE for Alan's SR gets closer to 1 as
the numbers of rows and columns in the table increase, so it appears to
be a way of dealing with this fractional df; for if the cell contribution
(the squared SR) really were distributed as chi-square with 1 df, its
square root would be N(0,1).)
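As a concrete sketch of the quantities discussed above (Python with
NumPy; the function name and the example table are mine, not anything
from Alan's post), here is how the expected frequencies, Alan's SR and
SE, and the per-cell contributions to chi-square can be computed:

```python
import numpy as np

def cell_contributions(table):
    """Standardised residuals, their SEs, and per-cell chi-square contributions."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    rows = t.sum(axis=1, keepdims=True)        # row sums
    cols = t.sum(axis=0, keepdims=True)        # column sums
    fe = rows * cols / n                       # expected frequencies under independence
    sr = (t - fe) / np.sqrt(fe)                # Alan's standardised residual
    se = np.sqrt((1 - rows / n) * (1 - cols / n))  # Alan's SE for each cell
    return sr, se, sr ** 2                     # sr**2 is the contribution to chi-square

# Illustrative table: the contributions sum to the usual Pearson chi-square.
table = [[20, 30], [40, 10]]
sr, se, contrib = cell_contributions(table)
```

Note that summing the contributions over all cells recovers the ordinary
Pearson chi-square statistic, which is what makes inspecting them
cell-by-cell sensible in the first place.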
Now suppose you find one or more cells with impressively large
contributions (> 3.84, say) to the total chi-square. The fact that cell
(i,j) has a perceptibly greater-than-expected frequency implies that the
rest of the cells in row i, on the average, have smaller frequencies than
expected (since the expected frequencies have the same row sum as the
observed frequencies); and similarly for the rest of the cells in column
j. This in turn implies that one's ability to detect interesting effects
in those other cells (in row i and column j) is somewhat distorted by the
presence of cell (i,j): that sensitivity is enhanced or suppressed,
depending on the direction in which any contemplated cell departs from
its expected frequency.
To reduce the effect of cell (i,j) to zero, so that one may more clearly
see what other effects may have been affected by its presence, substitute
an artificial frequency in cell (i,j), so chosen that the contribution of
cell (i,j) to the total chi-square is now zero. (In general,
substituting the original expected value is a start, but it turns out not
to go far enough -- one has to over-correct, as it were. I've found 3 or
4 iterations to suffice most of the time.) The new total chi-square has
one fewer d.f., of course (or perhaps several fewer d.f., if several
cells' frequencies were being adjusted at the same time); and if
significant, one again seeks cells whose contribution to the new total
is impressively large. One may continue until the overall (adjusted)
chi-square value is no longer large enough to be interesting, or one may
stop sooner if (notwithstanding an interestingly large total chi-square)
there are no cells whose contribution is large enough to compel one's
attention.
I sometimes think of this process as getting the foreground phenomena out
of the way so one can see what's in the middle distance. Or, looking for
the fine structure after adjusting for gross effects.
Like Alan, I'd be interested in pointers to references or derivations for
this procedure...
-- Don.
On Thu, 6 Jul 2000, Alan McLean wrote:
> For some years I have been teaching a technique which I know as testing
> the components of chi square in a standard contingency table problem.
> If you calculate the standardised residual
>
> SR = (fo - fe)/sqrt(fe)
>
> for each cell, these residuals are approximately normally distributed
> with mean zero and standard error given by
>
> SE = sqrt((1 - rowsum/overallsum)*(1-columnsum/overallsum))
>
> provided the expected frequencies are large enough (as for the use of
> chi square itself).
>
> My problem is that I have no source for this technique. I have never
> seen it in a textbook. (I have no doubt about its validity, and frankly
> don't understand why textbooks do not refer to it.)
>
> Can anyone give me a reference to it? Ideally, a reference to its
> original publication.
>
> Alan McLean ([EMAIL PROTECTED])
> Department of Econometrics and Business Statistics
> Monash University, Caulfield Campus, Melbourne
> Tel: +61 03 9903 2102 Fax: +61 03 9903 2007
------------------------------------------------------------------------
Donald F. Burrill [EMAIL PROTECTED]
348 Hyde Hall, Plymouth State College, [EMAIL PROTECTED]
MSC #29, Plymouth, NH 03264 603-535-2597
184 Nashua Road, Bedford, NH 03110 603-471-7128