Re: Nonrandomness of binary matrices

Rich Strauss Wed, 25 Jul 2001 11:04:04 -0700
Thanks to Rich Ulrich for the suggestion below -- that was the direction I
was heading, but there seem to be difficulties.  The general problem is
that I have a standard [nxp] data matrix, but (skipping over the scientific
details) some of the values are "special", typically 5-20% of them, and I
want to know whether their distribution within the matrix is structured in
some way.  In particular, they might be concentrated in particular rows or
columns, but beyond that I have no notion of "nonrandom".  I'm hoping that
they're uniformly randomly distributed (or rather, not significantly
different from random) because then I can basically ignore the fact that
they're special, for the scientific problem at hand.

I'd like to have two things: a nicely behaved index of "nonrandomness"
(perhaps a test statistic, rescaled to an interval 0-1?) and a significance
test.  So I recoded the matrix as binary, with the special values coded as
1s.  I presumed that the null marginal distributions would be binomial
rather than Poisson because the frequency of occurrence is so high, but
either way I could test that.  And if I measured the deviations of marginal
totals from expected (as a chi-square statistic, perhaps, or a mean squared
deviation) that would provide both an index and a goodness-of-fit
significance test for the entire matrix.
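
As a rough illustration of the marginal-total idea above, here is a minimal Python sketch (standard library only). It assumes the matrix is stored as a list of 0/1 rows and that the overall proportion of 1s is estimated from the data; the function name `margin_dispersion` is hypothetical. Each returned value is the sum of squared deviations of the totals from their binomial expectation, scaled by the binomial variance. Under the null it should be roughly chi-square with (number of totals - 1) degrees of freedom, so dividing by that number gives a crude index that sits near 1 for a random-looking margin.

```python
def margin_dispersion(matrix):
    """Binomial dispersion statistics for row totals and column totals.

    matrix: list of equal-length lists holding 0/1 values (assumed layout).
    Returns (row_stat, col_stat).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    total_ones = sum(sum(row) for row in matrix)
    p_hat = total_ones / (n_rows * n_cols)  # estimated P(cell == 1)
    if p_hat in (0.0, 1.0):
        raise ValueError("matrix is all 0s or all 1s; nothing to test")

    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]

    # A row total is Binomial(n_cols, p) under the null,
    # a column total is Binomial(n_rows, p).
    row_stat = sum((t - n_cols * p_hat) ** 2 for t in row_totals) / (
        n_cols * p_hat * (1 - p_hat))
    col_stat = sum((t - n_rows * p_hat) ** 2 for t in col_totals) / (
        n_rows * p_hat * (1 - p_hat))
    return row_stat, col_stat


if __name__ == "__main__":
    # All the 1s sit in the first row: rows look structured, columns do not.
    example = [[1, 1, 1, 1, 1]] + [[0] * 5 for _ in range(3)]
    print(margin_dispersion(example))
```

The two statistics are computed separately here, so this sketch does not by itself address how to combine them into one test.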

But the problem is: what if the row totals and column totals are not
independent?  I've done a few 2-way chi-square contingency tests on these
matrices (using randomized null distributions, of course, since the
matrices are binary), and some of the results are statistically
significant.  Doesn't this mean that I can't simply accumulate the row and
column totals for a goodness-of-fit test, since they're not always
independent?  And even if I did the goodness-of-fit tests for rows and
columns independently, how do I combine the p-values to get a single level
of significance for the entire matrix, if the tests are not independent?
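
One possible way around the dependence between the row and column tests, sketched under assumptions rather than offered as the answer: fold both sets of marginal deviations into a single statistic and obtain its null distribution by randomization, in the same spirit as the randomized contingency tests mentioned above. Because the same combined statistic is recomputed on every shuffled matrix, any dependence between its row part and its column part is already built into the simulated null distribution, and no p-values have to be combined. The sketch assumes the null hypothesis is that every placement of the observed number of 1s among the cells is equally likely. The names `combined_stat` and `shuffle_test` are hypothetical, and only the Python standard library is used.

```python
import random


def combined_stat(matrix):
    """Sum of the row and column binomial dispersion statistics."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    p_hat = sum(sum(row) for row in matrix) / (n_rows * n_cols)
    if p_hat in (0.0, 1.0):
        raise ValueError("matrix is all 0s or all 1s; nothing to test")
    var_unit = p_hat * (1 - p_hat)
    row_part = sum((sum(r) - n_cols * p_hat) ** 2 for r in matrix) / (
        n_cols * var_unit)
    col_part = sum((sum(c) - n_rows * p_hat) ** 2 for c in zip(*matrix)) / (
        n_rows * var_unit)
    return row_part + col_part


def shuffle_test(matrix, n_sim=999, seed=0):
    """Monte Carlo p-value for combined_stat under random placement of 1s.

    The number of 1s is held fixed; only their positions are shuffled.
    Returns (observed_statistic, p_value).
    """
    rng = random.Random(seed)
    n_rows, n_cols = len(matrix), len(matrix[0])
    cells = [v for row in matrix for v in row]
    observed = combined_stat(matrix)
    hits = 0
    for _ in range(n_sim):
        rng.shuffle(cells)
        sim = [cells[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]
        # Small tolerance so that ties count as "at least as extreme".
        if combined_stat(sim) >= observed - 1e-12:
            hits += 1
    # Counting the observed matrix itself keeps the p-value above zero.
    return observed, (hits + 1) / (n_sim + 1)


if __name__ == "__main__":
    # 10 x 10 matrix with 20% ones, all packed into the first two rows.
    packed = [[1] * 10 for _ in range(2)] + [[0] * 10 for _ in range(8)]
    print(shuffle_test(packed))
```

The observed statistic could also be expressed relative to the mean of the simulated values to serve as a descriptive index, though how best to rescale it to 0-1 is left open here.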

I have the feeling that I'm missing something obvious here but I can't
quite get a handle on it, and this little problem is holding up the
analysis of the results from a much larger study.  I've talked to
statisticians on campus, with little progress, so basically I'm begging for
help.

Rich Strauss

At 10:47 AM 7/25/01 -0400, you wrote:
>On 23 Jul 2001 14:22:58 -0700, [EMAIL PROTECTED] (Rich Strauss)
>wrote:
>
>> Say I have a binary data matrix for which both the rows (observations) and
>> columns (variables) are completely permutable.  (In practice, about 5-20% of
>> the cells will contain 1's, and the remainder will contain 0's.)   Assume
>> that the expected probability of a cell containing a '1' is identical for
>> all cells in the matrix.  I'd like to be able to test this assumption by
>> measuring (and testing the significance of) the degree of 'nonrandomness'
>> of the 1's in the matrix.
>> 
>> If the rows and columns were fixed in sequence, then this would be an easy
>> problem involving spatial statistics, but the permutability seems to really
>> complicate things.  I think that I can test the rows or columns separately
>> by comparing the row or column totals against a corresponding binomial
>> distribution using a goodness-of-fit test, but I can't get a handle on how
>> to do this for the entire matrix.  I'd really appreciate ideas about this.
>> Thanks in advance.
>
>I'm not sure that I grasp what you are after, but - an idea.
>
>If they are completely permutable, then "permute":
>sort them by decreasing counts for row and for column.
>This puts me in mind of certain alternatives to "random."
>
>The set of counts on a margin should be ... Poisson?
>The table can be drawn into quadrants or smaller sections, 
>so that the number of 1s in each can be tabulated, to make
>ordinary contingency tables.
>
>-- 
>Rich Ulrich, [EMAIL PROTECTED]
>http://www.pitt.edu/~wpilib/index.html
> 
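
The sort-and-tabulate suggestion quoted above could be sketched roughly as follows. This is only an illustration in plain Python: the function name `sorted_quadrant_counts` is hypothetical, and splitting at the midpoints into exactly four quadrants is an assumption (the quoted message also allows smaller sections). Note that sorting by decreasing totals pushes 1s toward the upper-left corner even when the placement is random, so the resulting table would need a randomized null distribution rather than the usual chi-square reference.

```python
def sorted_quadrant_counts(matrix):
    """Sort rows and columns by decreasing totals, then count 1s per quadrant.

    matrix: list of equal-length lists holding 0/1 values (assumed layout).
    Returns a 2 x 2 list: [[upper_left, upper_right], [lower_left, lower_right]].
    """
    # Rows with the most 1s first (sorted() is stable for ties).
    rows = sorted(matrix, key=sum, reverse=True)
    n_cols = len(rows[0])
    # Column order by decreasing column total.
    col_order = sorted(range(n_cols),
                       key=lambda j: sum(r[j] for r in rows),
                       reverse=True)
    sorted_m = [[r[j] for j in col_order] for r in rows]

    row_split, col_split = len(sorted_m) // 2, n_cols // 2
    counts = [[0, 0], [0, 0]]
    for i, row in enumerate(sorted_m):
        for j, value in enumerate(row):
            counts[int(i >= row_split)][int(j >= col_split)] += value
    return counts


if __name__ == "__main__":
    block = [[0, 0, 0, 0],
             [0, 0, 0, 0],
             [0, 0, 1, 1],
             [0, 0, 1, 1]]
    print(sorted_quadrant_counts(block))
```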

