I'm a newbie to the group but not to statistics. Here is my problem: I have a probability distribution over a discrete (or countably infinite, in any case "naturally binned") space and I want to test whether a sample agrees with the distribution when some bins have few events.

Here is an example. The ranks of random 8 x 8 binary matrices (all entries are 0 or 1) are distributed as follows:

rank   probab     sample (eg N=100)
8      0.2899     28
7      0.5776     54
6      0.1273     15
5      0.00512     2
4      4.4e-05
3      8.5e-08     1
2      1.7e-11
1      3.5e-15
0      5.3e-20

I want to test whether a sample of matrices is random at some given confidence level. Most matrices are nearly full-rank, so there will be very few events in the low-rank bins. Here is the progress of my thought:

1) The traditional trick is to combine enough low-event bins together that the expected number of events is 10 or so, and then do a chi2 test. But this patently throws away information. A single rank-1 matrix event is telling me much more than a single rank-3 matrix event, but this technique gives them equal weight. So I want a test that doesn't require me to combine bins.

2) How about a KS test? I tried this, but it came back with garbage (it told me that data which I know to be good, and which did fine in a chi2 test, was "too good to be true"). I believe one assumption of the KS test is that the distribution being tested is continuous, i.e. not discrete. So this doesn't work.

3) So what do I do? I would really like a test statistic that becomes chi2 when bin populations are high, but can also handle low bin populations. The distribution of the statistic need not be universal: I have analytic expressions for the probabilities of the individual bins and am perfectly willing to plug these into some hellishly complex formula for the distribution of a test statistic. Does such a test statistic exist?

4) I have played with this problem for a few days now.
Basically, I think I am looking for a metric on the space of multinomial distributions which would characterize the distance between two distributions as something close to the ratio of their likelihoods.

Can someone help me out here? Is this a solved problem? Can someone give a hint or point me in the right direction?
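
For concreteness, here is a minimal Python sketch of the bin-combining approach from point 1, applied to the example table. The names `pool_tail` and `chi2_sf` and the threshold of 10 expected events are just my illustrative choices, not anything standard. I am assuming the blank sample cells are 0 (the listed counts already add up to N=100), and I renormalize the probabilities because the rounded table values do not sum exactly to 1.

```python
import math

# Example table, ranks 8 down to 0.
probs = [0.2899, 0.5776, 0.1273, 0.00512, 4.4e-05,
         8.5e-08, 1.7e-11, 3.5e-15, 5.3e-20]
counts = [28, 54, 15, 2, 0, 1, 0, 0, 0]  # assumption: blank cells are 0


def pool_tail(expected, observed, min_expected=10.0):
    """Merge bins from the low-probability end until the last bin
    has at least min_expected expected events."""
    exp, obs = list(expected), list(observed)
    while len(exp) > 1 and exp[-1] < min_expected:
        e, o = exp.pop(), obs.pop()
        exp[-1] += e
        obs[-1] += o
    return exp, obs


def chi2_sf(x, df):
    """Upper tail probability of a chi2 variable with df degrees of freedom,
    via Q(a+1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a+1), with y = x/2."""
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


n = sum(counts)
total = sum(probs)  # slightly below 1 because of rounding in the table
expected = [n * p / total for p in probs]

exp_pooled, obs_pooled = pool_tail(expected, counts)
stat = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(exp_pooled) - 1
p_value = chi2_sf(stat, df)

print("observed:", obs_pooled)
print("expected:", [round(e, 2) for e in exp_pooled])
print(f"chi2 = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

On the sample above this merges ranks 6 and below into a single bin, which is exactly the loss of information I am complaining about: the rank-3 event ends up counting the same as any rank-6 event.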