Hi, there was a tricky problem recently with the chi-square density for higher df's. I discussed it in sci.stat.consult and in a German newsgroup, got some answers, and think I have now understood the real point. But I would like a smoother explanation, as I have to deal with it in my seminars. Maybe someone out there has an idea, or a better shortcut for how to describe it. To illustrate this I just copy&paste an exchange from s.s.consult; I hope you forgive my laziness. On the other hand: maybe the true point comes out better this way.
Regards, Gottfried

Three postings follow:

---(1/3)-------------------------------
[Gottfried]
> Hi -
>
> I'm stumbling in the dark... possibly only missing some
> simple hint.
> I'm trying to explain the concept of the significance of the
> deviation of an empirical sample from a given, expected
> distribution.
> If we discuss the chi-square distribution
>
>   |
>   |*
>   | *
>   |  *
>   |   *
>   |     *
>   |       *
>   |          *
>   |              *
>   +---------------------------------
>
> then this graph illustrates very well that, and how, a
> small deviation is more likely to happen than a large deviation -
> thus backing the concept of the 95th percentile etc. in the
> beginners' literature.
> Cutting this curve into equal slices gives us the expected
> frequencies of occurrence of samples with individual chi-squared
> deviations from the expected counts.
>
> If I have more df's, then the curve changes its shape; in this
> case a 5-df curve for samples of thrown dice, where I count
> the frequency of occurrence of each face and the deviation
> of these frequencies from uniformity.
>
>   |
>   |
>   |
>   |
>   |       *
>   |     *   *
>   |    *      *
>   |   *          *
>   |  *                *
>   +-------------------------------------------------
>   0        X²(df=5)
>
> Now the slices with the highest frequency of occurrence
> are not the ones with the smallest deviation from the
> expected distribution (X² = 0) - and even if I accept that this
> is at least so for the cumulative distribution, it is
> suddenly no longer "self-explaining". It is consistent with
> reality, but our common language is different:
> the most likely chi-square deviation from uniformity
> is now a region which is not at the zero mark.
> So, now: do we EXPECT a deviation from uniformity?
> That the counts of occurrence of the six faces of the die
> are most likely NOT uniform? Huh?
> Is this suddenly the null hypothesis? And do we then calculate
> the deviation of our empirical sample from this new
> null hypothesis???
>
> I never thought about it this way, but now that I do,
> I feel a bit confused; maybe I only have to step
> aside a bit?
> Any good hint appreciated -
>
> Gottfried.
> ----------------------------------------------------------------------

---(2/3)---------------------------------------
Then one participant answered:

> Actually, that corresponds to the notion that if a "random" sequence is
> *too* uniform, it isn't really random. For example, if you were to toss a
> coin 1000 times, you'd be a little surprised if you got *exactly* 500
> heads and 500 tails. If you think in terms of taking samples from a
> multinomial population, the non-monotonicity of the chi-square density
> means that a *small* amount of sampling error is more probable than *no*
> sampling error, as well as more probable than a *large* sampling error,
> which I think corresponds pretty well to our intuition.
> -------------------------------------------------------------------
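A small simulation makes the non-monotone shape of the 5-df curve tangible. The following Python sketch is only an illustration added here, not part of the original exchange; the sample sizes and names are chosen freely, and it assumes numpy is available. It throws a fair die 600 times, computes Pearson's X² against the uniform expectation, repeats 100,000 times, and tabulates the statistics in equal slices. The most heavily populated slices lie near X² = df - 2 = 3, not at 0.

  # Simulate many samples of 600 die throws and tabulate Pearson's X².
  # (Illustrative sketch; all parameter values are chosen freely.)
  import numpy as np

  rng = np.random.default_rng(1)
  n_throws, n_samples = 600, 100_000
  expected = n_throws / 6                  # expected count per face

  # counts has shape (n_samples, 6): the six face counts of each sample
  counts = rng.multinomial(n_throws, [1/6] * 6, size=n_samples)
  x2 = ((counts - expected) ** 2 / expected).sum(axis=1)

  # Equal slices of the X² axis, printed as a crude histogram
  hist, edges = np.histogram(x2, bins=20, range=(0.0, 15.0))
  for lo, hi, h in zip(edges[:-1], edges[1:], hist):
      print(f"{lo:5.2f}-{hi:5.2f} {'*' * (h // 1000)}")

Note that increasing n_throws does not move the peak; it is the number of categories (the df) that determines where the mode sits.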
---(3/3)-----------------------------------------
I was not really satisfied with this and answered, after I had gained some more insight:

[Gottfried]
> [xxxx] wrote:
> > Actually, that corresponds to the notion that if a "random" sequence is
> > *too* uniform, it isn't really random. For example, if you were to toss a
> > coin 1000 times, you'd be a little surprised if you got *exactly* 500
> > heads and 500 tails. If you think in terms of taking samples from a
>
> Yes, this is true. But it is the same with every other combination.
> No single one is more likely to occur than any other (or better, one
> should perhaps say: variation?).
> But then, a student would ask, how can you still call a near-expected
> variation more likely than a far-from-expected variation in general?
>
> The reason is that we don't argue about one specific variation,
> but about properties of a variation, or in this case, of a combination.
> We commonly select the property of "having a distance from the
> expected variation", measured in terms of squared deviation.
> The catch is that with this criterion, in a multinomial configuration,
> there are plenty of variations satisfying the same combinatorial
> distance in terms of the squared deviation - up to a local maximum.
> My difficulty is making this clear in simple words; ideally in
> words as simple as those I used when I explained the rationale of
> chi-square and significance...
> OK, maybe it's more a subject for news://sci.stat.edu , I guess.
>
> Thanks again for your input -
>
> Gottfried Helms.
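To make the "plenty of variations at the same squared distance" argument from this last posting concrete, one can enumerate a small case exactly. The sketch below is again my own illustration rather than part of the thread, and the helper names are made up. It takes 12 throws of a fair die, groups all face-count vectors (n1,...,n6) by their X² value, and sums their exact multinomial probabilities. The single vector (2,2,2,2,2,2) with X² = 0 is the most probable individual outcome, yet the 30 vectors at X² = 1 together carry twenty times as much probability.

  # Enumerate all face-count vectors for 12 throws of a fair die and
  # group them by their X² value.  (Illustrative sketch, pure stdlib.)
  from collections import defaultdict
  from math import factorial

  n, k = 12, 6
  expected = n / k

  def prob(counts):
      """Exact multinomial probability of one count vector under a fair die."""
      coef = factorial(n)
      for c in counts:
          coef //= factorial(c)
      return coef / k ** n

  def compositions(total, parts):
      """All ways to split `total` throws over `parts` faces."""
      if parts == 1:
          yield (total,)
          return
      for first in range(total + 1):
          for rest in compositions(total - first, parts - 1):
              yield (first,) + rest

  mass = defaultdict(float)   # total probability per X² value
  size = defaultdict(int)     # number of count vectors per X² value
  for counts in compositions(n, k):
      x2 = sum((c - expected) ** 2 / expected for c in counts)
      mass[x2] += prob(counts)
      size[x2] += 1

  for x2 in sorted(mass)[:6]:
      print(f"X² = {x2:5.2f}: {size[x2]:4d} vectors, total prob {mass[x2]:.4f}")

The same counting explains the density curve: the number of vectors in a thin X² shell at first grows with X² faster than the per-vector probability falls, so their product peaks away from zero.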