Hi, there was a tricky problem recently with the chi-square density for higher df's. I discussed it in sci.stat.consult and in a German newsgroup, got some answers, and think I have now understood the real point. But I would like a smoother explanation, as I have to deal with it in my seminars. Maybe someone out there has an idea, or a better shortcut for how to describe it. To illustrate this I just copy&paste an exchange from s.s.consult; I hope you forgive my laziness. On the other hand: maybe the true point comes out better this way.
Regards, Gottfried

Three postings follow:

---(1/3)-------------------------------
[Gottfried]
> Hi -
>
> I'm stumbling in the dark... possibly only missing some
> simple hint.
> I'm trying to explain the concept of the significance of the
> deviation of an empirical sample from a given, expected
> distribution.
> If we discuss the chi-square distribution
>
>   |
>   |*
>   | *
>   |  *
>   |   *
>   |     *
>   |       *
>   |          *
>   |              *
>   +---------------------------------
>
> then this graph illustrates very well that, and how, a
> small deviation is more likely to happen than a large deviation -
> thus backing the concept of the 95th percentile etc. in the
> beginners' literature.
> Cutting this curve into equal slices gives us the expected
> frequencies of occurrence of samples with individual chi-squared
> deviations from the expected counts.
>
> If I have more df's, then the curve changes its shape; in this
> case a 5-df curve for samples of thrown dice, where I count
> the frequency of occurrence of each face and the deviation
> of these frequencies from uniformity.
>
>   |
>   |
>   |
>   |
>   |       *
>   |     *   *
>   |    *      *
>   |   *          *
>   |  *                *
>   +-------------------------------------------------
>   0        X²(df=5)
>
> Now the slices with the highest frequency of occurrence
> are not the ones with the smallest deviation from the
> expected distribution (X² = 0) - and even if I accept that this
> is at least so for the cumulative distribution, it is
> suddenly no longer "self-explaining". It is consistent with
> reality, but our common language is different:
> the most likely chi-square deviation from uniformity
> is now a region which is not at the zero mark.
> So, now: do we EXPECT a deviation from uniformity?
> That the counts of occurrence of the six faces of the die
> are most likely NOT uniform? Huh?
> Is this suddenly the null hypothesis? And do we then calculate
> the deviation of our empirical sample from this new
> null hypothesis???
>
> I never thought about it this way, but now that I do,
> I feel a bit confused; maybe I only have to step
> aside a bit?
> Any good hint appreciated -
>
> Gottfried.
> ----------------------------------------------------------------------

---(2/3)---------------------------------------
Then one participant answered:

> Actually, that corresponds to the notion that if a "random" sequence is
> *too* uniform, it isn't really random. For example, if you were to toss a
> coin 1000 times, you'd be a little surprised if you got *exactly* 500
> heads and 500 tails. If you think in terms of taking samples from a
> multinomial population, the non-monotonicity of the chi-square density
> means that a *small* amount of sampling error is more probable than *no*
> sampling error, as well as more probable than a *large* sampling error,
> which I think corresponds pretty well to our intuition.
> -------------------------------------------------------------------
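A small simulation makes the non-monotone shape of the 5-df curve tangible. The following Python sketch is only an illustration added here, not part of the original exchange; the sample sizes and names are chosen freely, and it assumes numpy is available. It throws a fair die 600 times, computes Pearson's X² against the uniform expectation, repeats 100,000 times, and tabulates the statistics in equal slices. The most heavily populated slices lie near X² = df - 2 = 3, not at 0.

  # Simulate many samples of 600 die throws and tabulate Pearson's X².
  # (Illustrative sketch; all parameter values are chosen freely.)
  import numpy as np

  rng = np.random.default_rng(1)
  n_throws, n_samples = 600, 100_000
  expected = n_throws / 6                  # expected count per face

  # counts has shape (n_samples, 6): the six face counts of each sample
  counts = rng.multinomial(n_throws, [1/6] * 6, size=n_samples)
  x2 = ((counts - expected) ** 2 / expected).sum(axis=1)

  # Equal slices of the X² axis, printed as a crude histogram
  hist, edges = np.histogram(x2, bins=20, range=(0.0, 15.0))
  for lo, hi, h in zip(edges[:-1], edges[1:], hist):
      print(f"{lo:5.2f}-{hi:5.2f} {'*' * (h // 1000)}")

Note that increasing n_throws does not move the peak; it is the number of categories (the df) that determines where the mode sits.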
---(3/3)-----------------------------------------
I was not really satisfied with this and answered, after I had gained some more insight:

[Gottfried]
> [xxxx] wrote:
> > Actually, that corresponds to the notion that if a "random" sequence is
> > *too* uniform, it isn't really random. For example, if you were to toss a
> > coin 1000 times, you'd be a little surprised if you got *exactly* 500
> > heads and 500 tails. If you think in terms of taking samples from a
>
> Yes, this is true. But it is the same with every other combination.
> No single one is more likely to occur than any other (or better, one
> should perhaps say: variation?).
> But then, a student would ask, how can you still call a near-expected
> variation more likely than a far-from-expected variation in general?
>
> The reason is that we don't argue about one specific variation,
> but about properties of a variation, or in this case, of a combination.
> We commonly select the property of "having a distance from the
> expected variation", measured in terms of squared deviation.
> The catch is that with this criterion, in a multinomial configuration,
> there are plenty of variations satisfying the same combinatorial
> distance in terms of the squared deviation - up to a local maximum.
> My difficulty is making this clear in simple words; ideally in
> words as simple as those I used when I explained the rationale of
> chi-square and significance...
> OK, maybe it's more a subject for news://sci.stat.edu , I guess.
>
> Thanks again for your input -
>
> Gottfried Helms.
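To make the "plenty of variations at the same squared distance" argument from this last posting concrete, one can enumerate a small case exactly. The sketch below is again my own illustration rather than part of the thread, and the helper names are made up. It takes 12 throws of a fair die, groups all face-count vectors (n1,...,n6) by their X² value, and sums their exact multinomial probabilities. The single vector (2,2,2,2,2,2) with X² = 0 is the most probable individual outcome, yet the 30 vectors at X² = 1 together carry twenty times as much probability.

  # Enumerate all face-count vectors for 12 throws of a fair die and
  # group them by their X² value.  (Illustrative sketch, pure stdlib.)
  from collections import defaultdict
  from math import factorial

  n, k = 12, 6
  expected = n / k

  def prob(counts):
      """Exact multinomial probability of one count vector under a fair die."""
      coef = factorial(n)
      for c in counts:
          coef //= factorial(c)
      return coef / k ** n

  def compositions(total, parts):
      """All ways to split `total` throws over `parts` faces."""
      if parts == 1:
          yield (total,)
          return
      for first in range(total + 1):
          for rest in compositions(total - first, parts - 1):
              yield (first,) + rest

  mass = defaultdict(float)   # total probability per X² value
  size = defaultdict(int)     # number of count vectors per X² value
  for counts in compositions(n, k):
      x2 = sum((c - expected) ** 2 / expected for c in counts)
      mass[x2] += prob(counts)
      size[x2] += 1

  for x2 in sorted(mass)[:6]:
      print(f"X² = {x2:5.2f}: {size[x2]:4d} vectors, total prob {mass[x2]:.4f}")

The same counting explains the density curve: the number of vectors in a thin X² shell at first grows with X² faster than the per-vector probability falls, so their product peaks away from zero.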