Hi all,
I have a concern about the way Splus calculates the two-sample
Kolmogorov-Smirnov statistic.
Consider the following samples:
---
> ds1
[1] 1 2 3 4 5 6 7 8 9 10
> ds2
[1] 10 10 10 10 10 10 10 10 10 10
---
The output of the KS test is:
---
> ks.gof(ds1,ds2)
Two-Sample Kolmogorov-Smirnov Test
data: ds1 and ds2
ks = 1, p-value = 0
alternative hypothesis: cdf of ds1 does not equal the
cdf of ds2 for at least one sample point.
---
However, the difference between the empirical cdf for sample 1 and 2 is:
Interval:   ]-oo;1[   [1;2[   [2;3[   ...   [9;10[   [10;+oo[
cdf1-cdf2:  0         0.1     0.2     ...   0.9      0
So the KS statistic (the maximum absolute difference between the two cdfs)
should be 0.9, not 1 as reported by Splus (ks = 1 above).
Incidentally, the inflated statistic makes the reported p-value too small.
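For anyone who wants to check the 0.9 figure independently of Splus, here is a minimal sketch in Python. It assumes the usual right-continuous empirical cdf (the proportion of observations <= t) and evaluates both cdfs at every pooled sample point. The function name ks_two_sample is my own and is not an Splus or library routine.

```python
from bisect import bisect_right

def ks_two_sample(x, y):
    """Two-sample KS statistic: max |F1(t) - F2(t)| over the pooled points.

    Assumes the right-continuous empirical cdf, F(t) = #{obs <= t} / n.
    """
    xs, ys = sorted(x), sorted(y)
    pooled = sorted(set(xs) | set(ys))
    d = 0.0
    for t in pooled:
        f1 = bisect_right(xs, t) / len(xs)  # proportion of x values <= t
        f2 = bisect_right(ys, t) / len(ys)  # proportion of y values <= t
        d = max(d, abs(f1 - f2))
    return d

ds1 = list(range(1, 11))  # 1, 2, ..., 10
ds2 = [10] * 10           # ten copies of 10
statistic = ks_two_sample(ds1, ds2)
print(statistic)
```

Under that definition both cdfs equal 1 at t = 10, so the largest gap occurs at t = 9, where cdf1 is 0.9 and cdf2 is still 0.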
I would very much appreciate comments on this behaviour. In particular,
is this a bug in Splus, and is it well known? It seems that there are
implications for people relying on this test.
Pointers to appropriate forums/newsgroups would also be appreciated.
Thanks in advance and best regards from Grenoble,
Cyril.
---
Cyril Goutte [EMAIL PROTECTED]
INRIA Rhone-Alpes Tel: (+33) 4 76 61 55 13
Zirst - 655 avenue de l'Europe - Montbonnot Fax: (+33) 4 76 61 54 77
38334 Saint Ismier Cedex - France www.inrialpes.fr/is2