[ 
https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790915#comment-14790915
 ] 

Phil Steitz commented on MATH-1246:
-----------------------------------

I could be wrong on this, and I am OK with reverting the current exactP 
ties-handling code and replacing it with the random jitter approach.  I still 
think the exact p can in fact be computed with ties present, but to do so you 
have to view the combined sample as the empirical distribution representing 
the (combined) population.  You make a good point above about that being 
dubious for small samples.  I will continue to research this, but given the 
lack of consensus, I will remove the implementation from the code.

So let's see if we can agree on the following:
# Add a non-naive exactP to handle small samples with no ties.  Extend it up 
to n * m = 10000 as the default behavior (this is the cutoff that R uses).  
Beyond this point, use the K-S distribution, so we no longer need monteCarloP 
for moderate-size samples.
# Implement the jitter method and use it by default in the small-sample case 
to break ties.  Until we have eliminated the need for monteCarloP as a 
default, use jitter to break ties for moderate sample sizes and use 
monteCarloP as is post-jitter.
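
To make the two items above concrete, here is a rough sketch of the method selection and the tie-breaking jitter.  It is not the Commons Math code.  The class name, the PMethod enum, the inclusive n * m <= 10000 boundary, the quarter-of-the-minimum-gap jitter width and the retry limit are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Random;

class TieJitterSketch {

    /** Hypothetical labels for the two p-value strategies in item 1. */
    enum PMethod { EXACT, KS_DISTRIBUTION }

    /** Cutoff on n * m from item 1; treating it as inclusive is an assumption. */
    static final long EXACT_LIMIT = 10_000L;

    /** Arbitrary retry limit so the loop in breakTies cannot spin forever. */
    static final int MAX_ATTEMPTS = 100;

    /** Picks the exact computation up to the cutoff, the K-S distribution beyond it. */
    public static PMethod choose(int n, int m) {
        return (long) n * m <= EXACT_LIMIT ? PMethod.EXACT : PMethod.KS_DISTRIBUTION;
    }

    /** Returns a sorted copy of the combined sample. */
    static double[] sortedPool(double[] x, double[] y) {
        double[] all = new double[x.length + y.length];
        System.arraycopy(x, 0, all, 0, x.length);
        System.arraycopy(y, 0, all, x.length, y.length);
        Arrays.sort(all);
        return all;
    }

    /** True if any value occurs more than once in the combined sample. */
    public static boolean hasTies(double[] x, double[] y) {
        double[] all = sortedPool(x, y);
        for (int i = 1; i < all.length; i++) {
            if (all[i] == all[i - 1]) {
                return true;
            }
        }
        return false;
    }

    /** Smallest positive gap between distinct pooled values, or 1.0 if there is none. */
    static double minGap(double[] x, double[] y) {
        double[] all = sortedPool(x, y);
        double gap = Double.POSITIVE_INFINITY;
        for (int i = 1; i < all.length; i++) {
            double d = all[i] - all[i - 1];
            if (d > 0 && d < gap) {
                gap = d;
            }
        }
        return Double.isInfinite(gap) ? 1.0 : gap;
    }

    /**
     * Perturbs both samples in place with uniform noise of at most a quarter of
     * the smallest gap, so values that were distinct keep their relative order.
     * Each attempt restarts from the original data.
     */
    public static void breakTies(double[] x, double[] y, Random rng) {
        if (!hasTies(x, y)) {
            return;
        }
        double halfWidth = minGap(x, y) / 4;
        double[] x0 = x.clone();
        double[] y0 = y.clone();
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            for (int i = 0; i < x.length; i++) {
                x[i] = x0[i] + halfWidth * (2 * rng.nextDouble() - 1);
            }
            for (int i = 0; i < y.length; i++) {
                y[i] = y0[i] + halfWidth * (2 * rng.nextDouble() - 1);
            }
            if (!hasTies(x, y)) {
                return;
            }
        }
        throw new IllegalStateException("could not break ties");
    }
}
```

A caller would run breakTies on copies of the two samples first and then hand the jittered copies to whichever p-value computation choose selects.  Because the noise is random, the resulting p-value varies from run to run unless the Random is seeded.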

Optionally, implement a ks.boot-like monteCarloP that works with tied data.
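
One way a ks.boot-like variant might look, as I understand the idea: resample with replacement from the pooled data, recompute D for each resample, and report the fraction of resampled D values at least as large as the observed one.  This is only a sketch.  The class and method names, the 1e-12 comparison tolerance and the assumption that the inputs contain no NaN values are mine, not part of any existing API.

```java
import java.util.Arrays;
import java.util.Random;

class BootstrapKsSketch {

    /**
     * Two-sample D statistic.  The empirical CDFs are compared only after every
     * copy of a tied value has been consumed, so ties do not inflate D.
     * Inputs are assumed to be NaN-free.
     */
    public static double dStatistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0;
        while (i < sx.length && j < sy.length) {
            double z = Math.min(sx[i], sy[j]);
            while (i < sx.length && sx[i] == z) {
                i++;
            }
            while (j < sy.length && sy[j] == z) {
                j++;
            }
            d = Math.max(d, Math.abs((double) i / sx.length - (double) j / sy.length));
        }
        return d;
    }

    /**
     * Bootstrap p-value: the fraction of pooled resamples whose D is at least
     * the observed D (up to a small tolerance for floating-point noise).
     */
    public static double bootstrapP(double[] x, double[] y, int iterations, Random rng) {
        double observed = dStatistic(x, y);
        double[] pooled = new double[x.length + y.length];
        System.arraycopy(x, 0, pooled, 0, x.length);
        System.arraycopy(y, 0, pooled, x.length, y.length);
        double[] bx = new double[x.length];
        double[] by = new double[y.length];
        int atLeast = 0;
        for (int b = 0; b < iterations; b++) {
            for (int k = 0; k < bx.length; k++) {
                bx[k] = pooled[rng.nextInt(pooled.length)];
            }
            for (int k = 0; k < by.length; k++) {
                by[k] = pooled[rng.nextInt(pooled.length)];
            }
            if (dStatistic(bx, by) >= observed - 1e-12) {
                atLeast++;
            }
        }
        return (double) atLeast / iterations;
    }
}
```

Since the resamples are drawn from the pooled data as is, tied values are reproduced in the bootstrap samples and no jitter is needed before calling it.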




> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the 
> distribution of a D-statistic for m-n sets with no ties.  No warning or 
> special handling is delivered in the presence of ties.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
