[ https://issues.apache.org/jira/browse/MATH-1246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14790915#comment-14790915 ]
Phil Steitz commented on MATH-1246:
-----------------------------------

I could be wrong on this, and I am OK with reverting the current ties-handling code in exactP and replacing it with the random jitter approach. I still think the exact p can in fact be computed with ties present, but to do so you have to view the combined sample as the empirical distribution representing the (combined) population. You make a good point above about that being dubious for small samples. I will continue to research this, but given the lack of consensus, I will remove the implementation from the code. So let's see if we can agree on the following:

# Add a non-naive exactP to handle the small-sample case with no ties. Extend it to n * m = 10000 as the default behavior (this is the cut that R uses). Beyond this point, use the K-S distribution, so we no longer need monteCarloP for moderate-size samples.
# Implement the jitter method and use it by default in the small-sample case to break ties. Until we have eliminated the need for monteCarloP as a default, use jitter to break ties for moderate sample sizes and use monteCarloP as is post-jitter. Optionally, implement a ks.boot-like monteCarloP that works with tied data.

> Kolmogorov-Smirnov 2-sample test does not correctly handle ties
> ---------------------------------------------------------------
>
>                 Key: MATH-1246
>                 URL: https://issues.apache.org/jira/browse/MATH-1246
>             Project: Commons Math
>          Issue Type: Bug
>            Reporter: Phil Steitz
>
> For small samples, KolmogorovSmirnovTest(double[], double[]) computes the
> distribution of a D-statistic for m-n sets with no ties. No warning or
> special handling is delivered in the presence of ties.
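
For illustration only, here is a rough Java sketch of the two ideas proposed in the comment: choosing the exact computation by the n * m cut, and breaking ties with a small random jitter before computing the statistic. Nothing here is the Commons Math implementation. The class JitterSketch, every method name, the strict "<" at the cutoff, the amplitude rule (a quarter of the smallest positive gap in the combined sample), and the retry limit are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch; not part of Commons Math. */
final class JitterSketch {

    /** Cut mentioned in the comment; strict "<" is an assumption. */
    static final long EXACT_CUTOFF = 10000L;

    /** True if the exact p-value computation would be chosen. */
    static boolean useExact(int n, int m) {
        return (long) n * m < EXACT_CUTOFF;
    }

    /** Returns the combined sample in ascending order. */
    static double[] combinedSorted(double[] x, double[] y) {
        double[] all = new double[x.length + y.length];
        System.arraycopy(x, 0, all, 0, x.length);
        System.arraycopy(y, 0, all, x.length, y.length);
        Arrays.sort(all);
        return all;
    }

    /** True if any value occurs more than once in the combined sample. */
    static boolean hasTies(double[] x, double[] y) {
        double[] all = combinedSorted(x, y);
        for (int i = 1; i < all.length; i++) {
            if (all[i] == all[i - 1]) {
                return true;
            }
        }
        return false;
    }

    /** Smallest positive gap between neighbours, or +Infinity if all values are equal. */
    static double minPositiveGap(double[] x, double[] y) {
        double[] all = combinedSorted(x, y);
        double gap = Double.POSITIVE_INFINITY;
        for (int i = 1; i < all.length; i++) {
            double d = all[i] - all[i - 1];
            if (d > 0 && d < gap) {
                gap = d;
            }
        }
        return gap;
    }

    /** Returns a copy of data with uniform noise in [-amplitude, amplitude) added. */
    static double[] jittered(double[] data, double amplitude, Random rng) {
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = data[i] + amplitude * (2.0 * rng.nextDouble() - 1.0);
        }
        return out;
    }

    /**
     * Returns {x', y'}: jittered copies with no ties. The inputs are not modified.
     * Keeping the amplitude below half the smallest gap preserves the order of
     * values that were already distinct.
     */
    static double[][] breakTies(double[] x, double[] y, Random rng) {
        if (!hasTies(x, y)) {
            return new double[][] {x.clone(), y.clone()};
        }
        double gap = minPositiveGap(x, y);
        // Fallback amplitude for an all-equal sample is an arbitrary assumption.
        double amplitude = Double.isInfinite(gap) ? 1e-6 : gap / 4.0;
        for (int attempt = 0; attempt < 10; attempt++) {
            double[] jx = jittered(x, amplitude, rng);
            double[] jy = jittered(y, amplitude, rng);
            if (!hasTies(jx, jy)) {
                return new double[][] {jx, jy};
            }
        }
        throw new IllegalStateException("ties remain after jitter");
    }
}
```

A caller would pass the two arrays returned by breakTies to whichever p-value routine applies, for example the exact one when useExact(n, m) is true. The jitter changes the p-value slightly from run to run unless the Random is seeded, which is one of the trade-offs of this approach.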