[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Phil Steitz (JIRA)

[ https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284589#comment-14284589 ]

Phil Steitz commented on MATH-1197:
---

+1 on the patch

> Incorrect Kolmogorov–Smirnov Statistic for two samples 
> ---
>
>              Key: MATH-1197
>              URL: https://issues.apache.org/jira/browse/MATH-1197
>          Project: Commons Math
>       Issue Type: Bug
> Affects Versions: 3.4.1
>      Environment: Ubuntu 14.04
>         Reporter: Danaja Thiyunuwan Maldeniya
>      Attachments: MATH-1197.patch
>
>
> kolmogorovSmirnovTest(double[],double[]) on the samples given below
> returns 5.699107852308316E-12 instead of approximately 0.9793. I traced the
> issue to kolmogorovSmirnovStatistic(double[],double[]), which returns
> 0.49507389162561577 instead of 0.064 (verified with ks.test in R and with
> JDistlib).
>   double[] x = 
> {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
> 
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,2.202653,2.202653,2.202653
> 
> ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653
> 
> ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653
> 
> ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,3.181199,3.181199,3.181199,3.181199,3.181199,3.181199,3.723539
> 
> ,3.723539,3.723539,3.723539,4.383482,4.383482,4.383482,4.383482,5.320671,5.320671,5.320671,5.717284,6.964001,7.352165
> 
> ,8.710510,8.710510,8.710510,8.710510,8.710510,8.710510,9.539004,9.539004, 
> 10.720619, 17.726077, 17.726077, 17.726077, 17.726077
> ,22.053875 ,23.799144 ,27.355308 ,30.584960 ,30.584960 
> ,30.584960, 30.584960, 30.751808};
>  double[] y = 
> {0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
>  
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
>  
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00
>  
> ,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,2.202653
>  
> ,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,2.202653,3.061758,3.723539,5.628420,5.628420,5.628420,5.628420
>  ,5.628420,6.916982,6.916982,6.916982, 10.178538, 10.178538, 
> 10.178538, 10.178538, 10.178538 };



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Phil Steitz (JIRA)

[ https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284576#comment-14284576 ]

Phil Steitz commented on MATH-1197:
---

Assuming whatever bugs in the D computation have been fixed, our exactP should
actually be "exact."  I could not make sense of, or find documentation for,
what R does for small samples.  Our code computes the exact distribution of the
associated D statistic.  I suspect that R does some kind of approximation.  As
you said, I think R also disallows ties.
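
To make "exact" concrete, here is a rough sketch of one way to get an exact
p-value when there are no ties: under the null hypothesis every interleaving of
the n and m observations is equally likely, so P(D >= d) is the fraction of
interleavings whose maximum ECDF gap reaches d. The class and method names
(ExactKsP, exactP, walk) are made up for illustration, and this is not
necessarily how exactP in Commons Math is implemented. The enumeration grows as
C(n+m, n), so it is only usable for small samples.

```java
/** Sketch: exact two-sample KS p-value by brute-force enumeration (no ties assumed). */
final class ExactKsP {

    /** Returns P(D_{n,m} >= d) when all interleavings of the two samples are equally likely. */
    static double exactP(double d, int n, int m) {
        // counts[0] = number of interleavings, counts[1] = those with D >= d
        long[] counts = new long[2];
        // Work in integer units of 1/(n*m) so the comparison is not hit by rounding;
        // the small tolerance absorbs floating-point error in d itself.
        long threshold = (long) Math.ceil(d * n * m - 1e-9);
        walk(0, 0, n, m, 0L, threshold, counts);
        return (double) counts[1] / counts[0];
    }

    /** Extends a partial interleaving that already used i x-values and j y-values. */
    private static void walk(int i, int j, int n, int m,
                             long maxDev, long threshold, long[] counts) {
        if (i == n && j == m) {
            counts[0]++;
            if (maxDev >= threshold) {
                counts[1]++;
            }
            return;
        }
        if (i < n) {
            // ECDF gap after taking an x: |(i+1)/n - j/m| = |(i+1)*m - j*n| / (n*m)
            long dev = Math.abs((long) (i + 1) * m - (long) j * n);
            walk(i + 1, j, n, m, Math.max(maxDev, dev), threshold, counts);
        }
        if (j < m) {
            // ECDF gap after taking a y: |i/n - (j+1)/m| = |i*m - (j+1)*n| / (n*m)
            long dev = Math.abs((long) i * m - (long) (j + 1) * n);
            walk(i, j + 1, n, m, Math.max(maxDev, dev), threshold, counts);
        }
    }
}
```

With n = m = 2 there are six interleavings, two of which have D = 1, so
ExactKsP.exactP(1.0, 2, 2) is 1/3. With ties in the data the interleavings are
no longer equally informative, which is why the tie question matters here.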



[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Thomas Neidhart (JIRA)

[ https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284500#comment-14284500 ]

Thomas Neidhart commented on MATH-1197:
---

The exactP method also seems to give results that differ from those of R.
Take this example:

{code}
import org.apache.commons.math3.stat.inference.KolmogorovSmirnovTest;

public class KsExample {
    public static void main(String[] args) {
        double[] x = { 0, 0, 0, 0, 1 };
        double[] y = { 0, 0, 1, 1, 2, 3 };

        final KolmogorovSmirnovTest test = new KolmogorovSmirnovTest();
        // Compute D once and reuse it for both p-value variants.
        final double d = test.kolmogorovSmirnovStatistic(x, y);
        System.out.println("p=" + test.kolmogorovSmirnovTest(x, y, true));
        System.out.println("D=" + d);
        System.out.println("approximateP=" + test.approximateP(d, x.length, y.length));
        System.out.println("exactP=" + test.exactP(d, x.length, y.length, false));
    }
}
{code}

returns:

{noformat}
p=0.35714285714285715
D=0.46673
approximateP=0.5925028311389975
exactP=0.4155844155844156
{noformat}

R computes the following:

{noformat}
data:  x and y
D = 0.4667, p-value = 0.5925
alternative hypothesis: two-sided
{noformat}
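
For reference, the D value above can be checked by hand with a tie-aware
computation. The following is only a sketch (the class name TwoSampleKs and the
method statistic are invented here, this is not the attached patch): it walks
both sorted samples together and compares the two empirical distribution
functions only after stepping past every copy of the current value.

```java
import java.util.Arrays;

/** Sketch: two-sample KS statistic D that handles tied values. */
final class TwoSampleKs {

    /** Returns sup |F_x(v) - F_y(v)| over all observed values v. */
    static double statistic(double[] x, double[] y) {
        double[] sx = x.clone();
        double[] sy = y.clone();
        Arrays.sort(sx);
        Arrays.sort(sy);
        int i = 0;
        int j = 0;
        double d = 0;
        // Once one sample is exhausted its ECDF is 1 and the gap can only shrink,
        // so the loop may stop there.
        while (i < sx.length && j < sy.length) {
            double v = Math.min(sx[i], sy[j]);
            // Consume all copies of v in both samples before comparing the ECDFs.
            while (i < sx.length && sx[i] == v) {
                i++;
            }
            while (j < sy.length && sy[j] == v) {
                j++;
            }
            double gap = Math.abs((double) i / sx.length - (double) j / sy.length);
            d = Math.max(d, gap);
        }
        return d;
    }
}
```

For x = {0, 0, 0, 0, 1} and y = {0, 0, 1, 1, 2, 3} the largest gap is at 0,
where the ECDFs are 4/5 and 2/6, giving D = 7/15, i.e. about 0.4667, in line
with the value R reports.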


[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Phil Steitz (JIRA)

[ https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14284003#comment-14284003 ]

Phil Steitz commented on MATH-1197:
---

Yes, this is a bug.  Arrays.binarySearch should not have been used here.



[jira] [Commented] (MATH-1197) Incorrect Kolmogorov–Smirnov Statistic for two samples

2015-01-20 Thread Thomas Neidhart (JIRA)

[ https://issues.apache.org/jira/browse/MATH-1197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14283756#comment-14283756 ]

Thomas Neidhart commented on MATH-1197:
---

One observation: the samples contain a lot of equal values.

The KS test statistic is implemented using Arrays.binarySearch, but that method
does not specify which index is returned when the sorted array contains the
searched value more than once.
E.g. if you have samples [0, 0, 0, 0, 0, 1] and you search for 0, you might get
any index in the range [0, 4]. As far as I understand the KS statistic, it is
based on the empirical distribution function, which gives the cumulative
probability at an observation as the fraction of values less than or equal to
it. That count cannot be derived from the index returned by
Arrays.binarySearch.
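
To illustrate the difference, here is a minimal sketch (not the actual fix; the
class EcdfWithTies and its methods are hypothetical names). It uses an
upper-bound binary search that always returns the number of elements less than
or equal to the query, whichever copy of a tied value a plain binary search
would happen to hit.

```java
import java.util.Arrays;

/** Sketch: counting values <= v in a sorted array that contains ties. */
final class EcdfWithTies {

    /** Returns the number of elements of the sorted array that are <= v. */
    static int countLessOrEqual(double[] sorted, double v) {
        int lo = 0;
        int hi = sorted.length;
        // Invariant: elements before lo are <= v, elements from hi onwards are > v.
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] <= v) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Empirical distribution function of the sorted sample evaluated at v. */
    static double ecdf(double[] sorted, double v) {
        return (double) countLessOrEqual(sorted, v) / sorted.length;
    }

    public static void main(String[] args) {
        double[] sample = {0, 0, 0, 0, 0, 1};
        // Any index in [0, 4] is a legal result here; the Javadoc makes no promise.
        System.out.println("binarySearch index = " + Arrays.binarySearch(sample, 0.0));
        // Always 5, so the ECDF at 0 is 5/6 regardless of how the ties are laid out.
        System.out.println("count <= 0 = " + countLessOrEqual(sample, 0.0));
        System.out.println("ecdf(0) = " + ecdf(sample, 0.0));
    }
}
```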
