[ 
https://issues.apache.org/jira/browse/SPARK-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952906#comment-14952906
 ] 

Christophe S commented on SPARK-8598:
-------------------------------------

Hi,

It would be nice to be able to get the KS distance. It gives more insight than 
just knowing that the test failed.

Thx!

> Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs
> -----------------------------------------------------------------------
>
>                 Key: SPARK-8598
>                 URL: https://issues.apache.org/jira/browse/SPARK-8598
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jose Cambronero
>            Assignee: Jose Cambronero
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> We have implemented a 1-sample, two-sided version of the Kolmogorov Smirnov 
> test, which tests the null hypothesis that the sample comes from a given 
> continuous distribution. We provide three functions to access the 
> functionality. The first takes an RDD[Double] of the data and a lambda to 
> calculate the CDF. The second takes an RDD[Double] and an 
> Iterator[(Double,Double,Double)] => Iterator[Double], and uses mapPartitions 
> to provide an optimized way to perform the calculation when the CDF 
> calculation requires a non-serializable object (e.g. the Apache Commons Math 
> real distributions). The third takes an RDD[Double] and a String name of the 
> theoretical distribution to be used. The appropriate result class has been 
> added, as well as tests in the HypothesisTestSuite.
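
For readers unfamiliar with the statistic under discussion: the KS distance is 
the largest gap between the empirical CDF of the sample and the theoretical 
CDF. The sketch below computes it on a plain Python list for a caller-supplied 
CDF. It is only an illustration, not the Spark implementation described above; 
the names ks_statistic, standard_normal_cdf and uniform_cdf are hypothetical 
and do not come from MLlib.

```python
import math


def ks_statistic(sample, cdf):
    """Two-sided, 1-sample KS distance D = sup_x |F_n(x) - F(x)|.

    `sample` is any iterable of floats and `cdf` is a function mapping a
    value to the theoretical cumulative probability (both are assumptions
    of this sketch, not part of any Spark API).
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so the largest
        # gap can occur just before or just after the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def standard_normal_cdf(x):
    """CDF of N(0, 1), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


if __name__ == "__main__":
    print(ks_statistic([0.25, 0.5, 0.75], uniform_cdf))
    print(ks_statistic([-1.2, -0.3, 0.1, 0.8, 1.9], standard_normal_cdf))
```

A small distance means the sample tracks the theoretical distribution closely, 
which is the extra information the comment above asks for beyond a pass/fail 
result.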



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

