[ https://issues.apache.org/jira/browse/SPARK-8598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952906#comment-14952906 ]
Christophe S commented on SPARK-8598:
-------------------------------------

Hi,

It would be nice to be able to get the KS distance. It gives more insight than just knowing that the test failed.

Thx!

> Implementation of 1-sample, two-sided, Kolmogorov Smirnov Test for RDDs
> -----------------------------------------------------------------------
>
>                 Key: SPARK-8598
>                 URL: https://issues.apache.org/jira/browse/SPARK-8598
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Jose Cambronero
>            Assignee: Jose Cambronero
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> We have implemented a 1-sample, two-sided version of the Kolmogorov-Smirnov
> test, which tests the null hypothesis that the sample comes from a given
> continuous distribution. We provide three functions to access the
> functionality:
> - a function that takes an RDD[Double] of the data and a lambda to
>   calculate the CDF;
> - a function that takes an RDD[Double] and an
>   Iterator[(Double,Double,Double)] => Iterator[Double], which uses
>   mapPartitions to provide an optimized way to perform the calculation when
>   the CDF calculation requires a non-serializable object (e.g. the Apache
>   Commons Math real distributions);
> - a function that takes an RDD[Double] and a String name of the theoretical
>   distribution to be used.
> The appropriate result class has been added, as well as tests in the
> HypothesisTestSuite.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
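
For reference, the KS distance mentioned in the comment is the largest vertical gap between the sample's empirical CDF and the theoretical CDF. The following is a minimal single-machine sketch in plain Python, not the Spark MLlib implementation; the names `ks_statistic`, `uniform_cdf`, and `std_normal_cdf` are hypothetical and chosen only for illustration.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample, two-sided KS distance between `sample` and `cdf`.

    `cdf` is any callable mapping a float to a probability in [0, 1].
    This is an illustrative sketch, not Spark's distributed algorithm.
    """
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x,
        # so the gap is checked on both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


def std_normal_cdf(x):
    """CDF of the standard normal distribution, via math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    print(ks_statistic([0.25, 0.5, 0.75], uniform_cdf))
    print(ks_statistic([-1.2, -0.3, 0.1, 0.8, 1.5], std_normal_cdf))
```

A small distance means the sample tracks the theoretical distribution closely. Reporting it alongside the test outcome shows how far off a rejected sample actually is.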