[ https://issues.apache.org/jira/browse/SPARK-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15237657#comment-15237657 ]
Xiangrui Meng commented on SPARK-14154:
---------------------------------------

[~yuhaoyan] The main purpose of the initial implementation of the K-S test was to avoid the `zipWithIndex` call used in your implementation, which triggers one more Spark job. Did you compare the performance? Please run a benchmark on a large dataset and see whether it is worth keeping the initial implementation. Thanks!

> Simplify the implementation for Kolmogorov–Smirnov test
> -------------------------------------------------------
>
>                 Key: SPARK-14154
>                 URL: https://issues.apache.org/jira/browse/SPARK-14154
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: yuhao yang
>            Assignee: yuhao yang
>            Priority: Minor
>             Fix For: 2.0.0
>
>
> I just read the code for KolmogorovSmirnovTest and find it could be much
> simplified following the original definition.
> Send a PR for discussion

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
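
For readers following the thread, here is a minimal local sketch in plain Python of the one-sample K-S statistic computed "following the original definition". It is not the code from the PR and not Spark code. The function names `ks_statistic`, `uniform_cdf`, and `std_normal_cdf` are hypothetical and chosen only for illustration. The `enumerate` over sorted data plays the role that a sort followed by `zipWithIndex` would play on an RDD. On an RDD with more than one partition, `zipWithIndex` has to run a job first to count the elements in each partition, which is the extra job the comment refers to.

```python
import math


def ks_statistic(sample, cdf):
    """One-sample K-S statistic D = sup_x |F_n(x) - F(x)| (local sketch)."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    d = 0.0
    # Sorting gives each point its rank; enumerate stands in for zipWithIndex.
    for i, x in enumerate(sorted(sample)):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x, so check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


def std_normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    print(ks_statistic([0.25, 0.5, 0.75], uniform_cdf))
```

The sketch says nothing about performance. Whether the extra job matters at scale is the question the requested benchmark would have to answer.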