Github user yu-iskw commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-54953348 @mengxr, @erikerlandson, @rnowling Sorry for having been kept waiting. I redesigned the interface of distance functions. Here is the full file change. https://github.com/apache/spark/pull/1964/files I agree to keep MLlib's linear algebra lightweight. And we would had better to use Breeze in terms of the performance. However, I think the standardized distance function API helps us to expand MLlib efficiently. And also, by contributing another distance functions to Breeze, we can use them in Spark. I am a bit worried about using Breeze directly in Spark. One reason is that we can't absolutely control the release of Breeze. If we want to use a new distance function in Spark, we contribute it to Breeze as I did. Because there is a time difference between being merged and release. We have to wait. That's why we should have a wrapper API for Breeze linear algebra. And the wrapper API helps us to extend another distance function easily. I think we need two the standardized distance function API. One is for Spark user, the other is for MLlib developer. `(Vector, Vector) => Double` is for Spark user. `(BV[Double], BV[Double]) => Double ` is for MLlib developer. And Breeze should be hidden for Spark user. At least, we should have a mapping function from `String` to a distance function because of PySpark API. We have to use the implementation in a wrapper API as much as we can. That is, you know, I contributed some distance functions to Breeze. However,they have not been released on Maven repository. That's why we can't use them in Spark yet. If they will be released, I will modify the implementation, such as the `apply` method of `EuclideanDistanceMetric` class. ## Diff Summary - `(BV[Double], BV[Double]) => Double` is for Spark user, such as `EuclideanDistanceMetric` class - Companion Object of the distance Function is for MLlib developer, such as `EuclideanDistanceMetric` object - Extract WeightedDistanceMetric and its sub class from DistanceMetric - Extract WeightedDistanceMeasure and its sub class from DistanceMeasure - Add a mapping function in `mllib/src/main/scala/org/apache/spark/mllib/linalg/distance/DistanceMeasureFactory.scala` thanks
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org