Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-52810571 Great work! If a class implements the DistanceMetric trait, it should probably be renamed ---DistanceMetric. For example, CosineDistanceMeasure should be CosineDistanceMetric. From a performance standpoint, I'm concerned about the vector conversions. KMeans converts all vectors to BreezeVectors internally, so using these metrics would require BreezeVectors -> Vectors -> BreezeVectors. CosineDistanceMeasure actually converts the same vector (v1) twice instead of saving the conversion for reuse! Since the Breeze library should perform vector operations in a more efficient manner, Breeze should be used in place of map where possible. E.g., WeightedEuclideanDistanceMeasure. Was map used there because Breeze doesn't support element-wise operations? I look forward to seeing this included in mllib!
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org