Github user rnowling commented on the pull request: https://github.com/apache/spark/pull/1964#issuecomment-53570185

@mengxr, @yu-iskw I think it is valuable to contribute distance metrics to Breeze, but not all of the metrics provided by @yu-iskw may be of interest to Breeze. If MLlib provides its own wrapper, we can call Breeze for the distance metrics that are available there and provide our own implementations for the others.

There was interest on the mailing list in different distance metrics for KMeans. I think this PR should be amenable to a solution for that.

My main complaint is that the distance metrics implemented in this PR expect MLlib Vectors, not Breeze vectors. Before this is committed, I think we should figure out how to generalize these metrics to Breeze vectors. Could we add distance(breeze, breeze) functions to @yu-iskw's implementation, or make Breeze vectors the default type and provide an implicit conversion from MLlib vectors to Breeze vectors?

Once native support for Breeze vectors is available, we can start work on a high-level API for distance metrics in KMeans and provide an implementation using the code in this PR. A string-based API may be one option, but it would not support distance metrics that require additional parameters (e.g., weighted metrics or L-n norms).

What do you think?
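
As a rough illustration of the last two points (an implicit conversion between vector types, and metrics that carry their own parameters instead of being selected by a string), a minimal sketch could look like the following. Every name here (`MLVector`, `DistanceMeasure`, `MinkowskiDistance`, `WeightedEuclideanDistance`) is hypothetical and not taken from the PR, MLlib, or Breeze; `Array[Double]` stands in for a Breeze vector so that the snippet has no dependencies.

```scala
import scala.language.implicitConversions

// Hypothetical stand-in for an MLlib vector; NOT the real MLlib class.
final case class MLVector(values: Array[Double])

object MLVector {
  // Implicit view so MLlib-style vectors can be passed wherever the
  // metric's native vector type (here Array[Double]) is expected.
  implicit def toNative(v: MLVector): Array[Double] = v.values
}

// Hypothetical metric interface; Array[Double] stands in for a Breeze vector.
trait DistanceMeasure extends Serializable {
  def apply(a: Array[Double], b: Array[Double]): Double
}

// L-p (Minkowski) distance. The exponent p is the kind of extra
// parameter that a plain string name such as "minkowski" cannot carry.
final case class MinkowskiDistance(p: Double) extends DistanceMeasure {
  require(p >= 1.0, "p must be >= 1")

  def apply(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "dimension mismatch")
    val sum = a.indices.map(i => math.pow(math.abs(a(i) - b(i)), p)).sum
    math.pow(sum, 1.0 / p)
  }
}

// Weighted Euclidean distance; the per-dimension weights are parameters.
final case class WeightedEuclideanDistance(weights: Array[Double]) extends DistanceMeasure {
  def apply(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length && a.length == weights.length, "dimension mismatch")
    val sum = a.indices.map { i =>
      val d = a(i) - b(i)
      weights(i) * d * d
    }.sum
    math.sqrt(sum)
  }
}

object DistanceDemo {
  def main(args: Array[String]): Unit = {
    val a = MLVector(Array(0.0, 0.0))
    val b = MLVector(Array(3.0, 4.0))
    // MLVector arguments are converted implicitly to Array[Double].
    println(MinkowskiDistance(2.0)(a, b))                        // Euclidean
    println(MinkowskiDistance(1.0)(a, b))                        // Manhattan
    println(WeightedEuclideanDistance(Array(1.0, 0.25))(a, b))   // weighted
  }
}
```

With this shape, a clustering routine would accept a `DistanceMeasure` instance rather than a metric name, so parameterized metrics need no special handling. Whether Breeze vectors or MLlib vectors should be the native type is exactly the open question above; the sketch only shows that either choice can be bridged with an implicit view.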