Github user rnowling commented on the pull request:

    https://github.com/apache/spark/pull/1964#issuecomment-53570185
  
    @mengxr, @yu-iskw
    
    I think it is valuable to contribute distance metrics to Breeze, but not
all of the metrics provided by @yu-iskw may be of interest to Breeze.  If MLlib
provides its own wrapper, we can call Breeze for the distance metrics that are
available there and provide our own implementations of the others.
    
    There was interest on the mailing list in different distance metrics for
KMeans.  I think this PR could be a step towards a solution for that.  My
main complaint is that the distance metrics implemented in this PR expect MLlib
Vectors, not Breeze vectors.  Before this is committed, I think we should
figure out how to generalize these metrics to Breeze vectors -- maybe add
distance(breeze, breeze) functions to @yu-iskw's implementation, or make
Breeze vectors the default type and provide an implicit conversion from MLlib
vectors to Breeze vectors?
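
    A rough sketch of the second option, to make the idea concrete.  It is
dependency-free on purpose: BackendVector and ApiVector are hypothetical
stand-ins for a Breeze vector and an MLlib Vector, and DistanceMeasure,
Euclidean and apiToBackend are names made up for illustration, not anything in
this PR or in Breeze.

```scala
import scala.language.implicitConversions

object BreezeFirstSketch {
  // Hypothetical stand-in for a Breeze vector: the type the metrics are written against.
  final case class BackendVector(values: Array[Double])

  // Hypothetical stand-in for an MLlib Vector: the type callers usually hold.
  final case class ApiVector(values: Array[Double])

  // Implicit view so that ApiVector arguments are accepted wherever a
  // BackendVector is expected (the "implicit conversion" option).
  implicit def apiToBackend(v: ApiVector): BackendVector = BackendVector(v.values)

  // Metrics are defined once, against the backend type only.
  trait DistanceMeasure {
    def distance(a: BackendVector, b: BackendVector): Double
  }

  object Euclidean extends DistanceMeasure {
    def distance(a: BackendVector, b: BackendVector): Double = {
      require(a.values.length == b.values.length, "dimension mismatch")
      var sum = 0.0
      var i = 0
      while (i < a.values.length) {
        val d = a.values(i) - b.values(i)
        sum += d * d
        i += 1
      }
      math.sqrt(sum)
    }
  }
}
```

    With `import BreezeFirstSketch._` in scope, a call such as
`Euclidean.distance(ApiVector(Array(0.0, 0.0)), ApiVector(Array(3.0, 4.0)))`
compiles without a second overload, because both arguments are converted
implicitly.  The real conversion would of course have to go through whatever
MLlib exposes for reaching the underlying Breeze vector.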
    
    Once native support for Breeze vectors is available, we can start work on a
high-level API to distance metrics for KMeans and provide an implementation
using the code in this PR.  A string-based API may be one option, but it would
not support distance metrics (e.g., weighted, L-n norms) that require
additional parameters.
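
    To illustrate the limitation, here is a minimal sketch, again with
hypothetical names (Metric, Minkowski, WeightedEuclidean, fromName) and plain
Array[Double] standing in for the vector type.  Parameterized metrics are
naturally expressed as values that carry their parameters, while a lookup by
name can only hand back parameter-free defaults.

```scala
object ParameterizedMetricsSketch {
  sealed trait Metric {
    def distance(a: Array[Double], b: Array[Double]): Double
  }

  // L-n (Minkowski) distance: needs the exponent p as a parameter.
  final case class Minkowski(p: Double) extends Metric {
    require(p >= 1.0, "p must be >= 1")
    def distance(a: Array[Double], b: Array[Double]): Double = {
      require(a.length == b.length, "dimension mismatch")
      val sum = a.zip(b).map { case (x, y) => math.pow(math.abs(x - y), p) }.sum
      math.pow(sum, 1.0 / p)
    }
  }

  // Weighted Euclidean distance: needs one weight per dimension.
  final case class WeightedEuclidean(weights: Array[Double]) extends Metric {
    def distance(a: Array[Double], b: Array[Double]): Double = {
      require(a.length == b.length && a.length == weights.length, "dimension mismatch")
      var sum = 0.0
      var i = 0
      while (i < a.length) {
        val d = a(i) - b(i)
        sum += weights(i) * d * d
        i += 1
      }
      math.sqrt(sum)
    }
  }

  // Hypothetical string-based lookup: it can only return metrics whose
  // parameters are fixed in advance, so "weighted" has nothing to map to.
  def fromName(name: String): Option[Metric] = name.toLowerCase match {
    case "euclidean" => Some(Minkowski(2.0))
    case "manhattan" => Some(Minkowski(1.0))
    case _           => None
  }
}
```

    A string API could sit on top of something like this as a convenience for
the common cases, with the typed values available for anything that needs
parameters.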
    
    What do you think?
    



