Github user rnowling commented on the pull request:

    https://github.com/apache/spark/pull/1964#issuecomment-52810571
  
    Great work!
    
    If a class implements the DistanceMetric trait, it should probably be 
renamed ---DistanceMetric.  For example, CosineDistanceMeasure should be 
CosineDistanceMetric.
    
    From a performance standpoint, I'm concerned about the vector conversions.  
KMeans converts all vectors to BreezeVectors internally, so using these metrics 
would require BreezeVectors -> Vectors -> BreezeVectors.  CosineDistanceMeasure 
actually converts the same vector (v1) twice instead of saving the conversion 
for reuse!
    
    Since the Breeze library should perform vector operations in a more 
efficient manner, Breeze should be used in place of map where possible.  E.g., 
WeightedEuclideanDistanceMeasure. Was map used there because Breeze doesn't 
support element-wise operations?
    
    I look forward to seeing this included in mllib!
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to