Github user derrickburns commented on the pull request:

    https://github.com/apache/spark/pull/2419#issuecomment-55971460
  
    To understand and evaluate this pull request, I would suggest that a 
reviewer do the following:
    1) Look at the `PointOps` trait and its `FastEuclideanOps` implementation 
to understand its purpose.
    2) Look at the `MultiKMeans` class that implements the iterations of 
Lloyd's algorithm.  Confirm that this operates as you would expect. 
    3) Look at the `KMeansRandom` class.  Confirm that it creates `runs` sets 
of `k` random cluster centers each.
    4) Look at the `KMeansParallel` class. Confirm that it implements the K 
Means || algorithm and creates `runs` sets of at most `k` cluster centers. 
    5) Look at the `KmeansPlusPlus` class. Confirm that it implements the K 
Means ++ algorithm.
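
    For a reviewer who wants a reference point while reading steps 2 and 5, 
here is a minimal, single-machine sketch of K Means ++ seeding followed by 
Lloyd's iterations. It is written in Python for brevity and is not the code in 
this pull request: the function names (`sq_dist`, `kmeans_pp_seed`, `lloyd`) 
and the sample data are hypothetical, it handles one run rather than `runs` 
parallel runs, and it does not cover K Means ||.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, rng):
    """Pick up to k initial centers by K Means ++ (D^2-weighted sampling)."""
    centers = [rng.choice(points)]  # first center: uniform at random
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        weights = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(weights) == 0:
            break  # every point coincides with a center: fewer than k centers
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def lloyd(points, centers, max_iter=20):
    """Run Lloyd's iterations; return (centers, assignment)."""
    centers = [tuple(c) for c in centers]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each center to the mean of its assigned points.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return centers, assignment


# Hypothetical sample data: two well-separated groups of three points.
rng = random.Random(0)
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = kmeans_pp_seed(points, 2, rng)
centers, assignment = lloyd(points, seeds)
print(sorted(centers))
```

    The seeding function can return fewer than `k` centers when the input has 
fewer than `k` distinct points, which is why a check for "at most `k`" centers 
is a reasonable thing to confirm in an initializer.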
    
    If the reviewer is familiar with the K Means, K Means ||, and K Means ++ 
algorithms, then I suspect that the code can be thoroughly reviewed in a couple 
of hours.


