Implement kmeans++ for initial cluster selection in kmeans
----------------------------------------------------------

                 Key: MAHOUT-153
                 URL: https://issues.apache.org/jira/browse/MAHOUT-153
             Project: Mahout
          Issue Type: New Feature
          Components: Clustering
    Affects Versions: 0.2
         Environment: OS Independent
            Reporter: Panagiotis Papadimitriou


The current k-means implementation supports two methods for initial cluster
selection (seed selection): 1) random selection of k points, and 2) use of
canopy clusters.

I plan to implement k-means++. The algorithm is described in detail here:
http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf

Design Outline: I will create an abstract class SeedGenerator and a subclass 
KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will become 
a subclass of SeedGenerator.
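
A minimal sketch of how this hierarchy might look, assuming points are plain
double[] vectors rather than Mahout's own vector types. The method name
generateSeeds, its signature, the constructors, and the squaredDistance helper
are hypothetical and chosen only for illustration; they are not the existing
Mahout API. The k-means++ step follows the seeding rule from the paper: the
first center is picked uniformly at random, and each later center is sampled
with probability proportional to its squared distance from the nearest center
already chosen.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical base class; the signature below is an assumption, not Mahout's API.
abstract class SeedGenerator {
    protected final Random rng;

    protected SeedGenerator(Random rng) {
        this.rng = rng;
    }

    // Returns k initial centers taken from points (assumes 1 <= k <= points.size()).
    abstract List<double[]> generateSeeds(List<double[]> points, int k);

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

// The existing random strategy recast as a subclass: k distinct points, uniformly at random.
class RandomSeedGenerator extends SeedGenerator {
    RandomSeedGenerator(Random rng) {
        super(rng);
    }

    @Override
    List<double[]> generateSeeds(List<double[]> points, int k) {
        List<double[]> pool = new ArrayList<>(points);
        Collections.shuffle(pool, rng);
        return new ArrayList<>(pool.subList(0, k));
    }
}

// k-means++ seeding: sample each new center proportionally to squared distance.
class KMeansPlusPlusSeedGenerator extends SeedGenerator {
    KMeansPlusPlusSeedGenerator(Random rng) {
        super(rng);
    }

    @Override
    List<double[]> generateSeeds(List<double[]> points, int k) {
        int n = points.size();
        List<double[]> seeds = new ArrayList<>();
        seeds.add(points.get(rng.nextInt(n))); // first center: uniform choice

        // minDist[i] holds the squared distance from point i to its nearest chosen center.
        double[] minDist = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            minDist[i] = squaredDistance(points.get(i), seeds.get(0));
            total += minDist[i];
        }

        while (seeds.size() < k) {
            int chosen = -1;
            if (total <= 0.0) {
                // Every point coincides with a chosen center; fall back to a uniform pick.
                chosen = rng.nextInt(n);
            } else {
                // Walk the cumulative weights until they pass a random threshold.
                double target = rng.nextDouble() * total;
                double cumulative = 0.0;
                for (int i = 0; i < n; i++) {
                    if (minDist[i] <= 0.0) {
                        continue; // zero weight: already a center (or a duplicate of one)
                    }
                    cumulative += minDist[i];
                    chosen = i; // also guards against floating-point shortfall at the end
                    if (cumulative > target) {
                        break;
                    }
                }
            }
            double[] center = points.get(chosen);
            seeds.add(center);

            // Refresh nearest-center distances against the newly added center.
            total = 0.0;
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), center));
                total += minDist[i];
            }
        }
        return seeds;
    }
}
```

With this shape, the k-means driver would depend only on SeedGenerator, so
switching strategies is a matter of constructing a different subclass, e.g.
new KMeansPlusPlusSeedGenerator(new Random(42)).generateSeeds(points, k). Each
k-means++ round makes one pass over the data, so seeding costs O(n * k)
distance computations in this single-machine form.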





-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
