Implement kmeans++ for initial cluster selection in kmeans
----------------------------------------------------------
Key: MAHOUT-153
URL: https://issues.apache.org/jira/browse/MAHOUT-153
Project: Mahout
Issue Type: New Feature
Components: Clustering
Affects Versions: 0.2
Environment: OS Independent
Reporter: Panagiotis Papadimitriou
The current implementation of k-means includes the following algorithms for
initial cluster selection (seed selection): 1) random selection of k points, 2)
use of canopy clusters.
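I plan to implement k-means++ seeding, which picks each new center with
probability proportional to its squared distance from the nearest center
already chosen. The details of the algorithm are available here:
http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf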
Design Outline: I will create an abstract class SeedGenerator and a subclass
KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will become
a subclass of SeedGenerator.
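
A minimal sketch of how this outline might look is given here for illustration
only. The class names come from the outline above; everything else (the
chooseSeeds method, points as double[] held in memory, the java.util.Random
constructor argument) is an assumption and not Mahout's actual API. It runs on
a single machine and ignores the MapReduce side entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical base class: the name is from the outline, the signature is assumed.
abstract class SeedGenerator {
    /** Returns k initial centers picked from the given points. */
    abstract List<double[]> chooseSeeds(List<double[]> points, int k);

    static void checkArgs(List<double[]> points, int k) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be in [1, points.size()]");
        }
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}

// Uniform random selection of k points (distinct list positions).
class RandomSeedGenerator extends SeedGenerator {
    private final Random rng;

    RandomSeedGenerator(Random rng) { this.rng = rng; }

    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k) {
        checkArgs(points, k);
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, k));
    }
}

// k-means++ seeding: each new center is sampled with probability
// proportional to its squared distance from the nearest chosen center.
class KMeansPlusPlusSeedGenerator extends SeedGenerator {
    private final Random rng;

    KMeansPlusPlusSeedGenerator(Random rng) { this.rng = rng; }

    @Override
    List<double[]> chooseSeeds(List<double[]> points, int k) {
        checkArgs(points, k);
        int n = points.size();
        List<double[]> seeds = new ArrayList<>(k);
        // The first center is chosen uniformly at random.
        seeds.add(points.get(rng.nextInt(n)));
        // minDist[i] = squared distance from point i to its nearest center.
        double[] minDist = new double[n];
        for (int i = 0; i < n; i++) {
            minDist[i] = squaredDistance(points.get(i), seeds.get(0));
        }
        while (seeds.size() < k) {
            double total = 0.0;
            for (double d : minDist) total += d;
            int chosen;
            if (total == 0.0) {
                // Every point coincides with a center; fall back to uniform.
                chosen = rng.nextInt(n);
            } else {
                double r = rng.nextDouble() * total;
                double cumulative = 0.0;
                chosen = -1;
                for (int i = 0; i < n; i++) {
                    if (minDist[i] == 0.0) continue; // zero weight, never sampled
                    cumulative += minDist[i];
                    chosen = i; // last positive-weight index guards against rounding
                    if (cumulative > r) break;
                }
            }
            double[] center = points.get(chosen);
            seeds.add(center);
            // Only the new center can shrink a point's nearest-center distance.
            for (int i = 0; i < n; i++) {
                minDist[i] = Math.min(minDist[i], squaredDistance(points.get(i), center));
            }
        }
        return seeds;
    }
}
```

With this shape, a caller could switch strategies by constructing a different
subclass, for example new KMeansPlusPlusSeedGenerator(new Random(42)) in place
of new RandomSeedGenerator(new Random(42)). Each k-means++ round scans all
points once, so choosing k seeds costs O(n * k) distance computations.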