[ 
https://issues.apache.org/jira/browse/MAHOUT-153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pallavi Palleti updated MAHOUT-153:
-----------------------------------

    Attachment: Mahout-153.patch

Kindly find the updated patch, which includes test cases. Also, the input and 
output formats are modified to be compatible with the other clustering 
algorithms (kmeans, fuzzy kmeans). The distance measure is now given as an 
input parameter, and the floating-point comparison suggested by Shashi has 
been taken care of. Kindly review.

> Implement kmeans++ for initial cluster selection in kmeans
> ----------------------------------------------------------
>
>                 Key: MAHOUT-153
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-153
>             Project: Mahout
>          Issue Type: New Feature
>          Components: Clustering
>    Affects Versions: 0.2
>         Environment: OS Independent
>            Reporter: Panagiotis Papadimitriou
>            Assignee: Ted Dunning
>             Fix For: 0.4
>
>         Attachments: Mahout-153.patch, Mahout-153.patch, 
> MAHOUT-153_RandomFarthest.patch
>
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> The current implementation of k-means includes the following algorithms for 
> initial cluster selection (seed selection): 1) random selection of k points, 
> 2) use of canopy clusters.
> I plan to implement k-means++. The details of the algorithm are available 
> here: http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf.
> Design Outline: I will create an abstract class SeedGenerator and a subclass 
> KMeansPlusPlusSeedGenerator. The existing class RandomSeedGenerator will 
> become a subclass of SeedGenerator.
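
For illustration, the design outlined in the description could be sketched 
roughly as below. This is not the attached patch and not Mahout's actual API. 
The class names SeedGenerator and KMeansPlusPlusSeedGenerator come from the 
issue text; the chooseSeeds method, the ToDoubleBiFunction distance parameter, 
and the euclidean helper are assumptions made for this example. It runs in 
memory on plain double[] points and uses the D^2 weighting from the k-means++ 
paper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical base class (name from the issue, signature assumed). */
abstract class SeedGenerator {
  /** Picks k initial centers from the given points. */
  abstract List<double[]> chooseSeeds(List<double[]> points, int k, Random rng);
}

/** Sketch of k-means++ seeding with a pluggable distance measure. */
class KMeansPlusPlusSeedGenerator extends SeedGenerator {
  private final ToDoubleBiFunction<double[], double[]> distance;

  KMeansPlusPlusSeedGenerator(ToDoubleBiFunction<double[], double[]> distance) {
    this.distance = distance;
  }

  /** Plain Euclidean distance, used here as an example measure. */
  static double euclidean(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double diff = a[i] - b[i];
      sum += diff * diff;
    }
    return Math.sqrt(sum);
  }

  @Override
  List<double[]> chooseSeeds(List<double[]> points, int k, Random rng) {
    if (k < 1 || k > points.size()) {
      throw new IllegalArgumentException("need 1 <= k <= number of points");
    }
    List<double[]> seeds = new ArrayList<>();
    // First center: chosen uniformly at random.
    seeds.add(points.get(rng.nextInt(points.size())));

    // minSq[i] = squared distance from point i to its nearest chosen center.
    double[] minSq = new double[points.size()];
    Arrays.fill(minSq, Double.POSITIVE_INFINITY);

    while (seeds.size() < k) {
      // Only the most recently added center can lower a point's minimum.
      double[] last = seeds.get(seeds.size() - 1);
      double total = 0.0;
      for (int i = 0; i < points.size(); i++) {
        double d = distance.applyAsDouble(points.get(i), last);
        minSq[i] = Math.min(minSq[i], d * d);
        total += minSq[i];
      }

      int chosen = points.size() - 1;
      if (total <= 0.0) {
        // Every point coincides with a center; fall back to a uniform pick.
        chosen = rng.nextInt(points.size());
      } else {
        // Sample an index with probability proportional to minSq[i].
        double threshold = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < points.size(); i++) {
          cumulative += minSq[i];
          if (cumulative > threshold) {
            chosen = i;
            break;
          }
        }
      }
      seeds.add(points.get(chosen));
    }
    return seeds;
  }
}
```

Passing the distance as a constructor argument mirrors the "distance measure 
is given as input parameter" change mentioned in the comment above. A 
distributed version would need to compute the per-point minimum distances in 
a separate pass over the data for each new center.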

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
