Pallavi,

This is very useful feedback.

Choosing initial seeds that are far from each other is very similar to the
k-means++ algorithm, and it is clearly a very good thing.
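
For reference, here is a minimal sketch of k-means++ style seeding in plain
Python. It is not Mahout code and not a map-reduce version; the function
names (kmeans_pp_seeds, squared_distance) and the tuple-of-floats point
representation are assumptions made only for illustration. The main
difference from always taking the farthest point is that each new seed is
sampled with probability proportional to its squared distance from the
nearest seed chosen so far, which makes the selection less sensitive to
outliers.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length point tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial seeds from points using k-means++ style sampling.

    Hypothetical helper for illustration only; points is a list of tuples.
    """
    rng = rng or random.Random()
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")

    # The first seed is chosen uniformly at random.
    seeds = [rng.choice(points)]

    # d2[i] holds the squared distance from points[i] to its nearest seed.
    d2 = [squared_distance(p, seeds[0]) for p in points]

    while len(seeds) < k:
        if sum(d2) == 0:
            # Every point coincides with a seed: fall back to uniform choice.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to d2[i].
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        seeds.append(points[idx])

        # Refresh nearest-seed distances against the newly added seed.
        d2 = [min(d, squared_distance(p, points[idx]))
              for d, p in zip(d2, points)]

    return seeds
```

Calling kmeans_pp_seeds(points, k) returns k points from the input that can
then be handed to k-means or fuzzy k-means as the initial centers.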

There is already an issue for tracking a k-means++ implementation:
http://issues.apache.org/jira/browse/MAHOUT-153

Could you post your patch there?

On Mon, Jan 4, 2010 at 4:03 AM, Palleti, Pallavi <
[email protected]> wrote:

> Initially, I used the canopy clustering output as initial seeds, but the
> results weren't good, and the number of clusters depends on the distance
> thresholds we give as input. Next, I tried randomly selecting some points
> from the input dataset and using them as initial seeds. Again, the
> results were not good. Now, I choose the initial seeds from the input set in
> such a way that the points are far from each other, and I have observed
> better clustering using Fuzzy K-means. I have not yet implemented a
> map-reducible version of this seed selection. I will implement one soon
> and submit a patch.
>



-- 
Ted Dunning, CTO
DeepDyve
