Sure, Ted. I will do that.
Thanks
Pallavi
Ted Dunning wrote:
Pallavi,
This is very useful feedback.
What you have done is very similar to the k-means++ algorithm and it is
clearly a very good thing.
There is already an issue for tracking a k-means++ implementation:
http://issues.apache.org/jira/browse/MAHOUT-153
Could you post your patch there?
On Mon, Jan 4, 2010 at 4:03 AM, Palleti, Pallavi <
[email protected]> wrote:
Initially, I used canopy clustering to generate the initial seeds, but the
results weren't good, and the number of clusters depends on the distance
thresholds we give as input. Next, I tried randomly selecting some points
from the input dataset and using them as initial seeds. Again, the
results were not good. Now, I choose the initial seeds from the input set in
such a way that the points are far from each other, and I have observed
better clustering with Fuzzy K-means. I have not yet implemented a MapReduce
version of this seed selection. I will implement one soon and submit a
patch.
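
For illustration, one way to pick seeds that are far from each other is a
greedy farthest-first traversal. The sketch below is only an assumption about
what such a selection could look like. It is not the code from the patch, it
is not Mahout API, and it runs in memory rather than as a MapReduce job. The
names farthest_first_seeds and squared_distance are made up for this example,
and squared Euclidean distance is assumed as the distance measure.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_seeds(points, k, rng=None):
    """Pick k seeds from points so that the seeds are far from each other.

    The first seed is drawn at random. Each later seed is the point whose
    distance to its closest already-chosen seed is largest.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = rng or random.Random()

    first = points[rng.randrange(len(points))]
    seeds = [first]
    # nearest[i] is the squared distance from points[i] to its closest seed.
    nearest = [squared_distance(p, first) for p in points]

    while len(seeds) < k:
        # The next seed is the point that is farthest from all current seeds.
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen = points[idx]
        seeds.append(chosen)
        # Refresh the closest-seed distances to account for the new seed.
        nearest = [min(d, squared_distance(p, chosen))
                   for d, p in zip(nearest, points)]
    return seeds
```

Calling farthest_first_seeds(points, k) on a list of numeric tuples returns k
of those tuples. k-means++ differs in that it samples each new seed with
probability proportional to the squared distance to the closest seed instead
of always taking the farthest point, which makes it less sensitive to
outliers.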