On Jul 1, 2009, at 2:07 PM, Adil Aijaz wrote:

I was looking at the RandomSeedGenerator and, correct me if I am wrong, it is not really random. Rather, it runs a series of Bernoulli trials in which the points near the beginning of your data always have a higher chance of being selected than those near the end.

I was just going off of Ted's suggestion that for k-means it isn't really all that important for the initial seeds to be truly random. We discussed PRNGs and an M/R way of doing it, but I didn't think it was necessary for this. Fine if someone else wants to take it up.



Maybe that's not a problem, since given sufficient iterations k-means should converge toward a solution. But I thought I'd point it out in case there is an issue here.

Understood.
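
For anyone who does want to take it up: one common way to avoid the position bias described above is reservoir sampling, which picks k seeds in a single pass and gives every point the same chance regardless of where it sits in the data. The sketch below is only an illustration of that technique. It is not the RandomSeedGenerator code, and the class name ReservoirSeeds and its sample method are hypothetical. It also assumes the points can be iterated on a single machine, so it does not address the M/R case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper, NOT part of Mahout: selects k seeds uniformly at
// random from a stream of points in one pass (reservoir sampling).
final class ReservoirSeeds {
    private ReservoirSeeds() {}

    static <T> List<T> sample(Iterable<T> points, int k, Random rng) {
        if (k < 0) {
            throw new IllegalArgumentException("k must be non-negative");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0; // number of points read so far
        for (T point : points) {
            seen++;
            if (reservoir.size() < k) {
                // The first k points fill the reservoir directly.
                reservoir.add(point);
            } else {
                // Draw j uniformly from [0, seen). The new point replaces
                // slot j when j < k, which happens with probability k / seen.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, point);
                }
            }
        }
        // Holds min(k, number of points) entries.
        return reservoir;
    }
}
```

Calling ReservoirSeeds.sample(points, k, new Random()) returns k initial centers, or all of the points if there are fewer than k. Passing a Random with a fixed seed makes the selection repeatable between runs.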


Adil

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
