On Jul 1, 2009, at 2:07 PM, Adil Aijaz wrote:
I was looking at the RandomSeedGenerator and, correct me if I am
wrong, but it is not really random; rather it does a bunch of
Bernoulli trials where the points that are in the beginning of your
data are always going to have a higher chance of being selected than
those near the end.
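
To make the concern concrete, here is a minimal sketch in Python. It is not the actual RandomSeedGenerator code; it assumes a scheme where the data is scanned in order and each point is kept with a fixed probability p until k seeds have been picked. The function names, the parameter p, and the comparison against reservoir sampling (a swapped-in technique that gives every point the same chance) are all assumptions made for illustration.

```python
import random
from collections import Counter


def bernoulli_seeds(points, k, p=0.5, rng=random):
    """Hypothetical scheme: scan in order, keep each point with
    probability p, and stop as soon as k seeds are chosen."""
    chosen = []
    for point in points:
        if len(chosen) == k:
            break  # later points never get a trial once k seeds exist
        if rng.random() < p:
            chosen.append(point)
    return chosen


def reservoir_seeds(points, k, rng=random):
    """Reservoir sampling (Algorithm R): one pass, and every point
    ends up in the sample with the same probability k/n."""
    chosen = []
    for i, point in enumerate(points):
        if i < k:
            chosen.append(point)  # fill the reservoir first
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                chosen[j] = point  # replace a random earlier pick
    return chosen


def selection_counts(sampler, n=20, k=3, trials=20000, seed=42):
    """Count how often each position 0..n-1 is chosen over many runs."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(sampler(range(n), k, rng=rng))
    return [counts[i] for i in range(n)]


if __name__ == "__main__":
    print("bernoulli:", selection_counts(bernoulli_seeds))
    print("reservoir:", selection_counts(reservoir_seeds))
```

Running it prints per-position selection counts for both schemes. Under the assumed Bernoulli scheme the counts fall off sharply toward the end of the data, while the reservoir counts stay roughly flat.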
I was just going off of Ted's suggestion that for k-Means it wasn't
really all that important to be truly random for the initial seeds.
We discussed PRNGs and an M/R way of doing it, but I didn't think it
was necessary for this. Fine if someone else wants to take it up.
Maybe that's not a problem since given sufficient iterations k-means
should converge toward a solution. But, I thought I'd point it out
in case there is an issue here.
Understood.
Adil
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search