Re: Is mahout kmeans slow ?

Jeff Eastman Thu, 13 Sep 2012 09:06:09 -0700

On 9/12/12 10:42 PM, Ted Dunning wrote:
-user@
+dev@

Well, this is not really an apples-to-apples comparison, is it? Running any Hadoop job through 200 iterations is unlikely to ever take less than 200 minutes because of Hadoop's setup-teardown overhead. And, while a sequential, in-memory clustering algorithm may produce excellent results on a single machine even over large data sets, it isn't k-means. K-means requires every point to be tested against every cluster during every iteration. So saying that Mahout k-means is "slow" as a general statement kinda bothers me because it implies a comparison to a different, Hadoop implementation that AFAICT has not been done.
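
To make the cost argument concrete, here is a minimal in-memory sketch of the k-means loop in Python. It is not Mahout's implementation; the function names and the max_iterations parameter are made up for illustration. The nested loop in assign_points is the "every point against every cluster" test, and it runs once per iteration. In Mahout each pass of that outer loop is a separate Hadoop job, so at roughly a minute of setup-teardown per job, 200 iterations cost about 200 minutes before any distance is computed.

```python
import math


def assign_points(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    assignments = []
    for p in points:
        best_index, best_dist = 0, math.inf
        # Every point is tested against every cluster.
        for j, c in enumerate(centroids):
            d = math.dist(p, c)
            if d < best_dist:
                best_index, best_dist = j, d
        assignments.append(best_index)
    return assignments


def update_centroids(points, assignments, centroids):
    """Recompute each centroid as the mean of its assigned points."""
    k, dims = len(centroids), len(centroids[0])
    sums = [[0.0] * dims for _ in range(k)]
    counts = [0] * k
    for p, a in zip(points, assignments):
        counts[a] += 1
        for d in range(dims):
            sums[a][d] += p[d]
    # An empty cluster keeps its previous centroid.
    return [
        [s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
        for j in range(k)
    ]


def kmeans(points, centroids, max_iterations=200):
    """Iterate assignment and update until assignments stop changing."""
    assignments = None
    for _ in range(max_iterations):
        new_assignments = assign_points(points, centroids)
        if new_assignments == assignments:
            break  # converged
        assignments = new_assignments
        centroids = update_centroids(points, assignments, centroids)
    return centroids, assignments
```

Per iteration the work is proportional to (number of points) x (number of clusters) distance computations, whichever framework runs it.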

But maybe I'm just being too sensitive about all the work that has gone into making Mahout k-means as good as it is...

Yes.

I have been working (slowly) on moving some very fast single pass
clustering into Mahout.  My work in progress currently does very fast
clustering of small dense vectors and it should scale to sparse vectors
fairly well with some small changes.

See https://github.com/tdunning/knn for more info.
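
For readers unfamiliar with the term, a generic single-pass clustering scheme can be sketched as below. This is not the code in the repository linked above, and it may differ substantially from it. The function name and the distance_cutoff parameter are assumptions made for illustration. Each point is read once: it is either folded into the nearest existing centroid or, if it is too far from all of them, it starts a new cluster.

```python
import math


def single_pass_cluster(points, distance_cutoff):
    """Cluster points in one pass; returns a list of [mean_vector, weight]."""
    centroids = []
    for p in points:
        nearest, nearest_dist = None, math.inf
        for entry in centroids:
            d = math.dist(p, entry[0])
            if d < nearest_dist:
                nearest, nearest_dist = entry, d
        if nearest is None or nearest_dist > distance_cutoff:
            # Too far from every existing cluster: start a new one.
            centroids.append([list(p), 1])
        else:
            # Fold the point into the nearest centroid as a running mean.
            mean, weight = nearest
            weight += 1
            for i, x in enumerate(p):
                mean[i] += (x - mean[i]) / weight
            nearest[1] = weight
    return centroids
```

Unlike the iterative k-means loop, no point is revisited, so there is no per-iteration job overhead to pay; the trade-off is that the result depends on the cutoff and on the order in which points arrive.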

On Wed, Sep 12, 2012 at 7:26 PM, Elaine Gan wrote:

Any ways to improve on the mahout kmeans to speed it up?
