On Sep 23, 2009, at 6:05 AM, Levy, Mark wrote:

I've started to experiment with LDA and am finding that it creates only
a single long-running map task for each iteration, which doesn't scale
well. The map is taking 20 minutes for 10k of my input SparseVectors, and 5
hours for 100k (the vocabulary size also grows when there are more
vectors).

Is this expected or am I doing something wrong? Are there any existing
performance benchmarks?


That's pretty new code, so I doubt there is much in the way of benchmarks. If you can share your vectors (the serialized ones, not the originals with text), then we can profile and look into it a bit more.

Also, you may want to look at MAHOUT-165 in JIRA, as there are some performance improvements for sparse vectors using primitives.


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
