I've started experimenting with LDA and am finding that it creates only a single long-running map task for each iteration, which doesn't scale well. The map phase takes 20 minutes for 10k of my input SparseVectors and 5 hours for 100k (the vocabulary size also grows as the number of vectors increases).
Is this expected, or am I doing something wrong? Are there any existing performance benchmarks?

Many thanks!
Mark
