LDA only executes a single map task per iteration when running in actual distributed mode?

Chad Hinton Tue, 12 Jan 2010 07:24:12 -0800

I saw two comments related to an actual distributed run of the LDA example
but no answer to this question. A previous message on the list confirms that
at least one other person has experienced this issue. I am submitting a
MapReduce job to a 20-node Hadoop cluster as follows:


hadoop jar /root/mahout-core-0.2.job \
  org.apache.mahout.clustering.lda.LDADriver \
  -i hdfs://master/lda/input/vectors \
  -o hdfs://master/lda/output \
  -k 20 -v 10000 --maxIter 40

where lda/input/vectors is the vectors file generated by the standalone
build-reuters.sh example. I can only get a single map task to execute, even
though approximately 57 task slots are available. Has anyone actually run
distributed LDA successfully? Knowing that would help me figure out whether I
have a Hadoop config issue or whether there is an actual problem in the
algorithm implementation. The Hadoop examples run successfully in distributed
mode and use all available map slots. I'm not sure if there is an issue with
the InputSplit for the SequenceFile or something else... Any help is
appreciated.
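
To make the InputSplit theory checkable, here is a minimal Python sketch (not
Hadoop code) of the split-count rule that file-based input formats generally
follow: the split size is max(min split, min(max split, block size)), and the
last split may run about 10% over. The function name, the parameter names and
the 64 MB default block size are assumptions for illustration; the exact
behavior depends on the Hadoop version and on the input format the driver
uses.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the final split may be up to ~10% larger than the split size


def estimate_splits(file_bytes, block_bytes=64 * MB, min_split=1, max_split=None):
    """Estimate how many input splits (and so map tasks) one file produces.

    Hypothetical helper: mirrors the usual rule
    split_size = max(min_split, min(max_split, block_bytes)).
    """
    if file_bytes <= 0:
        return 0
    if max_split is None:
        max_split = block_bytes  # no explicit cap: the block size decides
    split_size = max(min_split, min(max_split, block_bytes))

    splits = 0
    remaining = file_bytes
    # Carve off full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    # Whatever is left becomes the final (possibly slightly oversized) split.
    if remaining > 0:
        splits += 1
    return splits


if __name__ == "__main__":
    for size_mb in (10, 70, 200):
        print(size_mb, "MB ->", estimate_splits(size_mb * MB), "split(s)")
```

If the vectors file is smaller than one HDFS block, this rule predicts one
split and therefore one map task per iteration, no matter how many slots are
free. Comparing the file size reported by `hadoop fs -ls` with the cluster's
block size would confirm or rule this out.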

Chad
