I saw two comments related to an actual distributed run of the LDA example, but no answer to this question. A previous message on the list confirms that at least one other person has experienced this issue. I am submitting a MapReduce job to a 20-node Hadoop cluster as follows:
hadoop jar /root/mahout-core-0.2.job org.apache.mahout.clustering.lda.LDADriver -i hdfs://master/lda/input/vectors -o hdfs://master/lda/output -k 20 -v 10000 --maxIter 40

where lda/input/vectors is the vectors file generated by the standalone build-reuters.sh example. Only a single map task ever executes, even though approximately 57 task slots are available. Has anyone actually run distributed LDA successfully? Knowing that would help me figure out whether I have a Hadoop configuration issue or whether there is an actual problem in the algorithm implementation. The Hadoop examples run successfully in distributed mode and use all available map task slots. I'm not sure whether the issue is with the InputSplit for the SequenceFile or something else.

Any help is appreciated.

Chad
