SparseVectorsFromSequenceFiles only outputs a single vector file
----------------------------------------------------------------
Key: MAHOUT-397
URL: https://issues.apache.org/jira/browse/MAHOUT-397
Project: Mahout
Issue Type: Improvement
Components: Utils
Affects Versions: 0.3
Reporter: Jeff Eastman
Assignee: Jeff Eastman
Fix For: 0.4
When running LDA via build-reuters.sh on a 3-node Hadoop cluster, I've noticed
that the utility preprocessing steps produce only a single vector file. This
means LDA (and the other clustering algorithms too) can use only a single mapper,
no matter how large the cluster is. On investigation, it appears that the
program argument for setting the number of reducers (-nr), and hence the number
of output files, is not propagated to the final stages where the output vectors
are created.
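To illustrate why the reducer count matters, here is a minimal, Hadoop-free sketch. It is not Mahout code: the class `ReducerFanOut`, its methods and the `/reut2-N` document keys are all hypothetical. The only borrowed detail is the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. Each reducer writes one output file, so a final stage that runs with one reducer yields one vector file. A downstream job reading it then gets roughly one input split, and therefore one mapper, however large the cluster is.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical simulation (no Hadoop dependency) of how the final job's
 * reducer count bounds the number of vector files it writes.
 */
public class ReducerFanOut {

    /** Same formula as Hadoop's default HashPartitioner. */
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Groups document keys by the reducer (and thus output file) that receives them. */
    static Map<Integer, List<String>> assign(List<String> docIds, int numReduceTasks) {
        Map<Integer, List<String>> files = new TreeMap<>();
        for (int r = 0; r < numReduceTasks; r++) {
            files.put(r, new ArrayList<>()); // one output file per reducer, even if empty
        }
        for (String id : docIds) {
            files.get(partition(id, numReduceTasks)).add(id);
        }
        return files;
    }

    /** Counts files that hold at least one vector: a rough upper bound on useful downstream mappers. */
    static int nonEmptyFiles(Map<Integer, List<String>> files) {
        int n = 0;
        for (List<String> f : files.values()) {
            if (!f.isEmpty()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<String> docIds = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            docIds.add("/reut2-" + i); // made-up document keys
        }
        // -nr ignored by the final stage: effectively a single reducer.
        System.out.println("1 reducer  -> " + nonEmptyFiles(assign(docIds, 1)) + " vector file(s)");
        // -nr propagated, e.g. 3 reducers on a 3-node cluster.
        System.out.println("3 reducers -> " + nonEmptyFiles(assign(docIds, 3)) + " vector file(s)");
    }
}
```

In the actual jobs, the fix would amount to passing the -nr value through to whatever sets the reduce-task count on each final-stage job (for example Hadoop's `setNumReduceTasks`). The sketch only models the effect of that setting.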