Continuing my sweep through Mahout's clustering capabilities: in LDA, one of the input parameters is --numWords. I think this is supposed to be the total number of distinct words seen in the collection (the vocabulary size), right? So if I dumped Vectors from Lucene, for instance, the --numWords value should be the number of entries in the dictionary, right?
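
To make the distinction concrete, here is a small sketch of the two numbers I could mean: the total token count versus the number of distinct terms. This is plain Python, not Mahout code; the function name `word_counts` and the toy documents are made up for illustration, and it assumes the documents are already tokenized.

```python
from collections import Counter

def word_counts(docs):
    """Return (total_tokens, vocabulary_size) for a list of tokenized documents."""
    # Count how often each term occurs across the whole collection.
    term_freq = Counter(token for doc in docs for token in doc)
    total_tokens = sum(term_freq.values())   # every occurrence counted
    vocabulary_size = len(term_freq)         # distinct terms, i.e. dictionary entries
    return total_tokens, vocabulary_size

# Hypothetical toy collection, for illustration only.
docs = [["apache", "mahout", "lda"], ["lda", "topic", "model", "lda"]]
total_tokens, vocabulary_size = word_counts(docs)
print(total_tokens, vocabulary_size)
```

My guess is that --numWords corresponds to the second number, the dictionary size, but that is exactly what I would like confirmed.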
-Grant
