It didn't: the index built by the stock IndexFiles class had no term vectors for the contents field.
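For anyone hitting the same thing: a term vector is, roughly, the per-document list of terms and their frequencies for one field, and as far as I can tell that is what the Lucene Driver reads back from the index to build Mahout vectors. The plain-Java sketch below only illustrates the idea. It is not Lucene or Mahout code, and the names TermVectorSketch and termVector are made up for illustration. In Lucene of this era I believe the actual fix is to create the contents field with Field.TermVector.YES, but check that against the Lucene version you are using.

```java
import java.util.Map;
import java.util.TreeMap;

public class TermVectorSketch {

    // Hypothetical helper (not a Lucene or Mahout API): builds a
    // term -> frequency map for one document's text, which is
    // conceptually what a stored term vector records for a field.
    static Map<String, Integer> termVector(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        // Naive whitespace tokenization; a real analyzer does much more.
        for (String token : text.trim().toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by blank input
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }

    public static void main(String[] args) {
        // Made-up sample text standing in for the "contents" field.
        String contents = "mahout clusters vectors and vectors come from lucene";
        System.out.println(termVector(contents));
    }
}
```

If the index stores nothing like this per document, there is nothing for a vector writer to turn into a document vector, which would explain the tiny output file.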
On Sat, Dec 19, 2009 at 8:43 PM, Drew Farris <[email protected]> wrote:
> Does the IndexFiles class store term vectors for the contents field?
> If not, that could be the problem.
>
> Also, you can try dumping the vector file using
> o.a.m.utils.vectors.VectorDumper in mahout-utils and taking a look to
> see what's in there.
>
> Failing that, in mahout-examples, you can run ./bin/build-reuters.sh
> -- that will generate a known good set of vectors and you can try
> running clustering upon that. No need to let build-reuters.sh to
> complete, watch stdout and kill it once the vectors are done because
> it will start running lda and you're not really interested in that at
> this point. Once this is run, the vectors themselves can be found in
> work/vectors, dictionary in work/dict.txt (relative to the
> mahout-example directory)
>
> On Sat, Dec 19, 2009 at 7:41 PM, Benson Margulies <[email protected]>
> wrote:
>> So,
>>
>> I took the stock Lucene 'IndexFiles' class. I modified it to read
>> UTF-8. I ran it.
>>
>> I ran the following:
>>
>> java -cp $cp org.apache.mahout.utils.vectors.lucene.Driver --dir he_lucene_index \
>> --output he_mahout_vector --field contents --dictOut he_mahout_dict \
>> --idField path
>>
>> and am rewarded with a tiny file of vectors. Clearly I'm messing something
>> up.
