On Mar 5, 2010, at 9:25 AM, Claudio Martella wrote:

> Thanks!
>
> I'll try with (a) and maybe some Dirichlet Process Clustering. I notice
> that LDA needs also maxWords. In my understanding that's the length of
> the dictionary.txt (the number of unique words in my vectors) i got from
> lucene.vectors. Is that correct?

Yes, and I believe we still write the length to the front of the file. We should probably change LDA to just take in the dictionary file and read the entry list itself, so that people don't have to bother looking this up, especially now that the dictionary file is a SequenceFile.

-Grant
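
In the meantime, a rough way to look the number up by hand is sketched below. It only covers a plain-text dump of the dictionary, not the SequenceFile form, and the layout is an assumption rather than a documented format: an optional leading line holding the entry count, an optional `#`-prefixed header line, then one tab-separated entry per term. The function name `count_dictionary_terms` is made up for this example.

```python
import io


def count_dictionary_terms(lines):
    """Return (declared, counted) for a plain-text dictionary dump.

    Assumed layout (hypothetical, check against your own file):
      - optional first line containing only the entry count
      - optional header line starting with '#'
      - one tab-separated entry per term after that
    """
    declared = None
    counted = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if counted == 0 and declared is None and line.isdigit():
            declared = int(line)  # count written at the front of the file
            continue
        if counted == 0 and line.startswith("#"):
            continue  # header line before the first entry
        counted += 1
    return declared, counted


if __name__ == "__main__":
    # Tiny in-memory stand-in for a dictionary file.
    sample = io.StringIO(
        "3\n#term\tdoc freq\tidx\napple\t4\t0\nbanana\t2\t1\ncherry\t7\t2\n"
    )
    declared, counted = count_dictionary_terms(sample)
    print("declared:", declared, "counted:", counted)
```

If the declared and counted values agree, that number is the one to use for the vocabulary size; if they differ, the layout assumption above probably does not match the file.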
