Re: LDA and choice of number of topics

Grant Ingersoll Fri, 05 Mar 2010 06:34:46 -0800

On Mar 5, 2010, at 9:25 AM, Claudio Martella wrote:

> Thanks!
> 
> I'll try with (a) and maybe some Dirichlet Process Clustering. I notice
> that LDA also needs maxWords. As I understand it, that's the length of
> dictionary.txt (the number of unique words in my vectors) that I got from
> lucene.vectors. Is that correct?


Yes, and I believe we still write the length to the front of the file.  We
should probably change LDA to just take in the dictionary file and read the
entry list itself, so that people don't have to bother looking this up,
especially now that the dictionary file is a SequenceFile.
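
In the meantime, a rough sketch of how one might look up the value for
maxWords from a plain-text dictionary. The file layout here is an assumption,
not something confirmed in this thread: an optional first line holding the
entry count, optional header lines starting with '#', then one tab-separated
entry per line. The function name count_dictionary_terms is made up for
illustration, and this does not read the SequenceFile form of the dictionary.

```python
def count_dictionary_terms(path):
    """Return (declared, counted) for a plain-text dictionary file.

    Assumed layout (hypothetical, check against your own file):
      - optional first line containing only the integer entry count
      - optional header/comment lines starting with '#'
      - one entry per remaining non-blank line
    Note: a real term that begins with '#' would be skipped by this sketch.
    """
    declared = None   # count written at the front of the file, if present
    counted = 0       # entries actually found by scanning the file
    with open(path, encoding="utf-8") as handle:
        for index, raw in enumerate(handle):
            line = raw.strip()
            if not line:
                continue
            if index == 0 and line.isdigit():
                declared = int(line)
                continue
            if line.startswith("#"):
                continue
            counted += 1
    return declared, counted
```

If both numbers are present and agree, either one should be usable as
maxWords; if they differ, the file probably does not match the layout assumed
above.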

-Grant
