The term entries map each term's text to its position in the Vector. So readDictionary just loads that mapping, so that when it examines a vector it can print out that term 14534 is really "foobar", or whatever.
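A minimal sketch of that lookup follows. It assumes a simplified, hypothetical TermEntry class (only the (String term, int termIdx, int docFreq) constructor shape comes from this thread) and an in-memory list of entries rather than Mahout's actual dictionary file. It is not Mahout's code. It only illustrates why the index matters: entries need not arrive in vector order, so termIdx says which slot each term belongs in, and a bad or duplicate index can be caught while loading.

```java
import java.util.List;

public class DictionarySketch {

    // Hypothetical stand-in shaped after the constructor discussed in the thread;
    // NOT Mahout's actual TermEntry class.
    static final class TermEntry {
        final String term;
        final int termIdx;
        final int docFreq;

        TermEntry(String term, int termIdx, int docFreq) {
            this.term = term;
            this.termIdx = termIdx;
            this.docFreq = docFreq;
        }
    }

    // Builds an index -> term lookup table, where termIdx is the vector position
    // assigned to the term when the vectors were generated.
    static String[] buildDictionary(List<TermEntry> entries) {
        String[] dictionary = new String[entries.size()];
        for (TermEntry e : entries) {
            if (e.termIdx < 0 || e.termIdx >= dictionary.length) {
                throw new IllegalArgumentException("termIdx out of range: " + e.termIdx);
            }
            if (dictionary[e.termIdx] != null) {
                throw new IllegalArgumentException("duplicate termIdx: " + e.termIdx);
            }
            dictionary[e.termIdx] = e.term;
        }
        return dictionary;
    }

    public static void main(String[] args) {
        // Entries deliberately out of order: position comes from termIdx, not list order.
        List<TermEntry> entries = List.of(
            new TermEntry("bar", 1, 3),
            new TermEntry("foobar", 2, 7),
            new TermEntry("baz", 0, 5));
        String[] dictionary = buildDictionary(entries);

        // A vector position (e.g. one of a topic's top-weighted dimensions) maps back to text.
        int position = 2;
        System.out.println("term " + position + " is \"" + dictionary[position] + "\"");
    }
}
```

Running main prints `term 2 is "foobar"`. List.of needs Java 9 or later. When generating vectors without Lucene, the same idea applies: whatever index a term gets in the vector is the index its dictionary entry should carry.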

There may be an abstraction to be made here, but I'd have to dig a little deeper into the code to say for sure.


On Sep 23, 2009, at 4:58 PM, Jack Tanner wrote:


The TermEntry constructor is (String term, int termIdx, int docFreq). What's the point of termIdx? I see that it gets used for an assert in LDAPrintTopics.java:readDictionary(), but it seems redundant otherwise. (Background: I'd like to generate vectors for LDA directly, bypassing Lucene. Following o.a.m.utils.vectors.lucene.Driver, I see that I need to generate a dictionary file for the "printing out top terms per topic" step. This uses TermInfo, which contains lots of TermEntry elements.)

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

