The term entries map each term's text to its position (index) in the
Vector. readDictionary() just loads that mapping, so that when the
code examines a vector it can print out that term 14534 is really
"foobar", or whatever.
There may be an abstraction to be made here, but I'd have to dig a
little deeper into the code to say for sure.
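
To make the mapping concrete, here is a rough Java sketch of the idea,
not the actual Mahout code. The class DictionarySketch, its nested
Entry type, and the methods indexToTerm and describe are hypothetical
names made up for illustration; only the three fields (term, termIdx,
docFreq) come from the TermEntry constructor quoted below.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not Mahout classes.
class DictionarySketch {

    /** Stand-in for a term entry: term text, vector position, document frequency. */
    static final class Entry {
        final String term;
        final int termIdx;
        final int docFreq;

        Entry(String term, int termIdx, int docFreq) {
            this.term = term;
            this.termIdx = termIdx;
            this.docFreq = docFreq;
        }
    }

    /** Builds the position -> term lookup that a dictionary reader would load. */
    static Map<Integer, String> indexToTerm(List<Entry> entries) {
        Map<Integer, String> lookup = new HashMap<>();
        for (Entry e : entries) {
            String previous = lookup.put(e.termIdx, e.term);
            if (previous != null) {
                // Two terms claiming the same vector position would make output ambiguous.
                throw new IllegalArgumentException(
                    "Duplicate index " + e.termIdx + ": " + previous + " and " + e.term);
            }
        }
        return lookup;
    }

    /** Renders one vector element using the term text instead of the bare index. */
    static String describe(Map<Integer, String> lookup, int termIdx, double weight) {
        String term = lookup.getOrDefault(termIdx, "<unknown:" + termIdx + ">");
        return term + " = " + weight;
    }
}
```

If you generate vectors without Lucene, the important part is that the
index stored with each term is the same position you used when filling
in the vector; otherwise the printed top terms will not match the
weights.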
On Sep 23, 2009, at 4:58 PM, Jack Tanner wrote:
The TermEntry constructor is (String term, int termIdx, int
docFreq). What's the point of termIdx? I see that it gets used for
an assert in LDAPrintTopics.java:readDictionary(), but it seems
redundant otherwise.
(Background: I'd like to generate vectors for LDA directly,
bypassing Lucene. Following o.a.m.utils.vectors.lucene.Driver, I see
that I need to generate a dictionary file for the "printing out top
terms per topic" step. This uses TermInfo, which contains lots of
TermEntry elements.)
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search