> Given a term say "apache", I want to look up the lucene index
> programmatically to find out its frequency in the corpus.
I think you are asking collection frequency of a term. Term Frequency is
defined between a document and a term which is printed in the loop in the
following code. And at the end there is collection freq. which sum of tfs.
String path = "E:\\ThesaurusSolrHome\\data\\index";
String field = "contents";
Term term = new Term(field, "apache");
IndexReader indexReader = IndexReader.open(path);
TermDocs termDocs = indexReader.termDocs(term);
int collectionFreq = 0;
while (termDocs.next()) {
System.out.print("Document " + termDocs.doc() + " contains the term
" + term.text() + " ");
System.out.println(termDocs.freq() + " times");
collectionFreq += termDocs.freq();
}
indexReader.close();
System.out.println("Collection frequency of " + term.text() + " = " +
collectionFreq);
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]