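Have you looked into term vectors? I think they should fit the bill pretty neatly: a term vector stores the terms of a single document together with their frequencies, so you can look up a term's frequency per document without walking the postings. Here's a nice blog post with helpful background info: http://blog.jpountz.net/post/41301889664/putting-term-vectors-on-a-diet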
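
To make the idea concrete, here is a minimal sketch in plain Java. It is a conceptual model only and does not use Lucene classes: TermVectorSketch, addDocument and termFreq are hypothetical names, and whitespace splitting stands in for a real analyzer. In Lucene 4.x I believe the corresponding entry point is IndexReader.getTermVector(docID, field), on a field indexed with term vectors enabled, but please check the Javadocs for your version.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Conceptual model only: plain Java collections, not Lucene classes.
final class TermVectorSketch {

    // One map per document (term -> frequency), indexed by docId.
    // Each map plays the role of that document's stored term vector.
    private final List<Map<String, Integer>> vectors = new ArrayList<>();

    // Splits on whitespace (a stand-in for a real analyzer), counts the
    // tokens, and returns the id assigned to the new document.
    int addDocument(String text) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                freqs.merge(token, 1, Integer::sum);
            }
        }
        vectors.add(freqs);
        return vectors.size() - 1;
    }

    // Direct per-document lookup: no iteration over a postings list.
    int termFreq(int docId, String term) {
        return vectors.get(docId).getOrDefault(term, 0);
    }

    public static void main(String[] args) {
        TermVectorSketch index = new TermVectorSketch();
        int d0 = index.addDocument("lucene in action lucene");
        int d1 = index.addDocument("term vectors on a diet");
        System.out.println(index.termFreq(d0, "lucene"));
        System.out.println(index.termFreq(d1, "lucene"));
    }
}
```

The trade-off is the one the blog post discusses: the per-document data costs extra index space, in exchange for lookups that go straight to the document you care about.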

-Mike

On 8/19/2014 10:04 AM, Bianca Pereira wrote:
Hi everybody,

   I would like to hear your suggestions for calculating term frequency in a
Lucene document. Currently I am using MultiFields.getTermDocsEnum,
iterating through the returned DocsEnum 'de', and getting the frequency with
de.freq() for the desired document.

   My solution gives me the result I want, but it is too slow. For
instance, I want to calculate the term frequency of a given term for N
documents in sequence. Every time I move to a new document I have to
retrieve exactly the same DocsEnum again and iterate until I find the
document I want. Of course I cannot cache the DocsEnum (yes, I made this huge
mistake) because it is an iterator.

   Do you have any suggestions on how I can get the term frequency quickly?
The only suggestion I have had so far was "Do it programmatically, don't use
Lucene". Is that really the solution?

   Thank you.

   Regards,
   Bianca Pereira
