Hi,

Thank you for the answers. In the end I calculated the term frequency directly in Java: I get the text, break it into tokens and count from there. In my case this turns out to be around 6 times faster (using a cache). I now use Lucene only to calculate the document frequency.
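For illustration only, a tokenize-and-count approach with a cache could be sketched roughly as below. The class name `TermFrequencyCache`, its methods, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are assumptions made for this example; they are not the code actually used here, and a real setup would probably want to tokenize the same way as the Lucene analyzer used at index time.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: counts term occurrences per document and caches the counts. */
final class TermFrequencyCache {

    // docId -> (term -> number of occurrences in that document)
    private final Map<Integer, Map<String, Integer>> cache = new HashMap<>();

    /** Tokenizes the text once and stores the per-term counts for this document. */
    public void addDocument(int docId, String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: lowercase, then split on runs of non-letter, non-digit characters.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        cache.put(docId, counts);
    }

    /** Returns the cached frequency of the term in the document, or 0 if either is unknown. */
    public int termFrequency(int docId, String term) {
        Map<String, Integer> counts = cache.get(docId);
        if (counts == null) {
            return 0;
        }
        return counts.getOrDefault(term.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        TermFrequencyCache tf = new TermFrequencyCache();
        tf.addDocument(0, "Lucene is fast. Lucene is a search library.");
        tf.addDocument(1, "Solr builds on Lucene.");
        System.out.println(tf.termFrequency(0, "lucene")); // prints 2
        System.out.println(tf.termFrequency(1, "lucene")); // prints 1
        System.out.println(tf.termFrequency(1, "fast"));   // prints 0
    }
}
```

Because each document is tokenized only once, repeated lookups for different terms in the same document become plain map reads, which is where a speed-up over re-creating and re-iterating a DocsEnum per lookup would be expected to come from.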
Regards,
Bianca

2014-08-19 17:56 GMT+01:00 Tri Cao <tm...@me.com>:

> Erick, Solr's termfreq implementation also uses DocsEnum, with the
> assumption that freq is called on ascending doc IDs, which is valid when
> scoring from the hit list. If freq is requested for an out-of-order doc,
> a new DocsEnum has to be created.
>
> Bianca, can you explain your use case in more detail? What did you mean
> by having a new document? A new document is added to the index? Then you
> already have to reopen the searcher/reader anyway to get a new DocsEnum.
>
> On Aug 19, 2014, at 08:26 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
> Hmmm, I'm not at all an expert here, but Solr has a function
> query "termfreq" that does what you're doing I think? I wonder
> if the code for that function query would be a good place to
> copy (or even make use of)? See TermFreqValueSource...
>
> Maybe not helpful at all, but...
> Erick
>
> On Tue, Aug 19, 2014 at 7:04 AM, Bianca Pereira <aivykar...@gmail.com>
> wrote:
>
> > Hi everybody,
> >
> > I would like to know your suggestions for calculating Term Frequency
> > in a Lucene document. Currently I am using MultiFields.getTermDocsEnum,
> > iterating through the DocsEnum 'de' returned and getting the frequency
> > with de.freq() for the desired document.
> >
> > My solution gives me the result I want, but I am having time issues.
> > For instance, I want to calculate the term frequency of a given term
> > for N documents in a sequence. Then, every time I have a new document,
> > I have to retrieve exactly the same DocsEnum again and iterate until I
> > find the document I want. Of course I cannot cache the DocsEnum (yes, I
> > made this huge mistake) because it is an iterator.
> >
> > Do you have any suggestions on how I can get Term Frequency in a fast
> > way? The only suggestion I have had up to now was "Do it
> > programmatically, don't use Lucene". Should this be the solution?
> >
> > Thank you.
> >
> > Regards,
> > Bianca Pereira