Grant,

I have an index whose documents have a text field containing the document text and a tag field containing the tags associated with the document. I am trying to calculate the probability that a document contains a particular word and is tagged with a particular tag. This is related to a MoreLikeThis extension I was trying to write (http://issues.apache.org/jira/browse/LUCENE-1910).
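
To make the quantity concrete, here is a minimal sketch in plain Java of the counting described above. It does not use any Lucene API: the `Doc` class, the method names, and the sample data are hypothetical stand-ins for documents read from the index, and the estimate shown (documents carrying both the tag and the term, divided by all documents) is one assumed reading of "the probability that a document contains a particular word and is tagged with a particular tag".

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TagTermProbability {

    /** Hypothetical stand-in for an indexed document: its distinct terms and its tags. */
    static final class Doc {
        final Set<String> terms;
        final Set<String> tags;

        Doc(List<String> terms, List<String> tags) {
            this.terms = new HashSet<String>(terms);
            this.tags = new HashSet<String>(tags);
        }
    }

    /** For each term, counts the documents that carry the tag and contain that term. */
    static Map<String, Integer> countTermsForTag(List<Doc> docs, String tag) {
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (Doc doc : docs) {
            if (!doc.tags.contains(tag)) {
                continue; // only documents tagged with the given tag contribute
            }
            // Each document counts once per term; within-document frequency is ignored.
            for (String term : doc.terms) {
                Integer old = counts.get(term);
                counts.put(term, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    /** Assumed estimate: (documents with both term and tag) / (all documents). */
    static double jointProbability(List<Doc> docs, String term, String tag) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        Integer n = countTermsForTag(docs, tag).get(term);
        return (n == null ? 0 : n) / (double) docs.size();
    }

    public static void main(String[] args) {
        // Made-up sample data, for illustration only.
        List<Doc> docs = Arrays.asList(
            new Doc(Arrays.asList("lucene", "index"), Arrays.asList("search")),
            new Doc(Arrays.asList("lucene", "vector"), Arrays.asList("search", "ml")),
            new Doc(Arrays.asList("index"), Arrays.asList("db")),
            new Doc(Arrays.asList("vector"), Arrays.asList("ml")));
        // Two of the four documents are tagged "search" and contain "lucene".
        System.out.println(jointProbability(docs, "lucene", "search")); // prints 0.5
    }
}
```

The cost of this sketch grows with the number of tagged documents times the number of distinct terms in each, which is the same shape as the loop over tagged documents in the index.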

Most of the time is spent in the loop that iterates over the documents tagged with the particular tag and computes counts of terms across those documents. If the index contains millions of documents, it takes a while to compute the document,tag probabilities.

Thanks,
Thomas

On Wed, Oct 14, 2009 at 8:15 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Oct 12, 2009, at 10:46 PM, Thomas D'Silva wrote:
>
>> Hi,
>>
>> I am trying to compute the counts of terms of the documents returned by
>> running a query using a TermVectorMapper.
>> I was wondering if anyone knew if there was a faster way to do this rather
>> than using a HashMap with a TermVectorMapper to store the counts of the
>> terms and calling getTermFreqVector().
>> I do not require the term frequency within a document.
>>
>
> I think that is as fast as its going to get unless you have some other
> restrictions that would allow you to use a FieldCache. Can you describe
> the bigger problem you are trying to solve?
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>