Re: Using TermVectorMapper to compute term frequency across documents

Thomas D'Silva Thu, 15 Oct 2009 07:04:36 -0700

Grant,

I have an index with documents that have a text field containing
document text, and a tag field containing tags associated with the
document. I am trying to calculate the probability that a document
contains a particular word and is tagged with a particular tag.
This is related to a MoreLikeThis extension I was trying to write
(http://issues.apache.org/jira/browse/LUCENE-1910).
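For a word w and a tag t, the quantity described here can be estimated as P(w, t) = (number of documents that contain w and are tagged t) / (total number of documents). Below is a minimal plain-Java sketch of that definition (Java 16+ for records). It assumes each document has already been reduced to a set of distinct terms and a set of tags. The `Doc` record and the `jointProbability` method are hypothetical illustrations, not Lucene API.

```java
import java.util.List;
import java.util.Set;

public class JointProbability {
    // Hypothetical stand-in for an indexed document: its distinct text terms and its tags.
    record Doc(Set<String> terms, Set<String> tags) {}

    // Estimates P(word, tag) as (#docs containing word AND tagged with tag) / (#docs).
    static double jointProbability(List<Doc> docs, String word, String tag) {
        if (docs.isEmpty()) {
            return 0.0;
        }
        long hits = docs.stream()
                .filter(d -> d.terms().contains(word) && d.tags().contains(tag))
                .count();
        return (double) hits / docs.size();
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc(Set.of("lucene", "index"), Set.of("search")),
                new Doc(Set.of("lucene", "query"), Set.of("search", "java")),
                new Doc(Set.of("cooking"), Set.of("food")),
                new Doc(Set.of("index", "query"), Set.of("java")));
        // "lucene" occurs in docs 0 and 1, both tagged "search": 2 of 4 documents.
        System.out.println(jointProbability(docs, "lucene", "search")); // prints 0.5
    }
}
```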


Most of the time is spent in the loop that iterates over the documents
tagged with the particular tag and counts terms across those
documents. If the index contains millions of documents, it takes a
while to compute the (word, tag) probabilities.
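The loop described above can be sketched as a single pass over the documents carrying the tag. The pass bumps a HashMap counter once per distinct term per document, since only document counts are needed and not in-document frequency. This is a hypothetical, stdlib-only illustration (Java 16+). In the real index the per-document term lists would come from the text field's term vectors; that Lucene part is abstracted away here, and `Doc`, `termDocCounts` and `jointProbabilities` are made-up names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TagTermCounts {
    // Hypothetical document: the terms of its text field (duplicates allowed) and its tags.
    record Doc(List<String> terms, Set<String> tags) {}

    // One pass over the documents tagged with `tag`: each distinct term is counted at most
    // once per document, because only document counts (not in-document frequency) are needed.
    static Map<String, Integer> termDocCounts(List<Doc> docs, String tag) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc doc : docs) {
            if (!doc.tags().contains(tag)) {
                continue; // only documents carrying the tag contribute
            }
            for (String term : new HashSet<>(doc.terms())) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Divides each count by the collection size to get P(term, tag) for every term seen.
    static Map<String, Double> jointProbabilities(List<Doc> docs, String tag) {
        Map<String, Double> probs = new HashMap<>();
        if (docs.isEmpty()) {
            return probs;
        }
        termDocCounts(docs, tag).forEach(
                (term, c) -> probs.put(term, c.doubleValue() / docs.size()));
        return probs;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc(List.of("lucene", "lucene", "index"), Set.of("search")),
                new Doc(List.of("lucene", "query"), Set.of("search")),
                new Doc(List.of("cooking"), Set.of("food")),
                new Doc(List.of("query"), Set.of("java")));
        System.out.println(termDocCounts(docs, "search"));
        System.out.println(jointProbabilities(docs, "search").get("lucene")); // prints 0.5
    }
}
```

The work grows with the number of tagged documents times the number of distinct terms per document. That is consistent with the slowdown described above for indexes with millions of documents.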

Thanks,
Thomas


On Wed, Oct 14, 2009 at 8:15 AM, Grant Ingersoll <gsing...@apache.org> wrote:
>
> On Oct 12, 2009, at 10:46 PM, Thomas D'Silva wrote:
>
>> Hi,
>>
>> I am trying to compute the counts of terms in the documents returned by
>> a query, using a TermVectorMapper.
>> I was wondering if anyone knew if there was a faster way to do this rather
>> than using a HashMap with a TermVectorMapper to store the counts of the
>> terms and calling getTermFreqVector().
>> I do not require the term frequency within a document.
>>
>
> I think that is as fast as it's going to get unless you have some other
> restrictions that would allow you to use a FieldCache. Can you describe
> the bigger problem you are trying to solve?
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
