On 10 Apr 2007, at 17:48, Sengly Heng wrote:
>> We don't really know what your problem is. Explaining that, rather
>> than the solution you have thought of, might yield a couple of
>> alternative solutions. Perhaps something could be precalculated and
>> stored in the documents. Perhaps feature selection (reduction) of
>> the terms might do the trick for you. And so on.
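
To make the precalculation idea concrete: something along these lines
at index time, so query time becomes a stored-field lookup. Only a
sketch; the field names and topKeywordsFor() are made up, and how you
actually rank the keywords is up to you.

  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.Field;

  public class PrecalcExample {

      // Build one document with a companion field holding its
      // precalculated top keywords, computed once at index time.
      static Document buildDoc(String bodyText) {
          Document doc = new Document();
          doc.add(new Field("body", bodyText,
                            Field.Store.NO, Field.Index.TOKENIZED));
          doc.add(new Field("body.topkeywords", topKeywordsFor(bodyText),
                            Field.Store.YES, Field.Index.NO));
          return doc;
      }

      // Made-up placeholder: replace with whatever ranking you use.
      static String topKeywordsFor(String text) {
          return "keyword1 keyword2 keyword3";
      }
  }
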
> I have a corpus of documents indexed with different fields. Each
> document has on average about 30 fields, and each field about 100
> terms.
>
> Normally a hit returns fewer than 100 documents. For each of the 30
> fields of the documents, I have to calculate the top 35 keywords
> from all the hit documents, as well as the top 30 popular keywords
> (the keywords that occur in many documents - something like docFreq
> or IDF).
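
Just so we are talking about the same thing, the brute-force version I
picture is roughly the sketch below. It assumes the fields were indexed
with term vectors enabled (Field.TermVector.YES in the Field
constructor); the class and method names are made up. With
byDocCount=false it ranks by summed term frequency (your top 35), with
byDocCount=true by how many of the hit documents contain the term (your
docFreq-like top 30).

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.Comparator;
  import java.util.HashMap;
  import java.util.List;
  import java.util.Map;

  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.TermFreqVector;

  public class HitKeywords {

      /** Top n terms of one field over the given hit documents. */
      public static List topTerms(IndexReader reader, int[] hitDocs,
                                  String field, int n,
                                  boolean byDocCount) throws IOException {
          Map stats = new HashMap();               // term -> int[]{tf, df}
          for (int d = 0; d < hitDocs.length; d++) {
              TermFreqVector tfv = reader.getTermFreqVector(hitDocs[d], field);
              if (tfv == null) continue;           // no term vector stored
              String[] terms = tfv.getTerms();
              int[] freqs = tfv.getTermFrequencies();
              for (int i = 0; i < terms.length; i++) {
                  int[] s = (int[]) stats.get(terms[i]);
                  if (s == null) stats.put(terms[i], s = new int[2]);
                  s[0] += freqs[i];                // total occurrences
                  s[1] += 1;                       // one more hit doc
              }
          }
          final int which = byDocCount ? 1 : 0;
          List entries = new ArrayList(stats.entrySet());
          Collections.sort(entries, new Comparator() {
              public int compare(Object a, Object b) {
                  int[] sa = (int[]) ((Map.Entry) a).getValue();
                  int[] sb = (int[]) ((Map.Entry) b).getValue();
                  return sb[which] - sa[which];    // descending
              }
          });
          List top = new ArrayList();
          for (int i = 0; i < n && i < entries.size(); i++) {
              top.add(((Map.Entry) entries.get(i)).getKey());
          }
          return top;
      }
  }

At 100 hits x 30 fields x 100 terms that is on the order of 300,000
hash-map updates per query, which is why I keep asking how slow it
actually is.
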
Right, but /why/ do you need these values? Do you present them as
they are, or do you use them for some secondary calculation? Then
what is the result of this secondary calculation?
> Please let me know if you still have any more questions.
I'll re-ask a couple of the questions I placed in my previous reply:
How slow is it, and how fast did you expect it to be?
Can you limit the evaluation to the top n documents?
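
To illustrate what I mean by the second question: Hits comes back
sorted by score, so you could cap the work by only feeding the first n
hits into the keyword calculation. A sketch (class name made up):

  import java.io.IOException;
  import org.apache.lucene.search.Hits;
  import org.apache.lucene.search.Query;
  import org.apache.lucene.search.Searcher;

  public class TopNHits {
      /** Internal doc numbers of at most the n best-scoring hits. */
      public static int[] topDocIds(Searcher searcher, Query query, int n)
              throws IOException {
          Hits hits = searcher.search(query);      // sorted by score
          int m = Math.min(n, hits.length());
          int[] ids = new int[m];
          for (int i = 0; i < m; i++) {
              ids[i] = hits.id(i);                 // internal doc number
          }
          return ids;                              // feed to topTerms()
      }
  }
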
--
karl