Re: Most frequently indexed term

Ganesh Mon, 08 Jun 2009 03:31:20 -0700

Thanks. This works well.

The logic is:
1. Run the search and, for every matching document, get its list of terms and 
their frequencies. 
2. Use SortedTermVectorMapper to build a list of unique terms and their 
frequencies. 
3. Sort them to get the most frequently indexed terms in a given date range 
(or any other criteria).
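
As a rough illustration of steps 2 and 3 only, here is a minimal sketch in 
plain Java. It is not Lucene code: it assumes the per-document term 
frequencies have already been read out of the term vectors into ordinary maps, 
and the names TopTerms, merge and topN are made up for this example. A 
size-bounded min-heap avoids sorting every unique term when only the top N 
are wanted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene.
class TopTerms {

    /** Step 2: merge per-document term counts into one term -> total map. */
    static Map<String, Long> merge(List<Map<String, Integer>> perDocCounts) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Integer> doc : perDocCounts) {
            for (Map.Entry<String, Integer> e : doc.entrySet()) {
                totals.merge(e.getKey(), e.getValue().longValue(), Long::sum);
            }
        }
        return totals;
    }

    /** Step 3: keep only the n largest totals, returned in descending order. */
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        // Min-heap ordered by count: the smallest of the current top n is on top.
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the entry with the smallest count
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return result;
    }
}
```

Note that the totals map still holds every unique term, so this only saves the 
cost of the full sort, not the memory.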

My question is:
I need to get the top 20 most frequently indexed terms in a day. Up to 1 
million documents could be indexed in a day, so I would need to traverse 1 
million records and store the unique terms and their frequencies, which may 
consume a huge amount of memory. Is there any other way? Without using term 
vectors, I can get the list of most frequently indexed terms for the whole 
database. Is there a similar way to get the most frequently indexed terms for 
a date range or a subset of the database?
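
To make the memory concern concrete, one generic way to cap it is an 
approximate frequent-items counter in the style of Misra-Gries. The sketch 
below is plain Java, not Lucene and not something suggested in this thread; 
the class name FrequentTermSketch and its methods are made up for 
illustration. It keeps at most `capacity` counters no matter how many unique 
terms are seen. The counts it reports are underestimates, and the survivors 
are only candidates, so exact counts for them would need a second pass.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical helper, not part of Lucene: Misra-Gries style frequent-term counter.
class FrequentTermSketch {
    private final int capacity;
    private final Map<String, Long> counters = new HashMap<>();

    FrequentTermSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Record one occurrence of a term. */
    void offer(String term) {
        Long count = counters.get(term);
        if (count != null) {
            counters.put(term, count + 1);      // already tracked: increment
        } else if (counters.size() < capacity) {
            counters.put(term, 1L);             // free slot: start tracking
        } else {
            // No free slot: decrement every counter and evict those that hit zero.
            // The new term itself is not stored.
            Iterator<Map.Entry<String, Long>> it = counters.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Long> e = it.next();
                if (e.getValue() == 1L) {
                    it.remove();
                } else {
                    e.setValue(e.getValue() - 1);
                }
            }
        }
    }

    /** Candidate frequent terms with their (under-estimated) counts. */
    Map<String, Long> candidates() {
        return new HashMap<>(counters);
    }
}
```

With term vectors, offer() would be called once per occurrence of each term. 
Choosing a capacity well above 20 makes it more likely that the true top 20 
are among the candidates, but that is not guaranteed.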

Regards
Ganesh 

----- Original Message ----- 
From: "Preetham Kajekar" <[email protected]>
To: <[email protected]>
Sent: Tuesday, May 26, 2009 11:08 PM
Subject: Re: Most frequently indexed term

> Have a look at
> http://stackoverflow.com/questions/195434/how-can-i-get-top-terms-for-a-subset-of-documents-in-a-lucene-index
> 
> (I have not tried the above out)
> 
> Ganesh wrote:
>> Hello All,
>>
>> I need to build some stats. I need to know Top 5 frequently indexed term in 
>> a date range (In a day or a Month).
>>
>> Any idea of how to achieve this.
>>
>> Regards
>> Ganesh
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]
