Thanks. This works well. The logic is 1. Do the search, For every document get the list of terms and its frequency. 2. Use SortedTermVectorMapper to generate a list of unique terms and its frequency. 2. Sort them to get the list of top numbered frequently indexed terms in a given date range (any given criteria).
My Question is: I need to get the top 20 highly indexed term in a day. 1 million documents could be indexed in a day. I need to traverse the 1 million records and store the unique terms and its frequencies. It may consume huge amount of memory. Is there any other way out? With out using term vector, i could get the list of most frequently indexed term in a database. Similarly is there any other way to get the list of most frequently indexed term in a date range or a subset of database. Regards Ganesh ----- Original Message ----- From: "Preetham Kajekar" <preet...@cisco.com> To: <java-user@lucene.apache.org> Sent: Tuesday, May 26, 2009 11:08 PM Subject: Re: Most frequently indexed term > Have a look at > http://stackoverflow.com/questions/195434/how-can-i-get-top-terms-for-a-subset-of-documents-in-a-lucene-index > > (I have not tried the above out) > > Ganesh wrote: >> Hello All, >> >> I need to build some stats. I need to know Top 5 frequently indexed term in >> a date range (In a day or a Month). >> >> Any idea of how to achieve this. >> >> Regards >> GaneshIéÝŠ{-j{fzË�ë-£*.®‰åŠwŸ®'§vÈm¶ŸÿŠyž²Ç§�êòj(com= > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org >IéÝŠ{-j{fzË�ë-£*.®‰åŠwŸ®'§vÈm¶ŸÿŠyž²Ç§�êòj(r‰