On Jul 1, 2009, at 1:39 AM, Ganesh wrote:

Thanks for your reply.

My requirement is to fetch the list of the most frequent terms indexed in a day. I used the approach described in the article linked below:
http://stackoverflow.com/questions/195434/how-can-i-get-top-terms-for-a-subset-of-documents-in-a-lucene-index

I enabled term vectors for a field, indexed the content, and I am able to retrieve the list of top indexed terms for a day / date range.
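As a rough illustration of the aggregation step described above: the sketch below sums per-document term frequencies for the documents that fall in a date range and keeps the N most frequent terms with a bounded min-heap. It is not the Lucene API. Plain `Map<String, Integer>` objects stand in for the term vectors of the matching documents, and the names `TopTerms`, `aggregate` and `topN` are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopTerms {

    // Sums the per-document term frequencies into one total per term.
    // Each map is a stand-in (assumption) for one document's term vector.
    static Map<String, Integer> aggregate(List<Map<String, Integer>> docVectors) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> vector : docVectors) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Returns the n most frequent terms, highest frequency first.
    // Ties are broken by term in ascending alphabetical order.
    static List<String> topN(Map<String, Integer> totals, int n) {
        // Min-heap ordered "weakest first": lowest frequency at the head;
        // on equal frequency the alphabetically later term is the weaker one.
        Comparator<Map.Entry<String, Integer>> weakestFirst = (a, b) -> {
            int byFreq = Integer.compare(a.getValue(), b.getValue());
            return byFreq != 0 ? byFreq : b.getKey().compareTo(a.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the weakest so at most n entries stay in memory
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result); // heap yields weakest first, so flip the order
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new HashMap<>();
        doc1.put("lucene", 3);
        doc1.put("index", 1);
        Map<String, Integer> doc2 = new HashMap<>();
        doc2.put("lucene", 2);
        doc2.put("term", 4);
        List<Map<String, Integer>> docs = new ArrayList<>();
        docs.add(doc1);
        docs.add(doc2);
        System.out.println(topN(aggregate(docs), 2));
    }
}
```

The heap never holds more than N entries, but the totals map still grows with the number of distinct terms in the selected documents, so that map is the part to watch for memory use.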

When an IndexReader / Searcher is opened, will it load all the term vector frequencies?

No, it won't. Term vectors are stored on disk, much like the stored fields.


Suppose I have enabled this option and indexed, say, 5 GB. Now I don't want the Reader / Searcher to load term vectors. I want to switch off
this feature. Is that possible without re-indexing?

I suppose so, although the approach you are using seems to rely on a custom Collector, so you would have to stop using that Collector.

Storing term vectors will indeed make your index much bigger, but it shouldn't affect memory much unless you are caching, which probably isn't a bad idea anyway.




Regards
Ganesh

----- Original Message -----
From: "Grant Ingersoll" <gsing...@apache.org>
To: <java-user@lucene.apache.org>
Sent: Tuesday, June 30, 2009 9:48 PM
Subject: Re: Term Frequency vector consumes memory


In Lucene, a Term Vector is a specific thing that is stored on disk
when creating a Document and Field.  It is optional and off by
default.  It is separate from being able to get the term frequencies
for all the docs in a specific field.  The former is decided at
indexing time and there is no way to remove it w/o reindexing.
Furthermore, it is not loaded into memory by the IndexReader.  Term
Frequencies are accessed via the TermDocs.

Can you clarify a bit more what you are looking to do?  Perhaps some
sample code will help demonstrate what you'd like to turn off, as I am
not clear on your question.

Cheers,
Grant

On Jun 30, 2009, at 3:37 AM, Ganesh wrote:

At the end of each day, I build the stats of the top indexed
terms. I enabled term frequency for a single field. It is working
fine: I am able to get the top terms and their frequencies. But it
consumes a huge amount of RAM. My index size is 5 GB and has 8 million
records. If I don't enable term vectors, I can index up to
17 GB with 40 million records.

When an IndexReader / Searcher is opened, will it load all the term
vector frequencies?

Suppose I have enabled this option and indexed, say, 5 GB. Now I don't
want the Reader / Searcher to load term vectors. I want to switch off
this feature. Is that possible without re-indexing?

Regards
Ganesh
Send instant messages to your online friends http://in.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search

