Hi Lucene Users,

I am using Lucene indices to get term frequencies, and I wanted to check with you about the time it takes to retrieve them. Please let me know if I can improve the code or the index, or if this is expected. It takes 8 to 9 seconds to retrieve the term frequency vectors of all 1030 documents, with an index size of ~530 MB.

Another question: do I need Field.Store.YES to get the term frequency vector?

Index Details:
-------------------
Size: 532 MB
1032 documents, with the number of terms per document varying from 600 to 100,000
The field is indexed with Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS


Term Freq Retrieval Time Values:
-------------------------------------

The time ranges from 8 to 9 seconds for this loop:

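    long s = System.currentTimeMillis();
    TermFreqVector termFreqVector;
    // Fetch the term vector of every non-deleted document
    for (int i = 0; i < 1030; i++) {
      if (!reader.isDeleted(i)) {
        termFreqVector = reader.getTermFreqVector(i, field);
      }
    }
    long l = System.currentTimeMillis();
    // l - s is the elapsed time in milliseconds (8000 to 9000 here)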
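
To find out whether a few very large documents account for most of the 8 to 9 seconds, each iteration could be timed separately. Below is a minimal sketch, assuming Java 5 or later for System.nanoTime; PerItemTimer, Task, timeEach and slowest are hypothetical names for illustration and are not part of Lucene.

```java
/** Hypothetical helper, not part of Lucene: times each iteration of a loop. */
public class PerItemTimer {

    /** Work to run for one index, e.g. fetching one document's term vector. */
    public interface Task {
        // Checked exceptions such as IOException must be caught inside run.
        void run(int i);
    }

    /** Returns the elapsed nanoseconds for each index in [0, n). */
    public static long[] timeEach(int n, Task task) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            task.run(i);
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Returns the index of the slowest iteration, or -1 if there were none. */
    public static int slowest(long[] nanos) {
        int worst = -1;
        for (int i = 0; i < nanos.length; i++) {
            if (worst < 0 || nanos[i] > nanos[worst]) {
                worst = i;
            }
        }
        return worst;
    }
}
```

The task body would hold the reader.isDeleted(i) check and the reader.getTermFreqVector(i, field) call from the loop above, with IOException caught inside run. Comparing the slowest documents against their term counts should show whether the time grows with document size.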


Hardware and Memory Settings:
-------------------------------------------
-Xmx2048m -XX:PermSize=16m -XX:MaxPermSize=128m

Dual 1800 MHz Opteron on 32-bit Linux 2.6.15.2; Lucene 2.0.0.




How can I get better results? Can I?



Many thanks for your help.
-Amit





---------------------------------------------------------
Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
---------------------------------------------------------



