Hi Lucene Users,

I am using Lucene indices to get term frequencies, and I wanted to check with you about the time it takes to retrieve them. Please let me know whether I can improve the code or the index, or whether this is expected. It takes 8 to 9 seconds to retrieve the term frequency vectors of all 1030 documents, with an index size of ~530 MB.

Another question: do I need Field.Store.YES to get the term frequency vector?

Index Details:
Size: 532 MB
1032 documents, with the number of terms per document varying from 600 to 100,000
The field is indexed with Field.Store.YES, Field.Index.TOKENIZED, Field.TermVector.WITH_POSITIONS_OFFSETS

Term Frequency Retrieval Time:

The time ranges from 8 to 9 seconds for the following loop:

 long s = System.currentTimeMillis();
 TermFreqVector termFreqVector;
 for (int i = 0; i < 1030; i++) {
   // skip deleted documents
   if (!reader.isDeleted(i)) {
     termFreqVector = reader.getTermFreqVector(i, field);
   }
 }
 long l = System.currentTimeMillis();
 // elapsed time (l - s) is 8000 to 9000 ms
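
To see whether a few very large documents account for most of the 8 to 9 seconds, the same loop could record the time per document instead of only the total. This is a minimal sketch, assuming Java 8 or later for `IntFunction`; `PerDocTiming`, `timeEach`, `slowest`, and the stand-in workload in `main` are hypothetical names for illustration and are not part of Lucene.

```java
import java.util.function.IntFunction;

class PerDocTiming {
    // Calls fetch.apply(i) for i = 0..n-1 and records the elapsed nanoseconds of each call.
    static long[] timeEach(int n, IntFunction<?> fetch) {
        long[] nanos = new long[n];
        for (int i = 0; i < n; i++) {
            long start = System.nanoTime();
            fetch.apply(i);
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    // Returns the index of the largest entry (first one on ties), or -1 for an empty array.
    static int slowest(long[] nanos) {
        int best = -1;
        for (int i = 0; i < nanos.length; i++) {
            if (best < 0 || nanos[i] > nanos[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): in the real program this lambda would call
        // reader.getTermFreqVector(i, field), wrapped to handle IOException.
        long[] nanos = timeEach(1030, i -> new int[(i % 100 + 1) * 1000]);
        long total = 0;
        for (long t : nanos) {
            total += t;
        }
        int worst = slowest(nanos);
        System.out.println("total ms: " + total / 1000000);
        System.out.println("slowest doc: " + worst + " (" + nanos[worst] / 1000 + " us)");
    }
}
```

If the slowest documents turn out to be the ones with close to 100,000 terms, that would suggest the time is dominated by reading large vectors rather than by per-call overhead.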

Hardware and Memory Settings:
-Xmx2048m -XX:PermSize=16m -XX:MaxPermSize=128m

Dual 1800 MHz Opteron on 32-bit Linux; Lucene 2.0.0.

How can I get better results? Can I?

Many thanks for your help.

Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
