Hi Lucene Users,
I am using the lucene indices to get term frequencies. I just wanted
to check with you about the
time it is taking to retrieve these term freq. Please suggest if I
can improve the code/index or if
this is expected. It takes 8 to 9 seconds to retrieve the term freq
values of all 1030 documents,
with an index size of ~530MB.
Another question I have is Do I need to have Field.Store.Yes to get
the term freq vector?
Index Details:
-------------------
Size: 532 MB,
1032 Documents with varying number of terms from 600 to 100,000
The field is indexed as Field.Store.YES,
Field.Index.TOKENIZED,Field.TermVector.WITH_POSITIONS_OFFSETS
Term Freq Retrieval Time Values:
-------------------------------------
The time ranges in 8 to 9 seconds
long s = System.currentTimeMillis();
TermFreqVector termFreqVector;
for (int i = 0; i < 1030; i++) {
if (!reader.isDeleted(i)) {
termFreqVector = reader.getTermFreqVector(i, field);
}
}
long l = System.currentTimeMillis();
Hardware and Memory Settings:
-------------------------------------------
-Xmx 2048m -XX:PermSize=16m -XX:MaxPermSize=128m
Dual 1800 MHz Optron on 32 bit Linux 2.6.15.2; Lucene 2.0.0.
How can I get better results? Can I?
Many thanks for your help.
-Amit
---------------------------------------------------------
Amit Kumar
Research Programmer
The Graduate School of Library and Information Science
University of Illinois, Urbana Champaign IL, 61820
phone: 217-333-4118 fax: 217-244-3302
---------------------------------------------------------