Add new API method to retrieve document field data in a batch
-------------------------------------------------------------
Key: LUCENE-1034
URL: https://issues.apache.org/jira/browse/LUCENE-1034
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Affects Versions: 2.2
Environment: JDK 1.5.X, Linux & FreeBSD
Reporter: Michael Klatt
Priority: Minor
Attachments: FieldsReader.java.patch, IndexReader.java.patch,
MultiReader.java.patch, SegmentReader.java.patch
I've seen many forum posts from people who need to retrieve document data for a
large number of search results. In our case, we need to retrieve up to 10,000
results (sometimes more) from an index of over 100 million documents (our index
is about 65 GB). This can sometimes take a couple of minutes.
In one of my attempts to improve performance, I modified the IndexReader
interface to provide a method which looks like:
public Document[] documents(int[] n, FieldSelector fieldSelector);
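To make the contract of this signature concrete, here is a minimal, self-contained sketch. `SimpleReader` and `ArrayReader` are hypothetical stand-ins, not Lucene classes: a `String` plays the role of a `Document`, and the `FieldSelector` argument is left out. The sketch assumes that `result[i]` holds the data for document `n[i]`. Its default implementation simply loops over single-document lookups, which is the one-at-a-time access a batch method is meant to improve on.

```java
// Hypothetical stand-in for IndexReader, NOT the Lucene API: a String plays
// the role of a Document, and the FieldSelector argument is omitted.
abstract class SimpleReader {
    // Single-document lookup, analogous to document(int n, FieldSelector).
    abstract String document(int n);

    // Batch lookup with the same shape as the proposed documents(int[], ...):
    // result[i] holds the data for document n[i]. This baseline still fetches
    // one document at a time; a subclass may override it to reorder disk access.
    String[] documents(int[] n) {
        String[] result = new String[n.length];
        for (int i = 0; i < n.length; i++) {
            result[i] = document(n[i]);
        }
        return result;
    }
}

// Hypothetical in-memory reader used only to exercise the contract.
class ArrayReader extends SimpleReader {
    private final String[] docs;

    ArrayReader(String[] docs) {
        this.docs = docs;
    }

    @Override
    String document(int n) {
        return docs[n];
    }
}
```

For example, `new ArrayReader(new String[] {"a", "b", "c"}).documents(new int[] {2, 0})` returns `{"c", "a"}`, in the caller's order.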
Instead of retrieving document data one document at a time, I request data for
many document numbers in one call. The idea was to reorder the disk seeks in
FieldsReader: first perform all the seeks in the indexStream, then all the
seeks in the fieldStream. For a large number of documents, this yielded a 20%
speed improvement. That was less than I had hoped for, but I felt it was
significant enough to request a change to the IndexReader interface.
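The two-phase access pattern described above can be sketched as follows. This is a hypothetical in-memory model, not Lucene's FieldsReader: `indexStream` is a `long[]` of per-document offsets into `fieldStream`, and each record is a 1-byte length followed by UTF-8 bytes. On real files, each array access would correspond to a seek on the index or field stream. The sketch only illustrates the ordering (sort the requested document numbers, resolve every pointer first, then read every record in ascending offset order) and makes no performance claim.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical in-memory model of stored fields, NOT Lucene's FieldsReader:
// indexStream[doc] holds the offset of doc's record in fieldStream, and each
// record is laid out as [1-byte length][UTF-8 bytes]. Records are appended in
// document order, so offsets increase with document number.
final class BatchFieldsReader {
    private final long[] indexStream;
    private final byte[] fieldStream;

    BatchFieldsReader(long[] indexStream, byte[] fieldStream) {
        this.indexStream = indexStream;
        this.fieldStream = fieldStream;
    }

    // Builds the toy "files" from one string per document (max 255 bytes each).
    static BatchFieldsReader build(String... docs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long[] index = new long[docs.length];
        for (int d = 0; d < docs.length; d++) {
            byte[] b = docs[d].getBytes(StandardCharsets.UTF_8);
            if (b.length > 255) throw new IllegalArgumentException("record too long");
            index[d] = out.size();
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        return new BatchFieldsReader(index, out.toByteArray());
    }

    // Batch lookup: result[i] holds the record for document n[i].
    String[] documents(int[] n) {
        // Visit request positions in ascending document order; writing into
        // slot i restores the caller's original order.
        Integer[] order = new Integer[n.length];
        for (int i = 0; i < n.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Integer.compare(n[a], n[b]));

        // Phase 1: every index-stream lookup, done before touching fieldStream.
        long[] pointers = new long[n.length];
        for (int i : order) pointers[i] = indexStream[n[i]];

        // Phase 2: every field-stream read, in ascending offset order.
        String[] result = new String[n.length];
        for (int i : order) {
            int p = (int) pointers[i];
            int len = fieldStream[p] & 0xFF;
            result[i] = new String(fieldStream, p + 1, len, StandardCharsets.UTF_8);
        }
        return result;
    }
}
```

For example, `BatchFieldsReader.build("alpha", "beta", "gamma").documents(new int[] {2, 0, 2})` returns `{"gamma", "alpha", "gamma"}`: the results come back in the caller's order even though the reads happen in document order.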
I'm providing patches for the files that I needed to change for our
application. These patches are against the 2.2 release.
--
This message is automatically generated by JIRA.