[jira] Updated: (LUCENE-1034) Add new API method to retrieve document field data in a batch

Mark Miller (JIRA) Sun, 16 Nov 2008 16:17:07 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-1034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Mark Miller updated LUCENE-1034:
--------------------------------

    Attachment: LUCENE-1034.patch

I've combined the patches into a single file. I like the idea, but right now I 
don't like the amount of code duplication. Arguably, this could also be moved 
to the Searcher family, but we could probably live without that. It also still 
needs a test, but I've lost interest unless the code duplication can be 
resolved while maintaining the speed gain.

> Add new API method to retrieve document field data in a batch
> -------------------------------------------------------------
>
>                 Key: LUCENE-1034
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1034
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.2
>         Environment: JDK 1.5.X, Linux & FreeBSD
>            Reporter: Michael Klatt
>            Priority: Minor
>         Attachments: FieldsReader.java.patch, IndexReader.java.patch, 
> LUCENE-1034.patch, MultiReader.java.patch, SegmentReader.java.patch
>
>
> I've read in many forums about people who need to retrieve document data for 
> a large number of search results. In our case, we need to retrieve up to 
> 10,000 results (sometimes more) from an index of over 100 million documents 
> (our index is about 65 GB). This can sometimes take a couple minutes.
>
> In one of my attempts to improve performance, I modified the IndexReader 
> interface to provide a method which looks like:
>
>     public Document[] documents(int[] n, FieldSelector fieldSelector);
>
> Instead of retrieving document data one at a time, I would request data for 
> many document numbers in one shot. The idea was to optimize the seeks on 
> disk so that in the FieldsReader, the seeks for the indexStream would be done 
> first, then all the seeks in the fieldStream would be completed. For a 
> large number of documents, this yielded a 20% speed improvement. The 
> improvement was not as much as I was looking for, but I felt that the 
> improvement was significant enough that I would request changes to the 
> IndexReader interface.
>
> I'm providing patches for the files that I needed to change for our 
> application. These patches are against the 2.2 release.
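
The two-pass read order described in the issue can be illustrated with a small, self-contained sketch. This is not the attached patch and does not use Lucene classes: BatchFieldsSketch, its in-memory buffers, and the simplified record layout (one 8-byte pointer per document in the index, a length-prefixed UTF-8 string per document in the field data) are all hypothetical stand-ins chosen only to show the access pattern.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: in-memory buffers stand in for the
// indexStream and fieldStream that the issue says FieldsReader seeks in.
public class BatchFieldsSketch {
    private final ByteBuffer indexStream; // one 8-byte pointer per document
    private final ByteBuffer fieldStream; // length-prefixed UTF-8 payloads

    public BatchFieldsSketch(String[] docs) {
        byte[][] encoded = new byte[docs.length][];
        int total = 0;
        for (int i = 0; i < docs.length; i++) {
            encoded[i] = docs[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + encoded[i].length;
        }
        indexStream = ByteBuffer.allocate(8 * docs.length);
        fieldStream = ByteBuffer.allocate(total);
        for (byte[] bytes : encoded) {
            // Record where this document's data starts, then write the data.
            indexStream.putLong(fieldStream.position());
            fieldStream.putInt(bytes.length);
            fieldStream.put(bytes);
        }
    }

    // Batch lookup: all index reads happen first, then all field-data reads.
    public String[] documents(int[] docNums) {
        // Pass 1: touch only the index to collect every pointer.
        long[] pointers = new long[docNums.length];
        for (int i = 0; i < docNums.length; i++) {
            pointers[i] = indexStream.getLong(docNums[i] * 8);
        }
        // Pass 2: touch only the field data, in the order requested.
        String[] result = new String[docNums.length];
        for (int i = 0; i < pointers.length; i++) {
            int pos = (int) pointers[i];
            int length = fieldStream.getInt(pos);
            byte[] bytes = new byte[length];
            for (int j = 0; j < length; j++) {
                bytes[j] = fieldStream.get(pos + 4 + j);
            }
            result[i] = new String(bytes, StandardCharsets.UTF_8);
        }
        return result;
    }
}
```

A caller would build the sketch from a few strings and pass an int[] of document numbers to documents(...), getting the payloads back in the requested order. With real files, grouping the reads per stream is what would cut down on seeking back and forth between the two; the 20% figure above comes from the reporter's own measurements, not from this sketch.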

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


