[jira] Commented: (LUCENE-2127) Improved large result handling

Aaron McCurry (JIRA) Tue, 05 Jan 2010 15:04:28 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12796898#action_12796898
 ]


Aaron McCurry commented on LUCENE-2127:
---------------------------------------

I have implemented a paging collector for 2.4, and I am porting it to 3.0.  It 
allows users to page to any depth in their results.

Depending on how you are using Lucene it may work a little differently, but 
essentially it does an initial search and collects X documents.  Once that 
page of results is exhausted, it searches again, feeding the docid and score 
of the last collected document into the next pass.  Assuming that the 
documents are collected in order, a document may be collected if its score is 
less than the last score from the previous pass.  If the scores are equal, the 
docids are compared, and the document may be collected only if its docid is 
greater than the last docid from the previous pass.

Basically it throws out the previous page of results.
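
The rule above can be sketched in plain Java.  This is only an illustration 
of the comparison, not the actual collector and not the Lucene Collector API: 
the names PagingSketch, Hit, isAfter and nextPage are hypothetical, and the 
list of matches stands in for one search pass over the index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java sketch of the paging rule; it does not use any Lucene classes.
class PagingSketch {

    // Hypothetical stand-in for a (docid, score) pair.
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    // Result order: higher score first, ties broken by lower docid.
    static final Comparator<Hit> RANK = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
    };

    // True if (doc, score) ranks strictly after the last hit of the previous pass.
    static boolean isAfter(int doc, float score, Hit last) {
        if (last == null) return true; // first pass: nothing to skip
        int c = Float.compare(score, last.score);
        return c < 0 || (c == 0 && doc > last.doc);
    }

    // One pass: visit every match, keep only the best pageSize hits that
    // come after 'last', and return them best-first.
    static List<Hit> nextPage(List<Hit> matches, Hit last, int pageSize) {
        // Head of the queue is the worst hit kept so far.
        PriorityQueue<Hit> worstFirst = new PriorityQueue<>(RANK.reversed());
        for (Hit h : matches) {
            if (!isAfter(h.doc, h.score, last)) continue; // already on an earlier page
            worstFirst.add(h);
            if (worstFirst.size() > pageSize) worstFirst.poll(); // evict the worst
        }
        List<Hit> page = new ArrayList<>(worstFirst);
        page.sort(RANK);
        return page;
    }
}
```

To page forward, pass the last element of the previous page as 'last' (null 
for the first page).  Each pass still visits every match, but only pageSize 
hits are held in memory at a time.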

This can be a little difficult to explain, but using this technique, with our 
internal collector pages set to 50,000 ScoreDocs, we have a production system 
that lets users page to the last page of their results no matter how large 
the result set is, and we have result sets in the tens of millions at times.

I can provide my 2.4 collector and/or my 3.0 collector (once complete).  This 
approach may solve this issue.

> Improved large result handling
> ------------------------------
>
>                 Key: LUCENE-2127
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2127
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> Per 
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
>  it would be nice to offer some other Collectors that are better at handling 
> really large numbers of results.  This could be implemented in a variety of 
> ways via Collectors.  For instance, we could have a raw collector that does 
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests 
> and have Collectors that have heuristics about memory tradeoffs and only 
> heapify when appropriate.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

