[jira] Commented: (LUCENE-2127) Improved large result handling

Grant Ingersoll (JIRA) Wed, 06 Jan 2010 05:50:26 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797112#action_12797112 ]


Grant Ingersoll commented on LUCENE-2127:
-----------------------------------------

bq. If your "large queue" is a list/array, then it has slots, and you just 
reference those slots when asking FieldComparator to compare.

I was just thinking of an array of ScoreDocs.  I'm writing the benchmark code 
right now.  With this approach, though, it seems like the FieldComparator 
doesn't quite work, because you have to pass in numHits, and it then 
allocates another array of its own.  It seems like it would need to be 
modified to accept an already filled array.
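
To make the "already filled array" idea concrete, here is a minimal sketch in 
plain Java.  Hit and SlotComparator are hypothetical stand-ins, not Lucene's 
ScoreDoc or FieldComparator; the only point is that the comparator borrows the 
caller's array and compares slots by index instead of allocating numHits 
slots itself.  Sorting by score with a doc id tie-break is also an assumption 
made for illustration.

```java
import java.util.Arrays;

final class FilledArraySlots {
    /** Hypothetical stand-in for a ScoreDoc-like hit: a doc id plus its score. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    /** Hypothetical comparator that borrows an already filled array of hits. */
    static final class SlotComparator {
        private final Hit[] slots;   // not copied: the caller's array is used directly

        SlotComparator(Hit[] filled) { this.slots = filled; }

        /** Negative if slot1 sorts first: higher score wins, ties go to the lower doc id. */
        int compare(int slot1, int slot2) {
            int byScore = Float.compare(slots[slot2].score, slots[slot1].score);
            return byScore != 0 ? byScore : Integer.compare(slots[slot1].doc, slots[slot2].doc);
        }
    }

    /** Returns slot indices ordered best-first; the hits array itself is left untouched. */
    static Integer[] sortedSlots(Hit[] hits) {
        final SlotComparator cmp = new SlotComparator(hits);
        Integer[] order = new Integer[hits.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, (a, b) -> cmp.compare(a, b));
        return order;
    }
}
```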

bq. But... have you thought about theoretical cost of "true pqueue" vs 
"approximate pqueue"? I think the worst case, where results are returned in 
precisely reverse sort order so that you always fully turn over the queue, is 
tricky.

Yes, but I haven't looked at implementing it yet.

bq. But actually that 1/10K constant = 1/M, and so I think the approximate PQ 
works out to O(N) cost, which is in fact much cheaper. I think?

Sounds right to me.  Once we have a benchmarker in place that allows the 
Collector to be replaced, we can try all of this fun stuff out.  We also 
need a collection where we can return large numbers of results with scores (in 
other words, not just MatchAllDocsQuery, although that still might be valid).
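
For anyone following along, here is one possible reading of the "approximate 
pqueue" as a buffer-and-prune structure, sketched in plain Java.  This is an 
assumption about the idea, not code from the issue: ApproxTopM, its 2*M buffer 
size, and the prunes counter are all hypothetical.  Hits are appended to a 
buffer; when the buffer fills, it is cut back to the best M and the smallest 
survivor becomes a threshold that later hits must beat.  Arrays.sort is used 
in prune() for brevity, which costs O(M log M) per prune; a linear-time 
selection there would be needed to match the O(N) estimate above.

```java
import java.util.Arrays;

final class ApproxTopM {
    private final int m;
    private final float[] buf;                          // holds up to 2*m candidate scores
    private int size = 0;
    private float threshold = Float.NEGATIVE_INFINITY;  // scores at or below this are skipped
    int prunes = 0;                                     // exposed only so the behaviour can be inspected

    ApproxTopM(int m) {
        if (m < 1) {
            throw new IllegalArgumentException("m must be at least 1");
        }
        this.m = m;
        this.buf = new float[2 * m];
    }

    /** Appends the score unless it cannot beat the current threshold. */
    void collect(float score) {
        if (score <= threshold) {
            return;
        }
        buf[size++] = score;
        if (size == buf.length) {
            prune();
        }
    }

    /** Cuts the buffer back to its best m scores and raises the threshold. */
    private void prune() {
        Arrays.sort(buf, 0, size);                      // ascending, so the best m are at the end
        System.arraycopy(buf, size - m, buf, 0, m);
        size = m;
        threshold = buf[0];                             // smallest surviving score
        prunes++;
    }

    /** Returns up to m best scores, highest first. */
    float[] topScores() {
        Arrays.sort(buf, 0, size);
        int k = Math.min(m, size);
        float[] out = new float[k];
        for (int i = 0; i < k; i++) {
            out[i] = buf[size - 1 - i];
        }
        return out;
    }
}
```

Feeding scores in ascending order is the reverse-sort-order case from the 
quoted comment: every hit beats the threshold, so the buffer is pruned once 
per M collected hits.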

> Improved large result handling
> ------------------------------
>
>                 Key: LUCENE-2127
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2127
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> Per 
> http://search.lucidimagination.com/search/document/350c54fc90d257ed/lots_of_results#fbb84bd297d15dd5,
>  it would be nice to offer some other Collectors that are better at handling 
> really large number of results.  This could be implemented in a variety of 
> ways via Collectors.  For instance, we could have a raw collector that does 
> no sorting and just returns the ScoreDocs, or we could do as Mike suggests 
> and have Collectors that have heuristics about memory tradeoffs and only 
> heapify when appropriate.

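The "raw collector that does no sorting" mentioned in the description could 
look roughly like the sketch below.  It is plain Java and deliberately does 
not extend Lucene's Collector: RawHitBuffer, setNextSegment, and the initial 
capacity of 16 are hypothetical names and values chosen for illustration.  It 
only shows the shape of the idea, which is to append every hit to growable 
parallel arrays in collection order, rebasing per-segment doc ids by a doc 
base, with no heap or comparisons at all.

```java
import java.util.Arrays;

final class RawHitBuffer {
    private int[] docs = new int[16];
    private float[] scores = new float[16];
    private int size = 0;
    private int docBase = 0;   // offset added to per-segment doc ids

    /** Called when collection moves to a new segment (hypothetical hook). */
    void setNextSegment(int docBase) {
        this.docBase = docBase;
    }

    /** Appends the hit in arrival order; amortized O(1), no sorting. */
    void collect(int segmentDoc, float score) {
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2);
            scores = Arrays.copyOf(scores, size * 2);
        }
        docs[size] = docBase + segmentDoc;
        scores[size] = score;
        size++;
    }

    int size() { return size; }
    int doc(int i) { return docs[i]; }
    float score(int i) { return scores[i]; }
}
```

Memory grows linearly with the number of hits, so this trades the bounded 
size of a priority queue for a constant amount of work per hit.
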