[ 
https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871508#comment-16871508
 ] 

Adrien Grand commented on LUCENE-8875:
--------------------------------------

I like pre-populating the hit queue, mostly because it makes the collector code 
simpler and likely a bit faster. By comparison, TopFieldCollector can't 
pre-populate the hit queue, which forces it to maintain separate code paths for 
the case where the priority queue is already full (the common path) and the 
case where it is not full yet. In general, I see requesting a large number of 
hits as an abuse case.
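
To illustrate the difference in code paths, here is a minimal sketch in plain 
Java. It is not the actual Lucene collector code: the class TopKSketch, its 
method names, and the use of java.util.PriorityQueue with 
Float.NEGATIVE_INFINITY as the sentinel score are all assumptions made for 
illustration, and every real score is assumed to be greater than negative 
infinity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: two ways to keep the top-k scores.
class TopKSketch {

    // Variant 1: the queue is pre-filled with k sentinels, so it is always
    // "full" and each hit needs only a single comparison against the head.
    static List<Float> topKWithSentinels(float[] scores, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        PriorityQueue<Float> pq = new PriorityQueue<>(k); // min-heap: head is the weakest entry
        for (int i = 0; i < k; i++) {
            pq.add(Float.NEGATIVE_INFINITY); // sentinel: any real hit beats it
        }
        for (float score : scores) {
            if (score > pq.peek()) { // the only branch on the per-hit path
                pq.poll();
                pq.add(score);
            }
        }
        return drain(pq);
    }

    // Variant 2: no sentinels, so each hit must first check whether the
    // queue is full yet before it can compare against the head.
    static List<Float> topKWithoutSentinels(float[] scores, int k) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        PriorityQueue<Float> pq = new PriorityQueue<>(k);
        for (float score : scores) {
            if (pq.size() < k) {            // queue not full yet
                pq.add(score);
            } else if (score > pq.peek()) { // queue full (common path)
                pq.poll();
                pq.add(score);
            }
        }
        return drain(pq);
    }

    // Drops any leftover sentinels and returns the scores in descending order.
    static List<Float> drain(PriorityQueue<Float> pq) {
        List<Float> out = new ArrayList<>();
        for (Float f : pq) {
            float v = f;
            if (v > Float.NEGATIVE_INFINITY) {
                out.add(f);
            }
        }
        out.sort(Collections.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.3f, 1.5f, 0.9f, 2.2f, 0.1f};
        System.out.println(topKWithSentinels(scores, 3));
        System.out.println(topKWithoutSentinels(scores, 3));
    }
}
```

Both variants return the same results. The first pays an up-front cost 
proportional to k to fill the queue, which is the overhead the issue 
description is concerned about, in exchange for a single branch per hit.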

> Should TopScoreDocCollector Always Populate Sentinel Values?
> ------------------------------------------------------------
>
>                 Key: LUCENE-8875
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8875
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation 
> and instructs HitQueue to pre-populate itself with sentinels. While this is a great 
> safety mechanism, for very large datasets where the query's selectivity is 
> high, the sentinel population can be redundant and can become a large enough 
> bottleneck in itself. Does it make sense to introduce a new parameter in 
> TopScoreDocCollector which uses a heuristic (say number of hits > 10k) and 
> does not populate sentinels?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

