[ 
https://issues.apache.org/jira/browse/LUCENE-8875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16870679#comment-16870679
 ] 

Atri Sharma commented on LUCENE-8875:
-------------------------------------

Another thing to explore is using a compact set of arrays instead of 
ScoreDoc objects: 
[https://sbdevel.wordpress.com/2015/10/05/speeding-up-core-search/]

Maybe add a new PQ implementation based on this idea, plus a new Collector 
that combines threshold-based sentinel filling with the new PQ, used only 
for very large N?
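
To make the array idea concrete, here is a rough sketch of a bounded 
min-heap that keeps doc IDs and scores in two primitive arrays rather than 
in one ScoreDoc object per slot. It is not taken from Lucene or from the 
linked post: the class name ArrayTopN and its methods are hypothetical, and 
the tie-break (on equal scores the higher doc ID ranks lower) is only meant 
to mirror how HitQueue orders hits.

```java
import java.util.Arrays;

/** Hypothetical sketch: bounded min-heap over parallel primitive arrays. */
final class ArrayTopN {
    private final int[] docs;     // doc IDs, heap-ordered
    private final float[] scores; // scores, same positions as docs
    private int size;

    ArrayTopN(int capacity) {
        if (capacity < 1) throw new IllegalArgumentException("capacity must be >= 1");
        docs = new int[capacity];
        scores = new float[capacity];
    }

    // (s1, d1) ranks below (s2, d2): lower score, or equal score and higher doc ID.
    private static boolean lessThan(float s1, int d1, float s2, int d2) {
        return s1 < s2 || (s1 == s2 && d1 > d2);
    }

    /** Offers a hit; returns true if it was kept among the top N. */
    boolean insert(int doc, float score) {
        if (size < docs.length) {
            // Heap not full yet: append and sift up. No sentinels needed.
            int i = size++;
            while (i > 0) {
                int parent = (i - 1) >>> 1;
                if (!lessThan(score, doc, scores[parent], docs[parent])) break;
                scores[i] = scores[parent];
                docs[i] = docs[parent];
                i = parent;
            }
            scores[i] = score;
            docs[i] = doc;
            return true;
        }
        // Heap full: only accept hits that beat the current weakest entry.
        if (!lessThan(scores[0], docs[0], score, doc)) return false;
        siftDown(0, doc, score);
        return true;
    }

    // Places (doc, score) at position i and moves it down to restore heap order.
    private void siftDown(int i, int doc, float score) {
        int half = size >>> 1;
        while (i < half) {
            int child = 2 * i + 1;
            int right = child + 1;
            if (right < size && lessThan(scores[right], docs[right], scores[child], docs[child])) {
                child = right;
            }
            if (!lessThan(scores[child], docs[child], score, doc)) break;
            scores[i] = scores[child];
            docs[i] = docs[child];
            i = child;
        }
        scores[i] = score;
        docs[i] = doc;
    }

    int size() {
        return size;
    }

    /** Empties the heap and returns the retained doc IDs, best hit first. */
    int[] drainDocsBestFirst() {
        int[] out = new int[size];
        for (int k = size - 1; k >= 0; k--) {
            out[k] = docs[0]; // weakest remaining hit goes to the back
            size--;
            if (size > 0) siftDown(0, docs[size], scores[size]);
        }
        return out;
    }
}
```

The two arrays are allocated once up front, so a very large N costs two 
allocations instead of N small objects. Whether that actually wins over 
HitQueue would need benchmarking.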

> Should TopScoreDocCollector Always Populate Sentinel Values?
> ------------------------------------------------------------
>
>                 Key: LUCENE-8875
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8875
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>
> TopScoreDocCollector always initializes HitQueue as the PQ implementation 
> and instructs HitQueue to pre-populate itself with sentinels. While this is 
> a great safety mechanism, for very large datasets where the query's 
> selectivity is high, the sentinel population can be redundant and can 
> itself become a significant bottleneck. Does it make sense to introduce a 
> new parameter in TopScoreDocCollector that applies a heuristic (say, number 
> of hits > 10k) to skip sentinel population?
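
For illustration, the trade-off in the description could be sketched 
roughly as below. This is not Lucene code: SentinelChoiceCollector, Hit and 
SENTINEL_LIMIT are hypothetical names, java.util.PriorityQueue stands in 
for HitQueue, and the 10k cut-off is just the example figure from the issue 
text. Scores are assumed to be finite, so negative infinity can mark a 
sentinel.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical sketch: pre-populate sentinels only for small numHits. */
final class SentinelChoiceCollector {
    // Assumed cut-off, taken from the "> 10k" example in the issue text.
    static final int SENTINEL_LIMIT = 10_000;

    static final class Hit {
        int doc;
        float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final PriorityQueue<Hit> pq;
    private final int numHits;
    final boolean prePopulated;

    SentinelChoiceCollector(int numHits) {
        if (numHits < 1) throw new IllegalArgumentException("numHits must be >= 1");
        this.numHits = numHits;
        this.prePopulated = numHits <= SENTINEL_LIMIT;
        int initialCapacity = Math.min(numHits, SENTINEL_LIMIT);
        pq = new PriorityQueue<>(initialCapacity,
                Comparator.comparingDouble((Hit h) -> h.score));
        if (prePopulated) {
            // Small queue: pay the up-front cost so collect() needs no size check.
            for (int i = 0; i < numHits; i++) {
                pq.add(new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY));
            }
        }
    }

    void collect(int doc, float score) {
        if (!prePopulated && pq.size() < numHits) {
            // Large queue: grow lazily, one object per actual hit.
            pq.add(new Hit(doc, score));
            return;
        }
        Hit top = pq.peek();
        if (score <= top.score) return; // not competitive
        pq.poll();                      // remove weakest entry, then reuse its object
        top.doc = doc;
        top.score = score;
        pq.add(top);
    }

    /** Number of collected hits currently held, ignoring sentinels. */
    int realHits() {
        int n = 0;
        for (Hit h : pq) {
            if (h.score != Float.NEGATIVE_INFINITY) n++;
        }
        return n;
    }

    /** Score a new hit must beat once numHits real hits are held, else -Infinity. */
    float minCompetitiveScore() {
        return pq.size() == numHits ? pq.peek().score : Float.NEGATIVE_INFINITY;
    }
}
```

With sentinels the hot loop has one fewer branch, but the constructor 
allocates numHits objects even if the query matches only a handful of 
documents. The lazy path inverts that cost.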



