[ 
https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326487#comment-17326487
 ] 

Adrien Grand commented on LUCENE-9335:
--------------------------------------

I'd indeed be interested in seeing what the results look like with 5+ clauses.

That also makes me curious about whether we could do better with a 
BulkScorer. Currently your PR needs to put a lot of code in {{doAdvance}} to 
reason about max scores, check whether scorers need to move from the list of 
essential scorers to the list of optional scorers, etc. I think that this 
code has non-negligible overhead since it's called on every match. A 
BulkScorer would make it easier to do this sort of thing only at periodic 
checkpoints. This is the trick that {{BooleanScorer}} uses: in order not to 
have to keep a heap constantly ordered, it scores windows of 2048 documents at 
a time.
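
To make the windowing idea concrete, here is a minimal sketch of scoring a 
disjunction one window at a time. It is not Lucene's {{BooleanScorer}} and 
does not use Lucene's API: the class name, the {{scoreAll}} method, the plain 
int[] postings and the constant per-clause weights are all assumptions made 
for illustration. The point is only that per-window bookkeeping runs once per 
2048 documents rather than once per match, and that no heap is needed.

```java
import java.util.Arrays;

/** Hypothetical sketch of window-based disjunction scoring; not Lucene code. */
class WindowedDisjunctionSketch {
    // Window size mentioned in the comment above.
    static final int WINDOW_SIZE = 2048;

    /**
     * postings[i] holds the sorted doc IDs matched by clause i (all < maxDoc).
     * weights[i] is a constant score for clause i, which is a simplification:
     * real scores depend on term frequency and norms.
     * Returns the summed score of every doc ID in [0, maxDoc).
     */
    static float[] scoreAll(int[][] postings, float[] weights, int maxDoc) {
        float[] result = new float[maxDoc];
        int[] cursors = new int[postings.length]; // read position per clause
        float[] window = new float[WINDOW_SIZE];  // score accumulator, reused

        for (int base = 0; base < maxDoc; base += WINDOW_SIZE) {
            int end = Math.min(base + WINDOW_SIZE, maxDoc);

            // Checkpoint: this is where per-window work (for example
            // re-evaluating max scores or re-partitioning clauses) would go.
            // It runs once per window, not once per matching document.
            Arrays.fill(window, 0f);

            // Drain each clause into the window independently. Clauses never
            // need to be ordered against each other, so no heap is required.
            for (int i = 0; i < postings.length; i++) {
                int[] docs = postings[i];
                int pos = cursors[i];
                while (pos < docs.length && docs[pos] < end) {
                    window[docs[pos] - base] += weights[i];
                    pos++;
                }
                cursors[i] = pos;
            }

            System.arraycopy(window, 0, result, base, end - base);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] postings = {{0, 5, 3000}, {5, 2048}};
        float[] weights = {1f, 2f};
        float[] scores = scoreAll(postings, weights, 4096);
        System.out.println("doc 5 -> " + scores[5]);
    }
}
```

A real bulk scorer would hand matches to a collector rather than materialize 
a score per document, and would use the checkpoint to skip windows that 
cannot be competitive; both are left out here to keep the sketch short.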

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and 
> PISA at [https://tantivy-search.github.io/bench/] or against research 
> prototypes in Table 1 of 
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for 
> benchmarking, it would be nice to optimize this case a bit more. I suspect 
> that we could make fewer per-document decisions by implementing a BulkScorer 
> instead of a Scorer.


