[ https://issues.apache.org/jira/browse/LUCENE-9335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17326487#comment-17326487 ]
Adrien Grand commented on LUCENE-9335:
--------------------------------------

I'd be interested in seeing what the results look like with 5+ clauses, indeed. That also makes me curious about whether we could do better with a BulkScorer. Currently your PR needs to put a lot of code in {{doAdvance}} to reason about max scores, check whether scorers need to move from the list of essential scorers to the list of optional scorers, etc. I think this code has non-negligible overhead since it runs on every match. A BulkScorer would make it easier to do this sort of work only at periodic checkpoints. This is the trick that {{BooleanScorer}} uses: to avoid keeping a heap constantly ordered, it scores windows of 2048 documents at a time.

> Add a bulk scorer for disjunctions that does dynamic pruning
> ------------------------------------------------------------
>
>                 Key: LUCENE-9335
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9335
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Priority: Minor
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Lucene often gets benchmarked against other engines, e.g. against Tantivy and
> PISA at [https://tantivy-search.github.io/bench/] or against research
> prototypes in Table 1 of
> [https://cs.uwaterloo.ca/~jimmylin/publications/Grand_etal_ECIR2020_preprint.pdf].
> Given that top-level disjunctions of term queries are commonly used for
> benchmarking, it would be nice to optimize this case a bit more. I suspect
> that we could make fewer per-document decisions by implementing a BulkScorer
> instead of a Scorer.
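The window-at-a-time idea from the comment above can be illustrated with a small, self-contained Java sketch. It is not Lucene code and does not use Lucene's BulkScorer API. The names `WindowedMaxScoreSketch`, `Postings`, `Hit` and `topK` are hypothetical. Postings are assumed to be plain in-memory arrays with non-negative scores. The window size of 2048 simply mirrors the figure given for {{BooleanScorer}}.

The sketch recomputes the MaxScore-style essential/non-essential split only at window boundaries. Within a window, it accumulates essential terms into a flat buffer. It probes non-essential terms only for candidates whose upper bound can still beat the current top-k threshold.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene's BulkScorer API: a top-k disjunction over
// in-memory postings that decides which terms are "essential" (MaxScore-style)
// only once per window of WINDOW_SIZE docs instead of on every match.
public final class WindowedMaxScoreSketch {
  static final int WINDOW_SIZE = 2048; // mirrors the BooleanScorer window size

  /** One term's postings: sorted doc ids with non-negative per-doc scores. */
  public static final class Postings {
    final int[] docs;
    final float[] scores;
    final float maxScore; // upper bound on this term's contribution to any doc
    int pos; // forward-only cursor into docs/scores

    public Postings(int[] docs, float[] scores) {
      this.docs = docs;
      this.scores = scores;
      float max = 0f;
      for (float s : scores) max = Math.max(max, s);
      this.maxScore = max;
    }
  }

  public static final class Hit {
    public final int doc;
    public final float score;
    Hit(int doc, float score) { this.doc = doc; this.score = score; }
  }

  // A doc must score strictly more than this to enter the top-k heap.
  private static float minCompetitive(PriorityQueue<Hit> heap, int k) {
    return heap.size() < k ? Float.NEGATIVE_INFINITY : heap.peek().score;
  }

  private static void offer(PriorityQueue<Hit> heap, int k, int doc, float score) {
    if (heap.size() < k) {
      heap.add(new Hit(doc, score));
    } else if (score > heap.peek().score) { // ties keep the earlier doc
      heap.poll();
      heap.add(new Hit(doc, score));
    }
  }

  public static List<Hit> topK(List<Postings> terms, int k, int maxDoc) {
    Postings[] sorted = terms.toArray(new Postings[0]);
    Arrays.sort(sorted, Comparator.comparingDouble((Postings p) -> p.maxScore));
    PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
    float[] buffer = new float[WINDOW_SIZE];
    boolean[] matched = new boolean[WINDOW_SIZE];

    for (int windowMin = 0; windowMin < maxDoc; windowMin += WINDOW_SIZE) {
      int windowMax = Math.min(windowMin + WINDOW_SIZE, maxDoc);
      // Checkpoint: the longest low-maxScore prefix whose bounds sum to <= the
      // threshold is non-essential. Docs matching only those terms cannot compete.
      float threshold = minCompetitive(heap, k);
      int firstEssential = 0;
      float nonEssentialMax = 0f;
      while (firstEssential < sorted.length
          && nonEssentialMax + sorted[firstEssential].maxScore <= threshold) {
        nonEssentialMax += sorted[firstEssential++].maxScore;
      }
      if (firstEssential == sorted.length) break; // no remaining doc can compete

      // Essential terms: accumulate the whole window into a flat buffer.
      for (int t = firstEssential; t < sorted.length; t++) {
        Postings p = sorted[t];
        while (p.pos < p.docs.length && p.docs[p.pos] < windowMin) p.pos++;
        for (; p.pos < p.docs.length && p.docs[p.pos] < windowMax; p.pos++) {
          int slot = p.docs[p.pos] - windowMin;
          buffer[slot] += p.scores[p.pos];
          matched[slot] = true;
        }
      }

      // Candidates in doc order; probe non-essential terms only if the doc can still compete.
      for (int slot = 0; slot < windowMax - windowMin; slot++) {
        if (!matched[slot]) continue;
        int doc = windowMin + slot;
        float score = buffer[slot];
        buffer[slot] = 0f;
        matched[slot] = false;
        if (score + nonEssentialMax <= minCompetitive(heap, k)) continue;
        for (int t = 0; t < firstEssential; t++) {
          Postings p = sorted[t];
          while (p.pos < p.docs.length && p.docs[p.pos] < doc) p.pos++;
          if (p.pos < p.docs.length && p.docs[p.pos] == doc) score += p.scores[p.pos];
        }
        offer(heap, k, doc, score);
      }
    }
    List<Hit> hits = new ArrayList<>(heap);
    hits.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed()
        .thenComparingInt(h -> h.doc));
    return hits;
  }
}
```

Reusing the window-start threshold for the rest of the window is safe. The top-k threshold only grows, so a stale value can only treat more terms as essential than strictly necessary, never fewer. A real implementation would likely advance postings with skipping rather than the linear scans assumed here.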