[
https://issues.apache.org/jira/browse/LUCENE-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564885#comment-17564885
]
Adrien Grand commented on LUCENE-10480:
---------------------------------------
I haven't tried to reproduce it, but the steps you took (running on wikibigall
with the nightly tasks file) sound good to me. Another thing that sometimes
changes performance is the doc ID order: were you using multiple indexing
threads, maybe?

Ignoring the fact that we cannot reproduce the slowdown, if I try to think of
the main differences between WANDScorer and BlockMaxMaxscoreScorer for
AndHighOrMedMed, I think the main one is the way that {{advanceShallow}} is
computed. Conjunctions use block boundaries of the clause that has the lowest
cost, so this could explain why we are seeing a slowdown with AndHighOrMedMed
(since the conjunction uses block boundaries of OrMedMed) and not
AndMedOrHighHigh (since the conjunction uses block boundaries of Med). Maybe we
could explore other approaches for {{advanceShallow}} such as taking the
minimum block boundary across essential clauses only instead of all clauses.
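
To make the two {{advanceShallow}} strategies concrete, here is a rough,
self-contained sketch. It is a toy model and not Lucene code: the Clause class,
its fixed doc-ID block spans, and both helper methods are hypothetical names
invented for illustration, and real postings blocks are not aligned on doc-ID
ranges like this.

```java
import java.util.List;

final class ShallowAdvanceSketch {

  /** Toy clause (hypothetical): a cost estimate plus fixed-width doc-ID blocks. */
  static final class Clause {
    final long cost;
    final int blockSpan; // doc-ID width of each block in this simplified model

    Clause(long cost, int blockSpan) {
      this.cost = cost;
      this.blockSpan = blockSpan;
    }

    /** Last doc ID of the block containing target, standing in for a shallow advance. */
    int blockEnd(int target) {
      return (target / blockSpan) * blockSpan + blockSpan - 1;
    }
  }

  /** Conjunction-style boundary: use the blocks of the lowest-cost clause only. */
  static int boundaryFromLeadClause(List<Clause> clauses, int target) {
    Clause lead = clauses.get(0);
    for (Clause c : clauses) {
      if (c.cost < lead.cost) {
        lead = c;
      }
    }
    return lead.blockEnd(target);
  }

  /** Minimum block boundary across the given clauses (all of them, or essential ones only). */
  static int minBoundary(List<Clause> clauses, int target) {
    int upTo = Integer.MAX_VALUE;
    for (Clause c : clauses) {
      upTo = Math.min(upTo, c.blockEnd(target));
    }
    return upTo;
  }
}
```

In this model, passing only the essential clauses to minBoundary can never
produce a smaller boundary than passing all clauses, so the scoring windows
would be at least as large. Whether that actually helps AndHighOrMedMed is
exactly what would need benchmarking.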
> Specialize 2-clauses disjunctions
> ---------------------------------
>
> Key: LUCENE-10480
> URL: https://issues.apache.org/jira/browse/LUCENE-10480
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Adrien Grand
> Priority: Minor
> Time Spent: 7h 20m
> Remaining Estimate: 0h
>
> WANDScorer is nice, but it also has lots of overhead to maintain its
> invariants: one linked list for the current candidates, one priority queue of
> scorers that are behind, another one for scorers that are ahead. All this
> could be simplified in the 2-clauses case, which feels worth specializing for
> as it's very common for end users to enter queries that have only two terms.