jpountz opened a new pull request, #12589:
URL: https://github.com/apache/lucene/pull/12589
The idea behind MAXSCORE is to run disjunctions as `+(essentialClause1 ...
essentialClauseM) nonEssentialClause1 ... nonEssentialClauseN`, moving more and
more clauses from the essential list to the non-essential list as the minimum
competitive score increases. For instance, a query such as `the book of life`,
which I found in the Tantivy benchmark, ends up running as `+book the of life`
after some time, i.e. with one required clause and the other clauses optional.
This is because matching `the`, `of` and `life` alone is not good enough to
yield a competitive match.
Here are some statistics for that case:
- min competitive score: 3.4781857
- max_window_score(book): 2.8796153
- max_window_score(life): 2.037863
- max_window_score(the): 0.103848875
- max_window_score(of): 0.19427927
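
To make the partition concrete, here is a minimal sketch that reproduces the essential / non-essential split from the numbers above. It is not Lucene's implementation: the function name `partition_clauses` and the dictionary layout are hypothetical, and it assumes that a hit is non-competitive when its score is strictly below the minimum competitive score.

```python
def partition_clauses(max_scores, min_competitive_score):
    """Split clauses into (essential, non_essential), MAXSCORE-style.

    Hypothetical helper for illustration only, not a Lucene API.
    """
    # Visit clauses from the lowest to the highest maximum window score.
    ordered = sorted(max_scores, key=max_scores.get)
    non_essential = []
    running_sum = 0.0
    for clause in ordered:
        # A clause may stay non-essential only while matching every
        # non-essential clause together still cannot be competitive.
        if running_sum + max_scores[clause] >= min_competitive_score:
            break
        running_sum += max_scores[clause]
        non_essential.append(clause)
    essential = ordered[len(non_essential):]
    return essential, non_essential


# Statistics quoted in the list above.
stats = {
    "book": 2.8796153,
    "life": 2.037863,
    "the": 0.103848875,
    "of": 0.19427927,
}
essential, non_essential = partition_clauses(stats, 3.4781857)
print(essential, non_essential)
```

With these inputs, `the`, `of` and `life` sum to about 2.336, which is below 3.4781857, so only `book` ends up essential.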
If you look at these statistics, we could actually do better: a match may
only be competitive if it matches both `book` and `life`, so this query could
execute as `+book +life the of`, which may help evaluate fewer documents
compared to `+book the of life`, especially if you enable recursive graph
bisection.
This is what this PR tries to achieve: when there is a single essential
clause and matching all clauses but the best non-essential clause cannot
produce a competitive match, the scorer only evaluates documents that match
the intersection of the essential clause and the best non-essential clause.
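
The condition can be sketched as a small check. This is an illustration under stated assumptions, not the code in this PR: `required_pair` is a hypothetical name, the essential / non-essential split is taken as given, and a hit is assumed to be non-competitive when its score is strictly below the minimum competitive score.

```python
def required_pair(essential, non_essential, max_scores, min_competitive_score):
    """Return (essential_clause, best_non_essential_clause) when both must
    match for a hit to be competitive, otherwise None.

    Hypothetical helper for illustration only, not a Lucene API.
    """
    # The rule only applies with exactly one essential clause.
    if len(essential) != 1 or not non_essential:
        return None
    # "Best" non-essential clause: the one with the highest maximum score.
    best = max(non_essential, key=max_scores.get)
    # Upper bound on the score of a hit that matches everything except `best`.
    score_without_best = sum(
        max_scores[c] for c in essential + non_essential if c != best
    )
    if score_without_best < min_competitive_score:
        return essential[0], best
    return None


# Statistics from the `the book of life` example.
stats = {
    "book": 2.8796153,
    "life": 2.037863,
    "the": 0.103848875,
    "of": 0.19427927,
}
pair = required_pair(["book"], ["the", "of", "life"], stats, 3.4781857)
print(pair)
```

In the example, `book`, `the` and `of` together reach at most about 3.178, which is below 3.4781857, so `life` has to match as well and the query can run as `+book +life the of`.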
It's worth noting that this optimization would kick in very frequently on
two-clause disjunctions.