[
https://issues.apache.org/jira/browse/SOLR-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018133#comment-18018133
]
David Smiley edited comment on SOLR-17319 at 9/4/25 2:31 PM:
-------------------------------------------------------------
bq. (Ilan said): Would it be more acceptable to base per shard result fusion on
something other than rank (i.e. not using RRF) so that the combined final
result list built from the results contributed by individual shards is "more
correct"? Vector similarity is an absolute value that has real meaning but the
score of a keyword query for a given doc depends on the other docs in the
corpus, so even basing fusion on scores would not be totally correct. Given we
accept such inconsistencies with pure keyword search when merging results by
comparing the scores computed by different shards, maybe it's also acceptable
for hybrid search?
I was thinking the same but ultimately came around to: Why not just do what we
_really_ want -- holistic RRF and not settle for less? IMO that's not more
difficult than the effort Sonu has already sunk into this implementation.
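To make the difference between per-shard fusion and holistic RRF concrete, here is a toy sketch. It is not Solr's implementation: the shard layout, document ids, scores, helper names (`rrf`, `rank_by`) and the constant `k=60` are all assumptions chosen for illustration. The holistic branch also merges each query's results across shards by raw score, which, as Ilan notes, is itself only approximately correct for keyword scores.

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Sum 1 / (k + rank) for every document over all ranked lists (k is an assumed constant)."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return dict(scores)

def rank_by(scores):
    """Doc ids ordered by descending score, ties broken by id for determinism."""
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical per-shard, per-query scores. Shard A holds the globally best docs.
shards = {
    "A": {"lexical": {"a1": 9.0, "a2": 8.0}, "vector": {"a1": 0.9, "a2": 0.8}},
    "B": {"lexical": {"b1": 2.0, "b2": 1.0}, "vector": {"b1": 0.2, "b2": 0.1}},
}

# Per-shard fusion: each shard fuses its local rankings, and the coordinator
# merges on the resulting RRF scores.
per_shard_scores = {}
for shard in shards.values():
    per_shard_scores.update(rrf([rank_by(shard["lexical"]), rank_by(shard["vector"])]))
per_shard_top2 = rank_by(per_shard_scores)[:2]

# Holistic fusion: build one global ranking per query first, then fuse once.
global_rankings = []
for query in ("lexical", "vector"):
    merged = {}
    for shard in shards.values():
        merged.update(shard[query])
    global_rankings.append(rank_by(merged))
holistic_top2 = rank_by(rrf(global_rankings))[:2]

print(per_shard_top2, holistic_top2)
```

In this toy data, `b1` is ranked first within shard B for both queries, so per-shard fusion gives it the same RRF score as `a1`, even though it is only third in both global rankings. Ranks are relative to the list they come from, which is why RRF scores computed on different shards are not directly comparable.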
bq. (Hossman said): The RankQuery abstraction
I'm doubtful this is the right place to encapsulate higher level distributed
search interaction. But maybe; I haven't thought about it too deeply.
> Introduce support for Reciprocal Rank Fusion (combining queries)
> ----------------------------------------------------------------
>
> Key: SOLR-17319
> URL: https://issues.apache.org/jira/browse/SOLR-17319
> Project: Solr
> Issue Type: New Feature
> Components: vector-search
> Affects Versions: 9.6.1
> Reporter: Alessandro Benedetti
> Assignee: Alessandro Benedetti
> Priority: Major
> Labels: pull-request-available
> Time Spent: 22.5h
> Remaining Estimate: 0h
>
> Reciprocal Rank Fusion (RRF) is an algorithm that takes multiple ranked
> lists as input and produces a unified result set.
> Examples of use cases where RRF can be used include hybrid search and
> multiple Knn vector queries executed concurrently.
> RRF is based on the concept of reciprocal rank, which is the inverse of the
> rank of a document in a ranked list of search results.
> The combination of search results takes into account the position of
> the items in the original rankings, giving a higher score to items that
> are ranked higher in multiple lists. RRF was first introduced by
> Cormack et al. in [1].
> The syntax proposed:
> JSON Request
> {code:json}
> {
>   "queries": {
>     "lexical1": {
>       "lucene": {
>         "query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"
>       }
>     },
>     "lexical2": {
>       "lucene": {
>         "query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"
>       }
>     }
>   },
>   "limit": 10,
>   "fields": "[id,score]",
>   "params": {
>     "combiner": true,
>     "combiner.upTo": 5,
>     "facet": true,
>     "facet.field": "id",
>     "facet.mincount": 1
>   }
> }
> {code}
> [1] Cormack, Gordon V. et al. "Reciprocal rank fusion outperforms Condorcet
> and individual rank learning methods." Proceedings of the 32nd International
> ACM SIGIR Conference on Research and Development in Information Retrieval
> (2009)
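The reciprocal-rank scoring described in the issue summary above can be sketched as follows. This is a minimal illustration, not the code in the pull request: the function name `reciprocal_rank_fusion` is hypothetical, and the smoothing constant `k=60` is the value used by Cormack et al., which the Solr implementation may or may not share. The two input rankings are the orderings implied by the constant-score boosts in the `lexical1` and `lexical2` example queries.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids by summing 1 / (k + rank) per document.

    k is a smoothing constant (60 in Cormack et al.); assumed here, not taken from Solr.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Orderings implied by the boosts in the example request:
# lexical1 -> id:(10^=2 OR 2^=1 OR 4^=0.5), lexical2 -> id:(2^=2 OR 4^=1 OR 3^=0.5)
lexical1 = ["10", "2", "4"]
lexical2 = ["2", "4", "3"]

fused = reciprocal_rank_fusion([lexical1, lexical2])
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Documents `2` and `4` appear in both lists and so accumulate two reciprocal-rank terms, which places them ahead of `10` even though `10` is ranked first in `lexical1`.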