[ 
https://issues.apache.org/jira/browse/SOLR-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018185#comment-18018185
 ] 

Sonu Sharma commented on SOLR-17319:
------------------------------------

{quote} The RRF algorithm in Sonu's PR is fundamentally only applied at each 
shard for its contribution to the whole
{quote}
RRF is only meaningful when two or more result sets share some documents; 
otherwise, it reduces to interleaving. Since the shards have no documents in 
common, applying RRF across shards is likely to behave as interleaving.
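
To illustrate the point, here is a minimal sketch in Python (not the code from the PR). The function name {{rrf}}, the toy doc ids, the tie-break by doc id, and the constant k=60 (the value commonly used following Cormack et al.) are all assumptions made for this example.

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Fuse ranked lists of doc ids; returns [(doc_id, score)], best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes the reciprocal of (k + rank) for the doc.
            scores[doc_id] += 1.0 / (k + rank)
    # Descending score; ties broken by doc id so the output is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Lists that intersect: docs found in both lists accumulate two contributions.
overlapping = rrf([["a", "b", "c"], ["b", "d", "a"]])

# Disjoint lists (like docs from different shards): every doc gets exactly one
# contribution, so docs at the same rank tie and the lists simply alternate.
disjoint = rrf([["a1", "a2", "a3"], ["b1", "b2", "b3"]])

print([doc for doc, _ in overlapping])
print([doc for doc, _ in disjoint])
```

With disjoint inputs the fused order only depends on the original ranks and on the tie-break rule, which is exactly what interleaving does.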

I can see two ways to merge the results of multiple queries distributed over 
shards:
 # Execute all the queries on each shard, combine the results of all the 
queries per shard, preferably with RRF (since the docs intersect), and then 
finally combine the per-shard results (with RRF, which would become 
interleaving, OR by sorting on the shard scores, as proposed by Ilan).
 # Send each query to all the shards, combine the per-shard results of each 
query (with RRF, which would become interleaving because the docs don't 
intersect, OR by sorting on the shard scores), and then finally merge the 
per-query results, preferably with RRF.

My implementation follows the first approach, which I think was relatively 
easier than the second. In both scenarios, we have to combine docs that do not 
intersect and that carry absolute scores.
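
The two orderings listed above can be sketched as follows. This is a toy Python model, not Solr code: the names {{option_1}}, {{option_2}}, {{merge_by_score}}, the {{results}} layout, and all scores are hypothetical. For the cross-shard step, the sketch assumes RRF (i.e. interleaving) in option 1 and a sort on shard scores in option 2, just to show both variants once.

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Reciprocal Rank Fusion over lists of doc ids; returns ids, best first."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc for doc, _ in ordered]

def merge_by_score(scored_lists):
    """Merge (doc_id, score) lists by absolute score, as in a plain sharded sort."""
    merged = [pair for scored in scored_lists for pair in scored]
    return [doc for doc, _ in sorted(merged, key=lambda p: (-p[1], p[0]))]

def ids(scored):
    return [doc for doc, _ in scored]

# Hypothetical toy data: results[shard][query] -> [(doc_id, score)], best first.
results = {
    "shard1": {
        "lexical": [("d1", 9.0), ("d2", 4.0), ("d3", 2.0)],
        "vector": [("d2", 0.9), ("d3", 0.6), ("d1", 0.3)],
    },
    "shard2": {
        "lexical": [("d4", 20.0), ("d5", 15.0)],
        "vector": [("d4", 0.8), ("d5", 0.7)],
    },
}

def option_1(results):
    # Per shard: fuse all queries with RRF (the docs intersect within a shard).
    per_shard = [rrf([ids(s) for s in by_query.values()])
                 for by_query in results.values()]
    # Across shards: RRF again, which degenerates to interleaving.
    return rrf(per_shard)

def option_2(results):
    queries = sorted({q for by_query in results.values() for q in by_query})
    # Per query: merge the shard results by their absolute scores.
    per_query = [merge_by_score([by_query[q] for by_query in results.values()])
                 for q in queries]
    # Across queries: fuse with RRF (the per-query lists intersect).
    return rrf(per_query)

print(option_1(results))
print(option_2(results))
```

On this toy data the two options return different orders, because option 2 as sketched lets the larger lexical scores of the second shard influence the merge, while option 1 only looks at ranks.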

If we really want to avoid interleaving, then we can merge the same way docs 
are already merged across shards, that is, by the absolute scores computed by 
each shard. [This is aligned with this comment: "Given we accept such 
inconsistencies with pure keyword search when merging results by comparing the 
scores computed by different shards, maybe it's also acceptable for hybrid 
search?"]
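
As a rough illustration of merging by absolute scores instead of interleaving, the sketch below does a k-way merge of per-shard lists on the score each shard reports. It is plain Python, not Solr's merge code; the function name {{merge_by_absolute_score}}, the doc ids, and the fused scores are made up for the example.

```python
import heapq
from itertools import islice

# Hypothetical per-shard fused results, already sorted best first:
# (doc_id, fused_score) as each shard would report them.
shard_a = [("a1", 0.0325), ("a2", 0.0161), ("a3", 0.0159)]
shard_b = [("b1", 0.0328), ("b2", 0.0320), ("b3", 0.0164)]

def merge_by_absolute_score(*shards, limit=10):
    """K-way merge of per-shard lists, comparing the scores as reported."""
    # heapq.merge expects ascending inputs, so the key negates the score.
    merged = heapq.merge(*shards, key=lambda pair: -pair[1])
    return [doc for doc, _ in islice(merged, limit)]

print(merge_by_absolute_score(shard_a, shard_b))
```

Unlike interleaving, a shard with stronger matches can contribute several docs in a row (here b2 and b3 both come before a2), at the cost of comparing scores that were computed independently by each shard.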

> Introduce support for Reciprocal Rank Fusion (combining queries)
> ----------------------------------------------------------------
>
>                 Key: SOLR-17319
>                 URL: https://issues.apache.org/jira/browse/SOLR-17319
>             Project: Solr
>          Issue Type: New Feature
>          Components: vector-search
>    Affects Versions: 9.6.1
>            Reporter: Alessandro Benedetti
>            Assignee: Alessandro Benedetti
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 23h 10m
>  Remaining Estimate: 0h
>
> Reciprocal Rank Fusion (RRF) is an algorithm that takes multiple ranked 
> lists as input and produces a unified result set. 
> Examples of use cases where RRF can be used include hybrid search and 
> multiple Knn vector queries executed concurrently. 
> RRF is based on the concept of reciprocal rank, which is the inverse of the 
> rank of a document in a ranked list of search results. 
> The search results are combined by taking into account the position of the 
> items in the original rankings, giving a higher score to items that are 
> ranked higher in multiple lists. RRF was first introduced by Cormack et al. 
> in [1].
> The syntax proposed:
> JSON Request
> {code:json}
> {
>     "queries": {
>         "lexical1": {
>             "lucene": {
>                 "query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"
>             }
>         },
>         "lexical2": {
>             "lucene": {
>                 "query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"
>             }
>         }
>     },
>     "limit": 10,
>     "fields": "[id,score]",
>     "params": {
>         "combiner": true,
>         "combiner.upTo": 5,
>         "facet": true,
>         "facet.field": "id",
>         "facet.mincount": 1
>     }
> }
> {code}
> [1] Cormack, Gordon V. et al. “Reciprocal rank fusion outperforms condorcet 
> and individual rank learning methods.” Proceedings of the 32nd international 
> ACM SIGIR conference on Research and development in information retrieval 
> (2009)



