[ 
https://issues.apache.org/jira/browse/SOLR-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018717#comment-18018717
 ] 

David Smiley commented on SOLR-17319:
-------------------------------------

You wrote a lot and I recognize I'm not responding to what you said; sorry.  
But I hope this explanation clarifies:

People all over the place today are implementing RRF _over_ the search results 
from search engines / vector DBs; that is, over the ranked results (keyword or 
vector) that come out of the engine.  How the engine goes about doing its job 
is an implementation detail of that engine; the caller applying RRF does not 
care.  These engines need to scale to be competitive, and sharding 
accomplishes that _without changing the semantics (a query over a corpus of 
docs produces the same docs and scores)_.  Admittedly, that statement is only 
approximately true: local term statistics are generally good enough if the 
document-to-shard distribution is unbiased, plus there's the option of 
enabling distributed IDF via 
[ExactStatsCache|https://solr.apache.org/guide/solr/latest/deployment-guide/solrcloud-distributed-requests.html#distributedidf]
 to almost eliminate differences.  +If we add RRF to Solr, users will be very 
disappointed if the results vary wildly based on sharding+.  They'd reasonably 
ask themselves: _jeesh_, I could do RRF on top and get accurate RRF; why can't 
Solr do this?
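
To make the concern concrete, here is a minimal Python sketch. It is not Solr 
code: the function names, the toy corpus, and the naive per-shard strategy 
are all assumptions made for illustration, and k=60 is the constant from the 
Cormack et al. paper rather than anything confirmed for this proposal. It 
compares RRF applied over the global ranked lists with RRF computed 
independently on each shard from shard-local ranks and then merged by fused 
score.

```python
from collections import defaultdict


def rrf_scores(rankings, k=60):
    """Return {doc_id: RRF score} for a list of ranked lists of doc ids."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):  # ranks are 1-based
            scores[doc] += 1.0 / (k + rank)
    return dict(scores)


def by_score(scores):
    """Doc ids ordered by descending score; ties broken by doc id."""
    return sorted(scores, key=lambda d: (-scores[d], d))


def fuse_global(rankings, k=60):
    """RRF over the engine's final (global) ranked lists."""
    return by_score(rrf_scores(rankings, k))


def fuse_per_shard(rankings, shards, k=60):
    """Hypothetical naive strategy: RRF per shard using shard-local ranks,
    then merge all shards by fused score."""
    merged = {}
    for shard_docs in shards:
        # Assumption: each shard preserves the global relative order
        # of the docs it holds.
        local = [[d for d in r if d in shard_docs] for r in rankings]
        merged.update(rrf_scores(local, k))
    return by_score(merged)


keyword = ["a", "b", "c", "d"]     # global ranking for a keyword query
vector = ["b", "c", "a", "d"]      # global ranking for a vector query
shards = [{"a", "b", "c"}, {"d"}]  # a deliberately biased distribution

global_order = fuse_global([keyword, vector])
sharded_order = fuse_per_shard([keyword, vector], shards)
print(global_order)   # ['b', 'a', 'c', 'd']
print(sharded_order)  # ['d', 'b', 'a', 'c']
```

In this toy case {{d}} is last in both global lists but comes out first after 
per-shard fusion, because its shard-local rank is 1 in a shard where it has 
no competition.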

> Introduce support for Reciprocal Rank Fusion (combining queries)
> ----------------------------------------------------------------
>
>                 Key: SOLR-17319
>                 URL: https://issues.apache.org/jira/browse/SOLR-17319
>             Project: Solr
>          Issue Type: New Feature
>          Components: vector-search
>    Affects Versions: 9.6.1
>            Reporter: Alessandro Benedetti
>            Assignee: Alessandro Benedetti
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 23h 10m
>  Remaining Estimate: 0h
>
> Reciprocal Rank Fusion (RRF) is an algorithm that takes multiple ranked 
> lists as input and produces a unified result set. 
> Example use cases for RRF include hybrid search and multiple kNN vector 
> queries executed concurrently. 
> RRF is based on the concept of reciprocal rank, which is the inverse of the 
> rank of a document in a ranked list of search results. 
> Search results are combined according to the position of each item in the 
> original rankings, with higher scores going to items that rank highly in 
> multiple lists. RRF was first introduced by 
> Cormack et al. in [1].
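> For reference, the scoring rule from [1] can be written as below. This is a 
> sketch in the paper's notation: the symbols D (documents), R (the set of 
> rankings), r(d) (the 1-based rank of d in ranking r), and k (a smoothing 
> constant, 60 in the paper) are not defined elsewhere in this description, 
> and whether this proposal uses the same k is not stated here.
>
> ```latex
> \mathrm{RRFscore}(d) = \sum_{r \in R} \frac{1}{k + r(d)}, \qquad d \in D
> ```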
> The syntax proposed:
> JSON Request
> {code:json}
> {
>     "queries": {
>         "lexical1": {
>             "lucene": {
>                 "query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"
>             }
>         },
>         "lexical2": {
>             "lucene": {
>                 "query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"
>             }
>         }
>     },
>     "limit": 10,
>     "fields": "[id,score]",
>     "params": {
>         "combiner": true,
>         "combiner.upTo": 5,
>         "facet": true,
>         "facet.field": "id",
>         "facet.mincount": 1
>     }
> }
> {code}
> [1] Cormack, Gordon V. et al. “Reciprocal rank fusion outperforms Condorcet 
> and individual rank learning methods.” Proceedings of the 32nd international 
> ACM SIGIR conference on Research and development in information retrieval 
> (2009).


