[ 
https://issues.apache.org/jira/browse/SOLR-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18018131#comment-18018131
 ] 

David Smiley commented on SOLR-17319:
-------------------------------------

For those who haven't been following -- Sonu has a PR with lots of peer review.  
Through subclass extension and some additional hooks in QueryComponent, there's 
no negative/distracting impact on existing use cases that don't use this opt-in 
subclass of QueryComponent.  We reviewers have been deep in the details, but let's 
consider the big picture of this PR (something I wish I had done immediately as a 
reviewer).  The RRF algorithm in Sonu's PR is fundamentally applied only at 
each shard, for its contribution to the whole.  The cross-shard merge is 
basically just an interleaving (not obvious from looking at the code).  That's very 
unfortunate, since with many shards the result degenerates to simple interleaving 
rather than RRF.  I wish there had been an initial design discussion before the 
code / sunk-cost investment.  RRF of the whole is the ideal.  Ideally, sharding 
would have no impact on the final results, notwithstanding per-shard term 
statistics (and there's the obscurely named ExactStatsCache to handle that).  A 
holistic RRF could be implemented at a later date, but I suspect that would amount 
to a rewrite of this implementation.  That kind of solution would mean doing a 
distributed search twice (once per sub-query) from a SearchComponent and then 
doing RRF _over that_.  It really wouldn't look much like the current code.
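
To make the difference concrete, here is a minimal Python sketch rather than anything taken from the PR. It assumes the usual RRF score, the sum of 1 / (k + rank) over the ranked lists with k = 60 as in Cormack et al., and it models the cross-shard "interleaving" as a plain round-robin, which is an assumption about the merge and not a reading of the actual code. The shard contents, doc ids, scores, and function names are all hypothetical.

```python
from collections import defaultdict
from itertools import zip_longest


def rrf(rankings, k=60):
    """Fuse ranked lists of doc ids: each doc scores sum(1 / (k + rank))."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; doc id breaks ties deterministically.
    return sorted(scores, key=lambda d: (-scores[d], d))


def ids_by_score(hits):
    """Turn (doc_id, score) hits into a ranking of doc ids, best first."""
    return [doc_id for doc_id, _ in sorted(hits, key=lambda h: -h[1])]


def per_shard_then_interleave(shards):
    """RRF inside each shard, then a round-robin merge (assumed model)."""
    fused = [rrf([ids_by_score(hits) for hits in shard.values()])
             for shard in shards]
    merged = []
    for row in zip_longest(*fused):
        merged.extend(d for d in row if d is not None)
    return merged


def global_rrf(shards):
    """One cross-shard ranking per sub-query, then RRF over those rankings."""
    rankings = []
    for query in shards[0]:  # assumes every shard answers the same sub-queries
        all_hits = [hit for shard in shards for hit in shard[query]]
        rankings.append(ids_by_score(all_hits))
    return rrf(rankings)


# Hypothetical data: shard 1 holds the two strongest docs for both sub-queries.
shards = [
    {"lexical": [("d1", 9.0), ("d2", 8.0)], "vector": [("d1", 0.9), ("d2", 0.8)]},
    {"lexical": [("d3", 1.0), ("d4", 0.5)], "vector": [("d3", 0.2), ("d4", 0.1)]},
]

print(per_shard_then_interleave(shards))  # ['d1', 'd3', 'd2', 'd4']
print(global_rrf(shards))                 # ['d1', 'd2', 'd3', 'd4']
```

In this toy case d2 outranks d3 in both sub-queries, yet the per-shard approach places d3 ahead of it, because d3 is rank 1 within its own shard. The global variant does not depend on which shard a document lives in.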

> Introduce support for Reciprocal Rank Fusion (combining queries)
> ----------------------------------------------------------------
>
>                 Key: SOLR-17319
>                 URL: https://issues.apache.org/jira/browse/SOLR-17319
>             Project: Solr
>          Issue Type: New Feature
>          Components: vector-search
>    Affects Versions: 9.6.1
>            Reporter: Alessandro Benedetti
>            Assignee: Alessandro Benedetti
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 22.5h
>  Remaining Estimate: 0h
>
> Reciprocal Rank Fusion (RRF) is an algorithm that takes multiple ranked 
> lists as input and produces a unified result set. 
> Example use cases for RRF include hybrid search and 
> multiple KNN vector queries executed concurrently. 
> RRF is based on the concept of reciprocal rank, which is the inverse of the 
> rank of a document in a ranked list of search results. 
> The search results are combined by taking into account the position of 
> the items in the original rankings, giving a higher score to items that 
> are ranked higher in multiple lists. RRF was first introduced by 
> Cormack et al. in [1].
> The syntax proposed:
> JSON Request
> {code:json}
> {
>     "queries": {
>         "lexical1": {
>             "lucene": {
>                 "query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"
>             }
>         },
>         "lexical2": {
>             "lucene": {
>                 "query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"
>             }
>         }
>     },
>     "limit": 10,
>     "fields": "[id,score]",
>     "params": {
>         "combiner": true,
>         "combiner.upTo": 5,
>         "facet": true,
>         "facet.field": "id",
>         "facet.mincount": 1
>     }
> }
> {code}
> [1] Cormack, Gordon V. et al. “Reciprocal rank fusion outperforms Condorcet 
> and individual rank learning methods.” Proceedings of the 32nd international 
> ACM SIGIR conference on Research and development in information retrieval 
> (2009).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
