[ 
https://issues.apache.org/jira/browse/SOLR-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18019352#comment-18019352
 ] 

Christine Poerschke commented on SOLR-17319:
--------------------------------------------

I was curious how RRF might work overall, so I continued my earlier 
[https://github.com/apache/solr/pull/2597] scribbles in 
[https://github.com/apache/solr/pull/3648].

Some notes here:
 * {{CompoundQueryComponentTest}} uses the cloud-dynamic config set for 
illustration, with 1 or 3 shards and an invariant {{shortCircuit=false}}, to 
ensure that even the single-shard scenario uses the distributed code paths.

 * The CompoundQueryComponent (CQC) delegates to the QueryComponent (QC) for 
the sub-queries and this must include the sub-response-builders 
({{CompoundResponseBuilder.Inner}}) propagating the request addition back 
to the top-level response-builder {{CompoundResponseBuilder}}.

 * However, shard responses must be handled only by whoever made the request: 
this tracking-and-matching is done by injecting an {{rrf.q.key}} parameter 
(when the request addition is propagated in the 
{{CompoundResponseBuilder.Inner.addRequest}} method).
On the shard, the {{rrf.q.key}} parameter is also used to detect that this is 
an RRF distributed request and that therefore the sub-query named by the key, 
not the absent {{q}}, should be used: if {{rrf.q.1=hello:foo}} and 
{{rrf.q.2=goodbye:bar}} are the sub-queries, then the first sub-query sent to 
the shards will have {{rrf.q.key=rrf.q.1}} and so {{hello:foo}} will be used.

 * Once all the sub-requests are fully handled there is one more stage, called 
(say) {{STAGE_FUSION}}; in the demo it just concatenates the sub-query results.

 * The 
[https://github.com/apache/solr/pull/3648/commits/98a4f851b8eaa692af2331eae53098ad4644ff04]
 and 
[https://github.com/apache/solr/pull/3648/commits/48cafb95c1d57b469c1c5ceb5c467be335ad2b39]
 commits are a little sneaky: they allow the 'q' parameter to be swapped out 
as far as the query component is concerned, and they let the compound query 
component declare the sub-queries done but then progress to the fusion stage 
after that.
 ** Perhaps one might also want a way to optionally 'swap out' other 
parameters, e.g. the 'sort' parameter via {{rrf.sort.1=time_of_publication+desc}} 
and {{rrf.sort.2=popularity+desc}}, but that's beyond scribbles level.

All just scribbles, obviously, so I'm sharing them as-is in case they help.
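
To make the {{rrf.q.key}} lookup described above concrete, here is a minimal sketch of how a shard might pick the query to run. It is an illustration only, not code from either pull request: the class and method names are hypothetical, and a plain {{Map}} stands in for Solr's request parameters. Only the parameter names ({{q}}, {{rrf.q.key}}, {{rrf.q.1}}, {{rrf.q.2}}) come from the notes above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Solr or of either pull request. */
class RrfQueryKeySketch {

  /** Parameter name taken from the notes above. */
  static final String RRF_Q_KEY = "rrf.q.key";

  /**
   * Returns the query string a shard should run: the sub-query named by
   * rrf.q.key when that parameter is present, otherwise the regular q.
   */
  static String effectiveQuery(Map<String, String> params) {
    String key = params.get(RRF_Q_KEY);
    if (key == null) {
      return params.get("q"); // ordinary request, no RRF sub-query selected
    }
    String subQuery = params.get(key);
    if (subQuery == null) {
      throw new IllegalArgumentException(
          RRF_Q_KEY + " names a missing parameter: " + key);
    }
    return subQuery;
  }

  public static void main(String[] args) {
    Map<String, String> params = new LinkedHashMap<>();
    params.put("rrf.q.1", "hello:foo");
    params.put("rrf.q.2", "goodbye:bar");
    params.put(RRF_Q_KEY, "rrf.q.1"); // first sub-request sent to the shards
    System.out.println(effectiveQuery(params)); // prints hello:foo
  }
}
```

The same key could, under the same assumptions, be echoed back with each shard response so that only the sub-response-builder that issued the request handles it.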

> Introduce support for Reciprocal Rank Fusion (combining queries)
> ----------------------------------------------------------------
>
>                 Key: SOLR-17319
>                 URL: https://issues.apache.org/jira/browse/SOLR-17319
>             Project: Solr
>          Issue Type: New Feature
>          Components: vector-search
>    Affects Versions: 9.6.1
>            Reporter: Alessandro Benedetti
>            Assignee: Alessandro Benedetti
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 24h 20m
>  Remaining Estimate: 0h
>
> Reciprocal Rank Fusion (RRF) is an algorithm that takes multiple ranked 
> lists as input and produces a unified result set. 
> Examples of use cases where RRF can be used include hybrid search and 
> multiple Knn vector queries executed concurrently. 
> RRF is based on the concept of reciprocal rank, which is the inverse of the 
> rank of a document in a ranked list of search results. 
> The search results are combined by taking into account the position of the 
> items in the original rankings, giving a higher score to items that are 
> ranked higher in multiple lists. RRF was first introduced by 
> Cormack et al. in [1].
> The syntax proposed:
> JSON Request
> {code:json}
> {
>     "queries": {
>         "lexical1": {
>             "lucene": {
>                 "query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"
>             }
>         },
>         "lexical2": {
>             "lucene": {
>                 "query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"
>             }
>         }
>     },
>     "limit": 10,
>     "fields": "[id,score]",
>     "params": {
>         "combiner": true,
>         "combiner.upTo": 5,
>         "facet": true,
>         "facet.field": "id",
>         "facet.mincount": 1
>     }
> }
> {code}
> [1] Cormack, Gordon V. et al. “Reciprocal rank fusion outperforms condorcet 
> and individual rank learning methods.” Proceedings of the 32nd international 
> ACM SIGIR conference on Research and development in information retrieval 
> (2009)
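
As a rough illustration of the scoring described in the issue, the sketch below fuses ranked lists by summing 1 / (k + rank) for each document over every list it appears in, with ranks starting at 1. It is not the implementation from the issue's pull requests: the class and method names are hypothetical, and k = 60 is the constant used by Cormack et al., so whether Solr would use the same value is an assumption. The two input lists are the rankings implied by the boosts in the {{lexical1}} and {{lexical2}} example queries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only; not the code from the issue's pull requests. */
class RrfFusionSketch {

  /**
   * Fuses ranked lists of document ids into one list, best first.
   * k is the smoothing constant (60 in Cormack et al.).
   */
  static List<String> fuse(List<List<String>> rankedLists, int k) {
    Map<String, Double> scores = new LinkedHashMap<>();
    for (List<String> ranked : rankedLists) {
      for (int i = 0; i < ranked.size(); i++) {
        int rank = i + 1; // ranks are 1-based
        scores.merge(ranked.get(i), 1.0 / (k + rank), Double::sum);
      }
    }
    List<String> fused = new ArrayList<>(scores.keySet());
    // Highest fused score first; the stable sort keeps first-seen order on ties.
    fused.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
    return fused;
  }

  public static void main(String[] args) {
    List<String> lexical1 = List.of("10", "2", "4"); // id boosts 2, 1, 0.5
    List<String> lexical2 = List.of("2", "4", "3");  // id boosts 2, 1, 0.5
    System.out.println(fuse(List.of(lexical1, lexical2), 60)); // prints [2, 4, 10, 3]
  }
}
```

Documents 2 and 4 appear in both lists and so outrank document 10, even though document 10 is ranked first in {{lexical1}}.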



