[ https://issues.apache.org/jira/browse/SOLR-17319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17855675#comment-17855675 ]
Chris M. Hostetter commented on SOLR-17319:
-------------------------------------------
I've only half skimmed the PR associated with this Jira, and the discussion
that has taken place there, but I echo [~dsmiley]'s concerns about adding more
"if special feature then" branch complexity to QueryComponent in ways that can
complicate the maintenance of the "common case" code paths if/when there are
alternative ways to add the same new functionality.
In particular, if my understanding of this new functionality is correct, the
current suggested approach doesn't seem to support returning correct results
across multiple shards? ... I'm *DEFINITELY* -1 to adding any features to
QueryComponent that only work in single shard mode.
My naive impression of this functionality is that it seems like something that
– if it can be implemented in a multi shard way – would make the most sense to
be implemented by refactoring/expanding the reranker logic...
* The {{RankQuery}} abstraction already provides a TopDocsCollector for
the SolrIndexSearcher - which is where a new Combiner query could implement a
collector that federates over other collectors to produce a final ranked topN
for the current shard
** So why not an alternative to the {{rerank}} parser that returns a new
{{CombinerQuery}} that has multiple clauses (with metadata about their rank
limits) and ignores the main {{"q"}} query?
* The multi-pass rerank scaling functionality seems like it could be a suitable
place to figure out the per-shard ranking of each document from the per-shard
topN across each of the combined queries in a second pass
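For illustration only, the per-shard combination step described above could be sketched roughly as follows. This is a minimal standalone Python sketch, not Solr code: the function name {{rrf_combine}} and its parameters are hypothetical, {{up_to}} is assumed to play the role of the proposed {{combiner.upTo}} rank limit, and k=60 is the constant used by Cormack et al.

```python
from collections import defaultdict


def rrf_combine(ranked_lists, k=60, up_to=None, top_n=10):
    """Fuse several ranked lists of doc ids with Reciprocal Rank Fusion.

    ranked_lists: one list of doc ids per combined query, best match first.
    k:            smoothing constant (60 in Cormack et al.).
    up_to:        hypothetical per-query rank limit; None means no limit.
    top_n:        number of fused results to return.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        # Ranks are 1-based; only the first `up_to` entries contribute.
        for rank, doc_id in enumerate(ranking[:up_to], start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:top_n]


if __name__ == "__main__":
    # Rankings implied by the boosts in the example request of this issue.
    lexical1 = ["10", "2", "4"]
    lexical2 = ["2", "4", "3"]
    for doc_id, score in rrf_combine([lexical1, lexical2], up_to=5):
        print(doc_id, round(score, 6))
```

With those two example rankings the fused order is 2, 4, 10, 3, since documents 2 and 4 appear in both lists. Note that this only covers a single shard: a fused score depends on each document's rank in every list, which is why merging per-shard results would need the additional pass discussed here.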
If my impression is wrong, and there is no way to implement multi-shard RRF in
a two pass approach, then I would urge implementing this single pass approach
in a new (optional) SearchComponent that could be used instead of
{{QueryComponent}} -- w/o the existing baggage of {{QueryComponent}} (like
grouping, cursors, distributed topN, etc...) and w/o adding any _new_ baggage
to {{QueryComponent}}.
> Introduce support for Reciprocal Rank Fusion (combining queries)
> ----------------------------------------------------------------
>
> Key: SOLR-17319
> URL: https://issues.apache.org/jira/browse/SOLR-17319
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Components: query
> Affects Versions: 9.6.1
> Reporter: Alessandro Benedetti
> Assignee: Alessandro Benedetti
> Priority: Major
>
> Reciprocal Rank Fusion (RRF) is an algorithm that takes multiple ranked
> lists as input and produces a unified result set.
> Examples of use cases where RRF can be used include hybrid search and
> multiple kNN vector queries executed concurrently.
> RRF is based on the concept of reciprocal rank, which is the inverse of the
> rank of a document in a ranked list of search results.
> The combination of search results takes into account the position of
> the items in the original rankings, giving a higher score to items that
> are ranked higher in multiple lists. RRF was first introduced by
> Cormack et al. in [1].
> The syntax proposed:
> JSON Request
> {code:json}
> {
>   "queries": {
>     "lexical1": {
>       "lucene": {
>         "query": "id:(10^=2 OR 2^=1 OR 4^=0.5)"
>       }
>     },
>     "lexical2": {
>       "lucene": {
>         "query": "id:(2^=2 OR 4^=1 OR 3^=0.5)"
>       }
>     }
>   },
>   "limit": 10,
>   "fields": "[id,score]",
>   "params": {
>     "combiner": true,
>     "combiner.upTo": 5,
>     "facet": true,
>     "facet.field": "id",
>     "facet.mincount": 1
>   }
> }
> {code}
> [1] Cormack, Gordon V. et al. “Reciprocal rank fusion outperforms condorcet
> and individual rank learning methods.” Proceedings of the 32nd international
> ACM SIGIR conference on Research and development in information retrieval
> (2009)