hack4chang commented on code in PR #13470:
URL: https://github.com/apache/lucene/pull/13470#discussion_r1632393474


##########
lucene/core/src/java/org/apache/lucene/search/TopDocs.java:
##########
@@ -350,4 +354,38 @@ private static TopDocs mergeAux(
       return new TopFieldDocs(totalHits, hits, sort.getSort());
     }
   }
+
+  /** Reciprocal Rank Fusion method. */
+  public static TopDocs rrf(int TopN, int k, TopDocs[] hits) {
+    Map<Integer, Float> rrfScore = new HashMap<>();
+    long minHits = Long.MAX_VALUE;
+    for (TopDocs topDoc : hits) {
+      minHits = Math.min(minHits, topDoc.totalHits.value);
+      Map<Integer, Float> scoreMap = new HashMap<>();
+      for (ScoreDoc scoreDoc : topDoc.scoreDocs) {
+        scoreMap.put(scoreDoc.doc, scoreDoc.score);
+      }
+
+      List<Map.Entry<Integer, Float>> scoreList = new ArrayList<>(scoreMap.entrySet());
+      scoreList.sort(Map.Entry.comparingByValue());
+
+      int rank = 1;
+      for (ScoreDoc scoreDoc : topDoc.scoreDocs) {
+        rrfScore.put(scoreDoc.doc, rrfScore.getOrDefault(scoreDoc.doc, 0.0f) + 1.0f / (rank + k));
+        rank++;
+      }
+    }
+
+    List<Map.Entry<Integer, Float>> rrfScoreRank = new ArrayList<>(rrfScore.entrySet());
+    rrfScoreRank.sort(
+        Map.Entry.<Integer, Float>comparingByValue().reversed()); // Sort in descending order
+
+    ScoreDoc[] rrfScoreDocs = new ScoreDoc[Math.min(TopN, rrfScoreRank.size())];
+    for (int i = 0; i < rrfScoreDocs.length; i++) {
+      rrfScoreDocs[i] = new ScoreDoc(rrfScoreRank.get(i).getKey(), rrfScoreRank.get(i).getValue());

Review Comment:
   If we do the RRF with the pair (docid, shardIndex) as the unique key, what
would be the difference between TopDocs#rrf and TopDocs#merge? I think an
example expresses this better. Suppose we have two shards, and we want to
retrieve the top 3 results from each shard and run RRF on top of them. There
are three documents: A, B and C. In Shard1 the top 3 is A -> B -> C; in Shard2
it is B -> C -> A. The original rrf method aggregates the rank contributions
by docid. Assuming the constant k is 1, the top 3 results would be
B `(1/(k+2) + 1/(k+1))` - A `(1/(k+1) + 1/(k+3))` - C `(1/(k+3) + 1/(k+2))`.
   If we are going to treat the shardIndex as part of the unique key as well,
how should the RRF ranking be presented?
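
   To make the example concrete, here is a minimal, self-contained sketch of
the two keying choices. It is not the PR's code and does not use Lucene
classes: `RrfKeySketch`, `fuse` and the `keyByShard` flag are hypothetical
names, documents are plain strings, and the `shard:doc` key format is only an
assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not part of the PR): RRF keyed by doc id vs. by (shardIndex, doc id). */
class RrfKeySketch {

  /**
   * Fuses ranked lists with score = sum of 1 / (k + rank), rank starting at 1.
   * If keyByShard is true, the list index is made part of the key ("shard:doc").
   */
  static Map<String, Double> fuse(List<List<String>> rankedLists, int k, boolean keyByShard) {
    Map<String, Double> scores = new HashMap<>();
    for (int shard = 0; shard < rankedLists.size(); shard++) {
      List<String> ranked = rankedLists.get(shard);
      for (int i = 0; i < ranked.size(); i++) {
        int rank = i + 1; // ranks are 1-based
        String key = keyByShard ? shard + ":" + ranked.get(i) : ranked.get(i);
        scores.merge(key, 1.0 / (k + rank), Double::sum);
      }
    }

    // Sort by fused score, descending; break ties by key so the order is deterministic.
    List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
    entries.sort(
        Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
            .thenComparing(Map.Entry.<String, Double>comparingByKey()));

    Map<String, Double> ordered = new LinkedHashMap<>();
    for (Map.Entry<String, Double> e : entries) {
      ordered.put(e.getKey(), e.getValue());
    }
    return ordered;
  }

  public static void main(String[] args) {
    // Shard1: A -> B -> C, Shard2: B -> C -> A, as in the example above.
    List<List<String>> lists = List.of(List.of("A", "B", "C"), List.of("B", "C", "A"));
    System.out.println("keyed by docid:           " + fuse(lists, 1, false));
    System.out.println("keyed by shardIndex+docid: " + fuse(lists, 1, true));
  }
}
```

   Keyed by docid with k = 1, the order is B, A, C, as computed above. Keyed by
(shardIndex, docid), and assuming each input list comes from a distinct shard,
no key can appear in more than one list, so every entry gets a single
`1/(k+rank)` term and the result is just the per-shard rankings interleaved by
rank. That seems to be where the overlap with TopDocs#merge comes from.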



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


