dsmiley commented on code in PR #4277:
URL: https://github.com/apache/solr/pull/4277#discussion_r3083783406


##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -127,4 +201,72 @@ public static QueryAndResponseCombiner getImplementation(
     throw new SolrException(
         SolrException.ErrorCode.BAD_REQUEST, "Unknown Combining algorithm: " + 
algorithm);
   }
+
+  /**
+   * A {@link DoubleValuesSource} that returns pre-computed scores for 
specific document IDs. Used
+   * with {@link FunctionScoreQuery#boostByValue} to assign original combined 
sub-query scores to a
+   * DocSetQuery, enabling the collapse filter to select group heads based on 
these scores.
+   */
+  private static class PrecomputedScoreValuesSource extends DoubleValuesSource 
{
+
+    private final IntDoubleHashMap scoreMap;
+
+    PrecomputedScoreValuesSource(IntDoubleHashMap scoreMap) {
+      this.scoreMap = scoreMap;
+    }
+
+    @Override
+    public DoubleValues getValues(LeafReaderContext ctx, DoubleValues 
existing) {
+      int base = ctx.docBase;
+      return new DoubleValues() {
+        private double currentScore;
+
+        @Override
+        public double doubleValue() {
+          return currentScore;
+        }
+
+        @Override
+        public boolean advanceExact(int doc) {
+          int globalDoc = base + doc;
+          double score = scoreMap.get(globalDoc);
+          if (score != 0) {
+            currentScore = score;
+            return true;
+          }
+          return false;

Review Comment:
   note: by returning false here... docs in the DocSet but not in the score 
will score as 0.  I suppose that's fine.



##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -49,12 +62,21 @@ public abstract List<ShardDoc> combine(
       Map<String, List<ShardDoc>> queriesDocMap, SolrParams solrParams);
 
   /**
-   * Simple combine query result list as a union.
+   * Combine query result list as a union, optionally deduplicating by a 
collapse field. When a
+   * collapse filter is provided, only one document per unique field value is 
kept (based on the
+   * collapse sort/score selection). This ensures that collapse semantics are 
preserved across
+   * combined queries.
    *
    * @param queryResults the query results to be combined
+   * @param collapseFilter the collapse post filter, or null if no collapse 
dedup is needed
+   * @param searcher the searcher to read field values from, required when 
collapseFilter is
+   *     non-null
    * @return the combined query result
    */
-  public static QueryResult simpleCombine(List<QueryResult> queryResults) {
+  public static QueryResult simpleCombine(
+      List<QueryResult> queryResults,
+      CollapsingPostFilter collapseFilter,
+      SolrIndexSearcher searcher) {
     QueryResult combinedQueryResults = new QueryResult();
     DocSet combinedDocSet = null;

Review Comment:
   I think combinedDocSet could still be null through this if the nature of the 
requests don't require it (e.g. no faceting).  The code you add add in this PR 
assumes it's non-null to get a Query from it.  Assuming you truly need it for 
collapsing (?), you can tell Solr for each subquery to create it.



##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -87,14 +122,53 @@ public static QueryResult simpleCombine(List<QueryResult> 
queryResults) {
             combinedResultsLength,
             combinedResultsDocIds,
             combinedResultScores,
-            Math.max(combinedResultsLength, totalMatches),
+            Math.max(combinedResultsLength, totalMatches - removedByCollapse),
             combinedResultScores.length > 0 ? combinedResultScores[0] : 0,
             TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO);
     combinedQueryResults.setDocList(combinedResultSlice);
     combinedQueryResults.setDocSet(combinedDocSet);
     return combinedQueryResults;
   }
 
+  /**
+   * Removes collapsed duplicates by delegating to SolrIndexSearcher with the 
CollapsingPostFilter.
+   * Uses a DocSetQuery wrapped with FunctionScoreQuery.boostByValue to 
preserve original scores
+   * from combined sub-queries, then applies the collapse filter to determine 
surviving doc IDs.
+   *
+   * <p>This leverages Solr's native collapse infrastructure instead of 
manually reading DocValues.
+   */
+  private static DocSet removeCollapsedDuplicatesViaSearcher(
+      CollapsingPostFilter collapseFilter,
+      SolrIndexSearcher searcher,
+      Map<Integer, Float> uniqueDocIds,

Review Comment:
   Those are only top-X docs and scores, right?  



##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -87,14 +122,53 @@ public static QueryResult simpleCombine(List<QueryResult> 
queryResults) {
             combinedResultsLength,
             combinedResultsDocIds,
             combinedResultScores,
-            Math.max(combinedResultsLength, totalMatches),
+            Math.max(combinedResultsLength, totalMatches - removedByCollapse),
             combinedResultScores.length > 0 ? combinedResultScores[0] : 0,
             TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO);
     combinedQueryResults.setDocList(combinedResultSlice);
     combinedQueryResults.setDocSet(combinedDocSet);
     return combinedQueryResults;
   }
 
+  /**
+   * Removes collapsed duplicates by delegating to SolrIndexSearcher with the 
CollapsingPostFilter.
+   * Uses a DocSetQuery wrapped with FunctionScoreQuery.boostByValue to 
preserve original scores
+   * from combined sub-queries, then applies the collapse filter to determine 
surviving doc IDs.
+   *
+   * <p>This leverages Solr's native collapse infrastructure instead of 
manually reading DocValues.
+   */
+  private static DocSet removeCollapsedDuplicatesViaSearcher(
+      CollapsingPostFilter collapseFilter,
+      SolrIndexSearcher searcher,
+      Map<Integer, Float> uniqueDocIds,
+      DocSet combinedDocSet) {

Review Comment:
   is this absolutely all docs matched, not just top-X?



##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -87,14 +122,53 @@ public static QueryResult simpleCombine(List<QueryResult> 
queryResults) {
             combinedResultsLength,
             combinedResultsDocIds,
             combinedResultScores,
-            Math.max(combinedResultsLength, totalMatches),
+            Math.max(combinedResultsLength, totalMatches - removedByCollapse),
             combinedResultScores.length > 0 ? combinedResultScores[0] : 0,
             TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO);
     combinedQueryResults.setDocList(combinedResultSlice);
     combinedQueryResults.setDocSet(combinedDocSet);
     return combinedQueryResults;
   }
 
+  /**
+   * Removes collapsed duplicates by delegating to SolrIndexSearcher with the 
CollapsingPostFilter.
+   * Uses a DocSetQuery wrapped with FunctionScoreQuery.boostByValue to 
preserve original scores

Review Comment:
   I wouldn't document in javadocs an implementation detail like this.  Details 
can get out of sync and the caller shouldn't care.  Document for what the 
caller shoud know.  Like... you're actually removing values from one of the 
data structures passed in (uniqueDocIds) -- that's very notable.



##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -127,4 +201,72 @@ public static QueryAndResponseCombiner getImplementation(
     throw new SolrException(
         SolrException.ErrorCode.BAD_REQUEST, "Unknown Combining algorithm: " + 
algorithm);
   }
+
+  /**
+   * A {@link DoubleValuesSource} that returns pre-computed scores for 
specific document IDs. Used
+   * with {@link FunctionScoreQuery#boostByValue} to assign original combined 
sub-query scores to a
+   * DocSetQuery, enabling the collapse filter to select group heads based on 
these scores.
+   */
+  private static class PrecomputedScoreValuesSource extends DoubleValuesSource 
{
+
+    private final IntDoubleHashMap scoreMap;

Review Comment:
   its appreciated to clarify (either in naming or javadocs or comments) the 
key & value of maps.  Here, perhaps `global doc ID -> score` or say naming this 
scoreByDoc or docToScore



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to