dsmiley commented on code in PR #4277:
URL: https://github.com/apache/solr/pull/4277#discussion_r3083783406
##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -127,4 +201,72 @@ public static QueryAndResponseCombiner getImplementation(
throw new SolrException(
SolrException.ErrorCode.BAD_REQUEST, "Unknown Combining algorithm: " +
algorithm);
}
+
+ /**
+ * A {@link DoubleValuesSource} that returns pre-computed scores for
specific document IDs. Used
+ * with {@link FunctionScoreQuery#boostByValue} to assign original combined
sub-query scores to a
+ * DocSetQuery, enabling the collapse filter to select group heads based on
these scores.
+ */
+ private static class PrecomputedScoreValuesSource extends DoubleValuesSource
{
+
+ private final IntDoubleHashMap scoreMap;
+
+ PrecomputedScoreValuesSource(IntDoubleHashMap scoreMap) {
+ this.scoreMap = scoreMap;
+ }
+
+ @Override
+ public DoubleValues getValues(LeafReaderContext ctx, DoubleValues
existing) {
+ int base = ctx.docBase;
+ return new DoubleValues() {
+ private double currentScore;
+
+ @Override
+ public double doubleValue() {
+ return currentScore;
+ }
+
+ @Override
+ public boolean advanceExact(int doc) {
+ int globalDoc = base + doc;
+ double score = scoreMap.get(globalDoc);
+ if (score != 0) {
+ currentScore = score;
+ return true;
+ }
+ return false;
Review Comment:
note: by returning false here... docs in the DocSet but not in the score
will score as 0. I suppose that's fine.
##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -49,12 +62,21 @@ public abstract List<ShardDoc> combine(
Map<String, List<ShardDoc>> queriesDocMap, SolrParams solrParams);
/**
- * Simple combine query result list as a union.
+ * Combine query result list as a union, optionally deduplicating by a
collapse field. When a
+ * collapse filter is provided, only one document per unique field value is
kept (based on the
+ * collapse sort/score selection). This ensures that collapse semantics are
preserved across
+ * combined queries.
*
* @param queryResults the query results to be combined
+ * @param collapseFilter the collapse post filter, or null if no collapse
dedup is needed
+ * @param searcher the searcher to read field values from, required when
collapseFilter is
+ * non-null
* @return the combined query result
*/
- public static QueryResult simpleCombine(List<QueryResult> queryResults) {
+ public static QueryResult simpleCombine(
+ List<QueryResult> queryResults,
+ CollapsingPostFilter collapseFilter,
+ SolrIndexSearcher searcher) {
QueryResult combinedQueryResults = new QueryResult();
DocSet combinedDocSet = null;
Review Comment:
I think combinedDocSet could still be null through this if the nature of the
requests don't require it (e.g. no faceting). The code you add add in this PR
assumes it's non-null to get a Query from it. Assuming you truly need it for
collapsing (?), you can tell Solr for each subquery to create it.
##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -87,14 +122,53 @@ public static QueryResult simpleCombine(List<QueryResult>
queryResults) {
combinedResultsLength,
combinedResultsDocIds,
combinedResultScores,
- Math.max(combinedResultsLength, totalMatches),
+ Math.max(combinedResultsLength, totalMatches - removedByCollapse),
combinedResultScores.length > 0 ? combinedResultScores[0] : 0,
TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO);
combinedQueryResults.setDocList(combinedResultSlice);
combinedQueryResults.setDocSet(combinedDocSet);
return combinedQueryResults;
}
+ /**
+ * Removes collapsed duplicates by delegating to SolrIndexSearcher with the
CollapsingPostFilter.
+ * Uses a DocSetQuery wrapped with FunctionScoreQuery.boostByValue to
preserve original scores
+ * from combined sub-queries, then applies the collapse filter to determine
surviving doc IDs.
+ *
+ * <p>This leverages Solr's native collapse infrastructure instead of
manually reading DocValues.
+ */
+ private static DocSet removeCollapsedDuplicatesViaSearcher(
+ CollapsingPostFilter collapseFilter,
+ SolrIndexSearcher searcher,
+ Map<Integer, Float> uniqueDocIds,
Review Comment:
Those are only top-X docs and scores, right?
##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -87,14 +122,53 @@ public static QueryResult simpleCombine(List<QueryResult>
queryResults) {
combinedResultsLength,
combinedResultsDocIds,
combinedResultScores,
- Math.max(combinedResultsLength, totalMatches),
+ Math.max(combinedResultsLength, totalMatches - removedByCollapse),
combinedResultScores.length > 0 ? combinedResultScores[0] : 0,
TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO);
combinedQueryResults.setDocList(combinedResultSlice);
combinedQueryResults.setDocSet(combinedDocSet);
return combinedQueryResults;
}
+ /**
+ * Removes collapsed duplicates by delegating to SolrIndexSearcher with the
CollapsingPostFilter.
+ * Uses a DocSetQuery wrapped with FunctionScoreQuery.boostByValue to
preserve original scores
+ * from combined sub-queries, then applies the collapse filter to determine
surviving doc IDs.
+ *
+ * <p>This leverages Solr's native collapse infrastructure instead of
manually reading DocValues.
+ */
+ private static DocSet removeCollapsedDuplicatesViaSearcher(
+ CollapsingPostFilter collapseFilter,
+ SolrIndexSearcher searcher,
+ Map<Integer, Float> uniqueDocIds,
+ DocSet combinedDocSet) {
Review Comment:
is this absolutely all docs matched, not just top-X?
##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -87,14 +122,53 @@ public static QueryResult simpleCombine(List<QueryResult>
queryResults) {
combinedResultsLength,
combinedResultsDocIds,
combinedResultScores,
- Math.max(combinedResultsLength, totalMatches),
+ Math.max(combinedResultsLength, totalMatches - removedByCollapse),
combinedResultScores.length > 0 ? combinedResultScores[0] : 0,
TotalHits.Relation.GREATER_THAN_OR_EQUAL_TO);
combinedQueryResults.setDocList(combinedResultSlice);
combinedQueryResults.setDocSet(combinedDocSet);
return combinedQueryResults;
}
+ /**
+ * Removes collapsed duplicates by delegating to SolrIndexSearcher with the
CollapsingPostFilter.
+ * Uses a DocSetQuery wrapped with FunctionScoreQuery.boostByValue to
preserve original scores
Review Comment:
I wouldn't document in javadocs an implementation detail like this. Details
can get out of sync and the caller shouldn't care. Document for what the
caller shoud know. Like... you're actually removing values from one of the
data structures passed in (uniqueDocIds) -- that's very notable.
##########
solr/core/src/java/org/apache/solr/handler/component/combine/QueryAndResponseCombiner.java:
##########
@@ -127,4 +201,72 @@ public static QueryAndResponseCombiner getImplementation(
throw new SolrException(
SolrException.ErrorCode.BAD_REQUEST, "Unknown Combining algorithm: " +
algorithm);
}
+
+ /**
+ * A {@link DoubleValuesSource} that returns pre-computed scores for
specific document IDs. Used
+ * with {@link FunctionScoreQuery#boostByValue} to assign original combined
sub-query scores to a
+ * DocSetQuery, enabling the collapse filter to select group heads based on
these scores.
+ */
+ private static class PrecomputedScoreValuesSource extends DoubleValuesSource
{
+
+ private final IntDoubleHashMap scoreMap;
Review Comment:
its appreciated to clarify (either in naming or javadocs or comments) the
key & value of maps. Here, perhaps `global doc ID -> score` or say naming this
scoreByDoc or docToScore
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]