dsmiley commented on code in PR #4277: URL: https://github.com/apache/solr/pull/4277#discussion_r3093610784
########## changelog/unreleased/SOLR-18195-collapse-results-combined-query.yml: ########## @@ -0,0 +1,9 @@ +# See https://github.com/apache/solr/blob/main/dev-docs/changelog.adoc +title: Support for Collapse Results in Combined Query Component Review Comment: If we're searching for this, we'll likely search for "CombinedQueryComponent" as that's its name -- a class. Maybe also add (RRF) because that's the magical incantation to jog the memory of anyone reading this ```suggestion title: Support for using {!collapse} with CombinedQueryComponent (RRF) ``` ########## solr/core/src/java/org/apache/solr/handler/component/CombinedQueryComponent.java: ########## @@ -296,6 +317,10 @@ protected void mergeIds(ResponseBuilder rb, ShardRequest sreq) { long approximateTotalHits = 0; Map<String, List<ShardDoc>> shardDocMap = new HashMap<>(); String[] queriesToCombineKeys = rb.req.getParams().getParams(CombinerParams.COMBINER_QUERY); + // Build per-shard set of doc IDs retained after collapse in simpleCombine. Review Comment: That helps a lot! It may be strange to suggest this but I think simply _removing_ the comment words "retained after collapse in simpleCombine" would be slightly clearer, as that references stuff happening at a shard level which confused me a little as we're processing at the coordinator here. What I find confusing from your explanation is that a shard returns *both* `response_per_query` and the standard `response` key. I will suggest a comment to put somewhere (not precisely at the line here). Not sure if it's actually correct or useful but tell me: "This component receives *both* "response" and "response_per_query". Only the former is deduplicated, and thus most (all?) processing of the latter must exclude docs not present in the former." I wonder if we'd be better off with only "response" and then an additional key to provide info on which queries matched. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
