cpoerschke commented on a change in pull request #300: SOLR-11831: Skip second grouping step if group.limit is 1 (aka Las Vegas Patch) URL: https://github.com/apache/lucene-solr/pull/300#discussion_r267518970
########## File path: solr/core/src/java/org/apache/solr/search/grouping/distributed/responseprocessor/SearchGroupShardResponseProcessor.java ########## @@ -143,12 +151,61 @@ public void process(ResponseBuilder rb, ShardRequest shardRequest) { if (mergedTopGroups == null) { continue; } - - rb.mergedSearchGroups.put(groupField, mergedTopGroups); - for (SearchGroup<BytesRef> mergedTopGroup : mergedTopGroups) { - rb.searchGroupToShards.get(groupField).put(mergedTopGroup, tempSearchGroupToShards.get(groupField).get(mergedTopGroup)); + if (rb.getGroupingSpec().isSkipSecondGroupingStep()){ + /* If we are skipping the second grouping step we want to translate the response of the + * first step in the response of the second step and send it to the get_fields step. + */ + processSkipSecondGroupingStep(rb, mergedTopGroups, tempSearchGroupToShards, docIdToShard, groupSort, fields, groupField); Review comment: Two things and one question 'jumped out' at me when looking at this specific proposed change here: (1) The new local `docIdToShard` variable is allocated and accumulated on both if-and-else code paths but only used in the if-case. (2) The `processSkipSecondGroupingStep` takes seven arguments which (subjectively) seems a lot. (3) question: There's no `rb.mergedSearchGroups.put` and `rb.searchGroupToShards` in the if-case -- why might that be and would that be okay? Taking a closer look I learnt the following: (4) In the `processSkipSecondGroupingStep` method there is an `rb.searchGroupToShards.put` call. (5) Two of the `processSkipSecondGroupingStep` arguments are contained in the `rb` argument, namely `groupSort` and `fields`. (6) One of the `processSkipSecondGroupingStep` arguments i.e. `tempSearchGroupToShards` is used only for the `rb.searchGroupToShards.put` call. If that call could be factored out of the method then that would reduce the if-vs-else-case differences and save one method argument. (7) The new local `docIdToShard` variable has conceptual similarity to the existing `tempSearchGroupToShards` variable i.e both map 'something' to a shard or shards: (7a) `docIdToShard` is a one-to-one mapping since a single document lives on exactly one shard (7b) `tempSearchGroupToShards` is a one-to-mapping mapping since a single group can contain multiple documents that (between them) live on multiple shards Next then I did some exploratory refactoring to see if/how: (8) unnecessary allocation and accumulation of `docIdToShard` might be avoided for (existing) code paths that do not or cannot use the `group.skip.second.step=true` optimisation (9) the `processSkipSecondGroupingStep` signature might be narrowed to take fewer than seven arguments What emerged was a deceptively simple idea: (10) replace the `Map<String, Map<SearchGroup<BytesRef>, Set<String>>> tempSearchGroupToShards;` local variable with a `SearchGroupsContainer searchGroupsContainer;` local variable and give the `SearchGroupsContainer` inner class methods that do what the code currently does i.e. just code refactoring (11) for the `group.skip.second.step=true` optimisation extend the `SearchGroupsContainer` class and override methods as required https://github.com/cpoerschke/lucene-solr/tree/github-bloomberg-SOLR-11831-cpoerschke-4 shares what emerged, it compiles but is totally untested: * https://github.com/cpoerschke/lucene-solr/commit/6f53499edaa9df5dc1a1ace371872404df75ee84 shows the differences relative to the branch behind the pull request here * https://github.com/apache/lucene-solr/compare/5c143022e7abcdf14a570786afec4ff099fd581c...6f53499edaa9df5dc1a1ace371872404df75ee84 shows the ('all in') differences relative to the master branch baseline code What do you think? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org