cpoerschke commented on a change in pull request #300: SOLR-11831: Skip second 
grouping step if group.limit is 1 (aka Las Vegas Patch)
URL: https://github.com/apache/lucene-solr/pull/300#discussion_r267518970
 
 

 ##########
 File path: 
solr/core/src/java/org/apache/solr/search/grouping/distributed/responseprocessor/SearchGroupShardResponseProcessor.java
 ##########
 @@ -143,12 +151,61 @@ public void process(ResponseBuilder rb, ShardRequest 
shardRequest) {
       if (mergedTopGroups == null) {
         continue;
       }
-
-      rb.mergedSearchGroups.put(groupField, mergedTopGroups);
-      for (SearchGroup<BytesRef> mergedTopGroup : mergedTopGroups) {
-        rb.searchGroupToShards.get(groupField).put(mergedTopGroup, 
tempSearchGroupToShards.get(groupField).get(mergedTopGroup));
+      if (rb.getGroupingSpec().isSkipSecondGroupingStep()){
+          /* If we are skipping the second grouping step we want to translate 
the response of the
+           * first step in the response of the second step and send it to the 
get_fields step.
+           */
+          processSkipSecondGroupingStep(rb, mergedTopGroups, 
tempSearchGroupToShards, docIdToShard, groupSort, fields,  groupField);
 
 Review comment:
   Two things and one question 'jumped out' at me when looking at this specific 
proposed change here:
   (1) The new local `docIdToShard` variable is allocated and accumulated on 
both if-and-else code paths but only used in the if-case.
   (2) The `processSkipSecondGroupingStep` takes seven arguments which 
(subjectively) seems a lot.
   (3) question: There's no `rb.mergedSearchGroups.put` and 
`rb.searchGroupToShards` in the if-case -- why might that be and would that be 
okay?
   
   Taking a closer look I learnt the following:
   (4) In the `processSkipSecondGroupingStep` method there is an 
`rb.searchGroupToShards.put` call.
   (5) Two of the `processSkipSecondGroupingStep` arguments are contained in 
the `rb` argument, namely `groupSort` and `fields`.
   (6) One of the `processSkipSecondGroupingStep` arguments i.e. 
`tempSearchGroupToShards` is used only for the `rb.searchGroupToShards.put` 
call. If that call could be factored out of the method then that would reduce 
the if-vs-else-case differences and save one method argument.
   (7) The new local `docIdToShard` variable has conceptual similarity to the 
existing `tempSearchGroupToShards` variable i.e both map 'something' to a shard 
or shards:
   (7a) `docIdToShard` is a one-to-one mapping since a single document lives on 
exactly one shard
   (7b) `tempSearchGroupToShards` is a one-to-mapping mapping since a single 
group can contain multiple documents that (between them) live on multiple shards
   
   Next then I did some exploratory refactoring to see if/how:
   (8) unnecessary allocation and accumulation of `docIdToShard` might be 
avoided for (existing) code paths that do not or cannot use the 
`group.skip.second.step=true` optimisation
   (9) the `processSkipSecondGroupingStep` signature might be narrowed to take 
fewer than seven arguments
   
   What emerged was a deceptively simple idea:
   (10) replace the `Map<String, Map<SearchGroup<BytesRef>, Set<String>>> 
tempSearchGroupToShards;` local variable with a `SearchGroupsContainer 
searchGroupsContainer;` local variable and give the `SearchGroupsContainer` 
inner class methods that do what the code currently does i.e. just code 
refactoring
   (11) for the `group.skip.second.step=true` optimisation extend the 
`SearchGroupsContainer` class and override methods as required
   
   
https://github.com/cpoerschke/lucene-solr/tree/github-bloomberg-SOLR-11831-cpoerschke-4
 shares what emerged, it compiles but is totally untested:
   * 
https://github.com/cpoerschke/lucene-solr/commit/6f53499edaa9df5dc1a1ace371872404df75ee84
 shows the differences relative to the branch behind the pull request here
   * 
https://github.com/apache/lucene-solr/compare/5c143022e7abcdf14a570786afec4ff099fd581c...6f53499edaa9df5dc1a1ace371872404df75ee84
 shows the ('all in') differences relative to the master branch baseline code
   
   What do you think?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Reply via email to