tomglk commented on a change in pull request #151:
URL: https://github.com/apache/solr/pull/151#discussion_r645777966



##########
File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
##########
@@ -999,20 +1013,34 @@ protected void mergeIds(ResponseBuilder rb, ShardRequest 
sreq) {
 
           shardDoc.sortFieldValues = unmarshalledSortFieldValues;
 
-          queue.insertWithOverflow(shardDoc);
+          if(reRankQueue != null && docCounter++ <= reRankDocsSize) {
+              ShardDoc droppedShardDoc = 
reRankQueue.insertWithOverflow(shardDoc);
+              // FIXME: Only works if the original request does not sort by 
score

Review comment:
       Hi @cpoerschke please excuse the delay, I am very sorry!
   I just found time to look at your commit.
   The tests fail because they assume, that the number of reRankDocs is applied 
to the whole result. Your change applies it to each shard.
   So with e.g. 3 shards and reRankDocs=2 we expect the first 2 docs to be 
reranked, but instead 6 docs get reranked. Then the assertion on sort by id 
fails.
   (Actually, the 6 is variable. I ran tests where 5 docs were reRanked in 
total, because shard 3 only had one doc.)
   
   Because the client cannot (and should not have to) know, how many shards the 
cluster has, we would have to convert the reRankDocs that were passed in the 
`{!ltr...}`-query to a per shard value. 
   So basically `<value that was passed in> / numShards`.
   
   However, I don't think that we can convert the global setting for reRankDocs 
into a per shard setting because we cannot know how the top reRankDocs are 
distributed across the shards. 
   So when the client wants 6 docs to be reRanked and we have 3 shards, we 
cannot tell each shard that it should reReank the first 2 documents.
   This is because the top 6 reRanked docs may be all be on one shard.
   That was our reasoning behind the `droppedShardDoc`-logic.
   
   So referring to your examples (thank you for the visualization idea! :) ):
   ```
   dual sharded collection with reRankDocs=6 (3 per shard)
   
   shard1: a c e   g i y # without reranking
   shard2: b d f h j x z # without reranking
   
   shard1: E C A   g i y
   shard2: F D B   h j x z
   
   reRankedQueue: E C A F D B  # assuming that all scores are higher on shard 1
   queue: g i y h j x z
   
   overall results: E C A F D B g i y h j x z
   ```
   But the top 6 docs by their reRanked score could be:
   ```
   dual sharded collection with reRankDocs=6 (also applied on shard level)
   
   shard1: a c e   g i y # without reranking
   shard2: b d f h j x z # without reranking
   
   shard1: Y I G E C A
   shard2: X J H F D B z
   
   reRankedQueue: Y I G E C A   # assuming that all scores are higher on shard 1
   queue: b d f h j x z
   
   overall results: Y I G E C A b d f h j x z
   ```
   
   That's a bit sad, because if we could just convert the global param for the 
reRankDocs to a per shard one, we would not have the problem with the missing 
originalScore for documents, that were reRanked on a shard but not in the whole 
result. :/




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to