tomglk commented on a change in pull request #151:
URL: https://github.com/apache/solr/pull/151#discussion_r642512046



##########
File path: 
solr/core/src/java/org/apache/solr/handler/component/QueryComponent.java
##########
@@ -999,20 +1013,34 @@ protected void mergeIds(ResponseBuilder rb, ShardRequest 
sreq) {
 
           shardDoc.sortFieldValues = unmarshalledSortFieldValues;
 
-          queue.insertWithOverflow(shardDoc);
+          if(reRankQueue != null && docCounter++ <= reRankDocsSize) {
+              ShardDoc droppedShardDoc = 
reRankQueue.insertWithOverflow(shardDoc);
+              // FIXME: Only works if the original request does not sort by 
score

Review comment:
       I really appreciate how detailed you share your thoughts!
   This is extremely helpful to discover different ways to solve this problem.
   
   **First regarding the original score:**
   Personally, I like the idea of adding the value to the ShardDoc, but that 
would entail a PR to Lucene and we would have to wait for a newer version to 
use that in Solr to have access to that change. Like you already mentioned, 
that sounds like quite an undertaking.
   
   **Moving on to the problem with sort by score:**
   > e.g. for shards to use reRankDocs and for the federator whilst combining 
results to not use reRankDocs but to combine all shards' reranked documents 
separately in a reRankQueue that has the same sizing as queue (and then 
documents dropping out of reRankQueue need not be added to queue since them 
dropping out means there's already sufficient documents in reRankQueue to serve 
the request).
   
   I struggle to understand your idea.
   First I understood that you want to collect the returned documents before 
reranking in one queue (queue A) and the reranked documents in another queue 
(queue B). Then we could take all the documents in queue A first, remove them 
from queue B and add the results from queue B after the results from queue A.
   But the reranking happens on the shards and we are on the coordinator (in 
mergeIds), so we only have the already "mixed" (reranked and nor reranked) 
results from the shards.
   Then I thought, maybe you want the shards to return multiple result lists? 
One with the reranked documents and another one with the results before 
reranking?
   Then we could first merge the results from the reranked lists (= take the 
top ones), remove these documents from the queues that contain the results 
before reranking and then merge these results to append them after the reranked 
docs.
   
   I did not look into the code yet but that feels like quite a big task.
   
   What do you think about excluding both problems (making the original score 
visible and fixing the reranking with sort by score) to another PR? So that we 
first make LTR with a cloud setup possible at all and in a next step fix the 
problem with the sort=score?
   
   I know that the sort by score is the default, but I am also positive that 
there are many people with custom sorting that could already benefit from the 
first part (this PR without fix for score sort).
   We could also aim to finish that until 8.9 maybe?
   
   If I did not understand your last idea correctly, please let me know. And 
please be aware that I do not want to create pressure by suggesting the 
exclusion of the fix from this PR. :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org

Reply via email to