atris commented on issue #854: Shared PQ Based Early Termination for Concurrent Search
URL: https://github.com/apache/lucene-solr/pull/854#issuecomment-530453877
 
 
   
   > We need to ensure that the minimum score of the slice can be set as the maximum minimum score for all slices. However, if the slice is not full, documents with smaller scores can still enter the top N no matter what the top hits are in the other slices. So if the total hit threshold is 10, for instance, and you ask for 2 top documents per slice, you cannot set the minimum score if only 1 document is inserted in each slice; otherwise you can miss documents.
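A minimal Java sketch of the rule in the quoted text, assuming a hypothetical per-slice queue (`SliceQueue`) and helper (`GlobalBound.maxMinScore`) that are not Lucene APIs. A slice's bottom score only bounds the global top N once that slice holds numHits documents, so only full slices contribute to the "maximum minimum score". The sketch ignores the total hit threshold and tie-breaking by doc id.

```java
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical per-slice top-N queue (assumes numHits >= 1); not Lucene's collector API.
final class SliceQueue {
    private final int numHits;
    // Min-heap: the head is the lowest score currently kept (the slice's "bottom").
    private final PriorityQueue<Float> heap = new PriorityQueue<>();

    SliceQueue(int numHits) { this.numHits = numHits; }

    void offer(float score) {
        if (heap.size() < numHits) {
            heap.add(score);                 // room left: keep every hit
        } else if (score > heap.peek()) {
            heap.poll();                     // evict the current bottom
            heap.add(score);
        }
    }

    boolean isFull() { return heap.size() == numHits; }

    float bottom() { return heap.peek(); }
}

final class GlobalBound {
    // Max of the bottoms of full slices. Returns NEGATIVE_INFINITY when no slice is
    // full, because a partially filled slice says nothing about the global top N.
    static float maxMinScore(List<SliceQueue> slices) {
        float bound = Float.NEGATIVE_INFINITY;
        for (SliceQueue s : slices) {
            if (s.isFull()) {
                bound = Math.max(bound, s.bottom());
            }
        }
        return bound;
    }
}
```

With 2 top documents per slice and only 1 document in each slice, the bound stays at negative infinity, matching the example in the quote.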
   
   To clarify my understanding, consider the following case:
   
   The top 10 hits have been requested, with totalHitsThreshold set to 25, and there are 3 collectors.
   
   Assume that we have reached 10 hits globally, randomly distributed across the collectors: collector 1 has 3, collector 2 has 6 and collector 3 has 1. When a collector first observes that numHits hits have been collected globally, it broadcasts its bottom PQ score to all collectors, and each collector updates its bottom score if needed.
   Call this global bottom value X.
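The broadcast step described here might be sketched in Java as follows. `SharedHitState` and its methods are hypothetical names, not Lucene classes; the sketch assumes an `AtomicInteger` global hit counter, an `AtomicBoolean` so that only the first collector to see the count reach numHits publishes, and a `DoubleAccumulator` holding the running maximum of published bottoms as X. It shows the mechanics only; whether a bottom taken from a local queue that is not yet full is safe to publish is exactly the question raised in the quoted text.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.DoubleAccumulator;

// Hypothetical shared state for the collectors of one search; not a Lucene class.
final class SharedHitState {
    private final int numHits;
    private final AtomicInteger globalHits = new AtomicInteger();
    private final AtomicBoolean broadcastDone = new AtomicBoolean();
    // Running maximum of every bottom score published so far: the global bottom X.
    private final DoubleAccumulator globalBottom =
            new DoubleAccumulator(Math::max, Double.NEGATIVE_INFINITY);

    SharedHitState(int numHits) { this.numHits = numHits; }

    // Called by a collector after it collects one hit. The collector that first sees
    // the global count reach numHits publishes its local bottom score.
    // Returns true if this call performed the broadcast.
    boolean onHitCollected(float localBottom) {
        int total = globalHits.incrementAndGet();
        if (total >= numHits && broadcastDone.compareAndSet(false, true)) {
            globalBottom.accumulate(localBottom);
            return true;
        }
        return false;
    }

    // Later updates from any collector can only raise X, never lower it.
    void publishBottom(float localBottom) { globalBottom.accumulate(localBottom); }

    double globalBottom() { return globalBottom.get(); }
}
```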
   
   Now, for each collector, a document is added to the local PQ only if its score is higher than X. If no document in any thread exceeds X, we still hold numHits hits cumulatively across the collectors. If a competitive hit does arrive, we can add it to the thread-local PQ and broadcast the updated minimum score.
   
   However, once a collector's priority queue is full, a further hit must score higher than that queue's local bottom to be inserted: being better than the global bottom alone does not suffice.
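The admission rules (beat X while the local queue has room, and additionally beat the local bottom once it is full) can be sketched as a hypothetical `LocalTopN` class in Java. Scores are plain floats in a `java.util.PriorityQueue` min-heap, X is passed in as a parameter, and ties (which Lucene breaks by doc id) are simply treated as non-competitive here.

```java
import java.util.PriorityQueue;

// Hypothetical thread-local top-N queue that also honours a shared global bottom X.
final class LocalTopN {
    private final int numHits;
    private final PriorityQueue<Float> heap = new PriorityQueue<>(); // min-heap of kept scores

    LocalTopN(int numHits) { this.numHits = numHits; }

    // Returns true if the hit was inserted into the local queue.
    boolean offer(float score, double globalBottom) {
        // Rule 1: a hit must beat the global bottom X to be competitive at all.
        if (score <= globalBottom) {
            return false;
        }
        // Rule 2: while the local queue has room, beating X is enough.
        if (heap.size() < numHits) {
            heap.add(score);
            return true;
        }
        // Rule 3: once the local queue is full, the hit must also beat the local bottom.
        if (score > heap.peek()) {
            heap.poll();
            heap.add(score);
            return true;
        }
        return false;
    }

    int size() { return heap.size(); }
}
```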
   
   Makes sense?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
