atris commented on issue #854: Shared PQ Based Early Termination for Concurrent 
Search
URL: https://github.com/apache/lucene-solr/pull/854#issuecomment-530380308
 
 
   > I am not sure that synchronization would be an issue in this model since 
the update of the global min score should be quick, we can also use double 
checked locking or use a volatile. But it's hard to say without benchmarking as 
Mike said.
   
   Agreed.
   
   >I can work on one if you want to concentrate on the shared pq path ?
   
   Thanks for offering. I have a skeletal PR in flight for this approach that I 
plan to publish tomorrow -- maybe we can iterate on that?
   
   
   > If numHits means the global threshold then it's just one condition. The 
slice must also have a full pq to publish a min score and other slices can use 
it if they need to collect the same number of top docs.
   
   I am sure I am missing something here. If the user requested top N hits, 
then all slices can keep collecting hits in their thread local PQs and update a 
global counter to reflect if total hits collected globally has reached N. Once 
we have reached N globally, each collector can publish the value of the bottom 
of their thread local PQ. The minimum of all such values will be our global 
minimum score, since we know that, collectively, we have N hits available. Post 
that, all collectors will use the global minimum score to filter hits. If, a 
collector finds a competitive hit, it adds it to the local queue, updates its 
local minimum score and triggers a resync, where the minimum of all minimum 
scores (if that makes sense) is taken and kept as the global worst hit.
   
   Maintaining the per collector worst score and calculating the minimum of 
them all can be done by storing those values in a global state. We could store 
them in an array, but then, each new competitive hit will incur an O(n) penalty 
for calculating the worst score. Another option is storing them in a heap, but 
it would require a Fibonacci heap to be able to change the value of a key 
without replacing the node.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to