atris commented on issue #854: Shared PQ Based Early Termination for Concurrent Search URL: https://github.com/apache/lucene-solr/pull/854#issuecomment-530380308 > I am not sure that synchronization would be an issue in this model since the update of the global min score should be quick, we can also use double checked locking or use a volatile. But it's hard to say without benchmarking as Mike said. Agreed. >I can work on one if you want to concentrate on the shared pq path ? Thanks for offering. I have a skeletal PR in flight for this approach that I plan to publish tomorrow -- maybe we can iterate on that? > If numHits means the global threshold then it's just one condition. The slice must also have a full pq to publish a min score and other slices can use it if they need to collect the same number of top docs. I am sure I am missing something here. If the user requested top N hits, then all slices can keep collecting hits in their thread local PQs and update a global counter to reflect if total hits collected globally has reached N. Once we have reached N globally, each collector can publish the value of the bottom of their thread local PQ. The minimum of all such values will be our global minimum score, since we know that, collectively, we have N hits available. Post that, all collectors will use the global minimum score to filter hits. If, a collector finds a competitive hit, it adds it to the local queue, updates its local minimum score and triggers a resync, where the minimum of all minimum scores (if that makes sense) is taken and kept as the global worst hit. Maintaining the per collector worst score and calculating the minimum of them all can be done by storing those values in a global state. We could store them in an array, but then, each new competitive hit will incur an O(n) penalty for calculating the worst score. Another option is storing them in a heap, but it would require a Fibonacci heap to be able to change the value of a key without replacing the node.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
