mikemccand commented on issue #823: LUCENE-8939: Introduce Shared Count Early Termination In Parallel Search
URL: https://github.com/apache/lucene-solr/pull/823#issuecomment-527203682

> > I'm trying to understand the behavior change Lucene users will see with this when using concurrent searching for one query (passing an `ExecutorService` to `IndexSearcher`):
> >
> > It looks like with this change such users will see their search terminate precisely when the total number of collected hits exceeds the limit (1000 by default?), versus today, where we try to collect 1000 per segment and then reduce that to the top 1000 overall? So this means the results will change depending on thread execution/timing?
>
> Looking at the documentation around `TOTAL_HITS_THRESHOLD`, I see that it is intended to restrict the total number of documents scored before the query is early terminated. If we do a single-threaded search today, that is the behavior we get. However, for concurrent search, we actually look at up to N * `TOTAL_HITS_THRESHOLD` documents, where N is the number of slices. So I believe that in the status quo we are not delivering the advertised behavior for concurrent searches. This change should fix that.
>
> However, you are correct that thread timing will come into play here -- different slices may contribute different numbers of hits to the overall count. But since we are not scoring all documents anyway, I do not believe we offer any guarantees about which documents we return -- even today, the best documents might be the ones that just came in and hence are in the last segments to be traversed, so they never even get looked at. WDYT?

OK that makes sense @atris -- it seems that which specific top hits you'll get back is intentionally not defined in the API, and so we have the freedom to make improvements like this.
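To make the N * `TOTAL_HITS_THRESHOLD` point concrete, here is a toy model. It is a sketch under stated assumptions and not Lucene's actual collector code. Each "slice" is reduced to a count of matching documents. The class and method names (`SharedHitCountDemo`, `perSliceTotal`, `sharedTotal`) are hypothetical. The per-slice variant lets every slice count up to the threshold independently. The shared variant has all slice threads increment a single `AtomicLong` and stop once it passes the threshold:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical toy model; NOT Lucene's collector implementation.
class SharedHitCountDemo {

    // Per-slice counting: each slice independently counts up to `threshold`
    // hits, so the total can reach (number of slices) * threshold.
    static long perSliceTotal(int[] sliceSizes, long threshold) {
        long total = 0;
        for (int size : sliceSizes) {
            total += Math.min(size, threshold);
        }
        return total;
    }

    // Shared counting: all slice threads increment one AtomicLong and a slice
    // stops as soon as its increment pushes the global count past `threshold`.
    // The total is capped at `threshold`, but how many hits each slice
    // contributes depends on thread scheduling.
    static long sharedTotal(int[] sliceSizes, long threshold) throws Exception {
        AtomicLong shared = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(sliceSizes.length);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int size : sliceSizes) {
                futures.add(pool.submit(() -> {
                    long counted = 0;
                    for (int doc = 0; doc < size; doc++) {
                        if (shared.incrementAndGet() > threshold) {
                            break; // global threshold reached: terminate this slice early
                        }
                        counted++;
                    }
                    return counted;
                }));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // wait for every slice and sum its contribution
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] slices = {5000, 5000, 5000};
        System.out.println(perSliceTotal(slices, 1000)); // 3000
        System.out.println(sharedTotal(slices, 1000));   // 1000
    }
}
```

Every value from 1 to `threshold` is returned by exactly one `incrementAndGet` call. Therefore, in this model the shared total is always `min(sum of slice sizes, threshold)`, while the split across slices varies from run to run. That variation mirrors the thread-timing caveat discussed above.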
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.