msokolov commented on a change in pull request #1235: LUCENE-8929: parallel 
early termination sharing counts and max scores across leaves
URL: https://github.com/apache/lucene-solr/pull/1235#discussion_r374954900
 
 

 ##########
 File path: 
lucene/core/src/java/org/apache/lucene/search/MaxScoreTerminator.java
 ##########
 @@ -0,0 +1,170 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * MaxScoreTerminator is notified periodically by leaf collectors calling 
{@link #update}
+ * with their worst (ie maximum) score
+ * and how many hits they have collected.  When enough hits are collected, 
Scoreboard notifies
+ * noncompetitive leaf collectors when they can stop (early terminate) by 
returning true from
+ * its {@link #update} method.
+ * 
+ * <p>At any moment, N leaves have reported their counts of documents 
collected; documents are
+ * collected in score order, so these counts represent the best for each leaf. 
And we also track
+ * the scores of the lowest-scoring (most recently collected) document in each 
leaf.</p>
+ *
+ * <p>Once the total number of documents collected reaches the requested total 
(numHits),
+ * the worst-scoring leaf can no longer contribute any documents to the 
results, so it can be
+ * terminated, and any leaves whose scores rise above that worst score is no 
longer competitive and
+ * can also be terminated. If we kept a global priority queue we could update 
the global maximum
+ * competitive score, and use that as a termination threshold, but assuming 
this to be too costly
+ * due to thread contention, we seek to more cheaply update an upper bound on 
the worst score.
+ * Specifically, when a leaf is terminated, if the remaining leaves together 
have collected >= numHits,
+ * then we can update the maximum to the max of *their* max scores, excluding 
the terminated leaf's max
+ * from consideration.</p>
+ *
+ * <p> In practice this leads to a good bound on the number of documents 
collected, which tend to
+ * exceed numHits by a small factor.  When the documents are evenly 
distributed among N segments,
+ * we expect to collect approximately (N+1/N) * numHits documents. In a worst 
case, where *all*
+ * the best documents are in a single segment, we expect to collect something 
O(log N) ie
+ * (1/N + 1/N-1 + ... + 1) * numHits documents, which is still much better than
+ * the N * numHits we would collect with a naive strategy.
+ * </p>
+ */
+class MaxScoreTerminator {
+
+  // we use 2^5-1 to check the remainder with a bitwise operation
 
 Review comment:
   Yes, that's right. I'm changing to 5 (leaf collectors should call for every 
32d collected hit), but this nominal value is never used in practice.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to