msokolov commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1245115736
##########
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java:
##########
@@ -100,12 +100,19 @@ public int getCountToCollect() {
@Override
protected void doSetNextReader(LeafReaderContext context) throws IOException
{
docBase = context.docBase;
+ }
+
+ @Override
+ public void finish() throws IOException {
if (seenSurfaceForms != null) {
- seenSurfaceForms.clear();
// NOTE: this also clears the priorityQueue:
for (SuggestScoreDoc hit : priorityQueue.getResults()) {
pendingResults.add(hit);
}
+
+ // Deduplicate all hits: we already dedup'd efficiently within each
segment by
Review Comment:
any particular reason to change the order of operations here?
##########
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java:
##########
@@ -136,15 +143,7 @@ public TopSuggestDocs get() throws IOException {
SuggestScoreDoc[] suggestScoreDocs;
if (seenSurfaceForms != null) {
- // NOTE: this also clears the priorityQueue:
- for (SuggestScoreDoc hit : priorityQueue.getResults()) {
- pendingResults.add(hit);
- }
-
- // Deduplicate all hits: we already dedup'd efficiently within each
segment by
- // truncating the FST top paths search, but across segments there may
still be dups:
- seenSurfaceForms.clear();
Review Comment:
oh I see we did it both ways. Probably makes no difference? Still
superstitiously I would always want to clear/delete things last just in case...
##########
lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingLeafCollector.java:
##########
@@ -57,4 +58,11 @@ public void collect(int doc) throws IOException {
public DocIdSetIterator competitiveIterator() throws IOException {
return in.competitiveIterator();
}
+
+ @Override
+ public void finish() throws IOException {
+ assert finishCalled == false;
Review Comment:
Did we previously disallow re-use of LeafCollectors? If not, this could
break someone
##########
lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingCollector.java:
##########
@@ -49,7 +50,9 @@ public LeafCollector getLeafCollector(LeafReaderContext
context) throws IOExcept
assert context.docBase >= previousLeafMaxDoc;
previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
+ assert hasFinishedCollectingPreviousLeaf;
Review Comment:
it's a pity we can't assert that we finished the final leaf
##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -749,6 +749,9 @@ protected void search(List<LeafReaderContext> leaves,
Weight weight, Collector c
partialResult = true;
}
}
+ // Note: this is called if collection ran successfully, including the
above special cases of
+ // CollectionTerminatedException and TimeExceededException, but no other
exception.
+ leafCollector.finish();
Review Comment:
This could be an opportunity for capturing statistics about how often
time-limitation is applied?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]