gus-asf commented on code in PR #2960:
URL: https://github.com/apache/solr/pull/2960#discussion_r1935871739
##########
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##########
@@ -29,11 +30,15 @@
*/
public class EarlyTerminatingCollector extends FilterCollector {
+ private final int chunkSize = 100; // Check across threads only at a chunk
size
Review Comment:
How was this chosen? Should it be tuneable via env/sysprop?
##########
solr/solr-ref-guide/modules/query-guide/pages/common-query-parameters.adoc:
##########
@@ -399,6 +399,23 @@ For example, setting `cpuAllowed=500` gives a limit of at
most 500 ms of CPU tim
All other considerations regarding partial results listed for the
`timeAllowed` parameter apply here, too.
+
+== maxHits Parameter
+
+[%autowidth,frame=none]
+|===
+|Optional |Default: `false`
+|===
+
+This parameter specifies the max number of hits a searcher will iterate
through before early terminating the search.
+The count is per shard and across all threads involved in case of
multi-threaded search. This parameter works
+in conjunction with other parameters that could early terminate a search, ex:
_timeAllowed_ etc. In case the search
+was early terminated due to it exceeding maxHits a _terminatedEarly_ header in
the response will be set along with
+_partialResults_ to indicate the same. Note that the _partialResults_ flag
could be set in the absence of the _maxHits_
+parameter due to other limits like _timeAllowed_ or _cpuAllowed_.
+Note : the hits counted will need not be exactly equal to the maxHits
provided, however it will be within range of this value.
+
+
Review Comment:
I have a similar concern as Houston regarding helping the user understand
when to use this, here's my attempt to rewrite your text based on what I've
gathered... of course if I misunderstood something or I'm too wordy we should
fix that ;) As a side note I think we're supposed to write these as "ventilated
prose" which is one sentence per line, no line breaks in a sentence.
```
== maxHits Parameter
[%autowidth,frame=none]
|===
|Optional |Default: `false`
|===
This parameter specifies the max number of hits a searcher will iterate
through.
Searchers will arbitrarily ignore any number of additional hits beyond this
value.
Practically speaking, the count is per shard and across all threads involved
in case of multi-threaded search.
The intention of this feature is to favor speed over perfect relevancy &
recall.
The trade-off is that if one shard contains many relevant hits and another
contains a few less relevant hits the less relevant hits from the second shard
may get returned instead of the more relevant hits that were clustered in the
first shard.
This parameter works in conjunction with other parameters that could early
terminate a search, ex: _timeAllowed_ etc.
In case the search was early terminated due to it exceeding maxHits a
_terminatedEarly_ header in the response will be set along with
_partialResults_ to indicate the same.
The _partialResults_ flag could be set in the absence of the _maxHits_
parameter due to other limits like _timeAllowed_ or _cpuAllowed_.
```
I have to admit I don't understand this bit:
```
NOTE: the hits counted will need not be exactly equal to the maxHits
provided, however it will be within range of this value.
```
##########
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##########
@@ -194,7 +195,8 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
}
public boolean getTerminateEarly() {
- return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+ return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0
Review Comment:
As I noted in another comment I think this and the Lucene feature are
significantly different. I don't like conflating them.
##########
solr/core/src/java/org/apache/solr/search/MultiThreadedSearcher.java:
##########
@@ -190,10 +194,16 @@ public ScoreMode scoreMode() {
static class SearchResult {
final ScoreMode scoreMode;
private final Object[] result;
+ final boolean isTerminatedEarly;
public SearchResult(ScoreMode scoreMode, Object[] result) {
+ this(scoreMode, result, false);
Review Comment:
My IDE claims this unused. Since this is a package private inner class we
probably don't need to maintain this constructor for the API.
##########
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##########
@@ -61,11 +69,25 @@ public LeafCollector getLeafCollector(LeafReaderContext
context) throws IOExcept
public void collect(int doc) throws IOException {
super.collect(doc);
numCollected++;
- if (maxDocsToCollect <= numCollected) {
+ terminatedEarly = maxDocsToCollect <= numCollected;
+ if (numCollected % chunkSize == 0) {
+ pendingDocsToCollect.add(chunkSize);
+ final long overallCollectedDocCount =
pendingDocsToCollect.intValue();
+ terminatedEarly = overallCollectedDocCount >= maxDocsToCollect;
Review Comment:
I think the code would be more legible if you just threw in two cases rather
than tracking it with a boolean that is overwritten. This initially looked like
a logic error. I expected to an `|=` until I stopped and thought about it for a
while to convince myself that the second case can't be false after the first is
true. Seems better to me if future maintainers don't have to sort that out for
themselves.
##########
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##########
@@ -43,11 +48,14 @@ public class EarlyTerminatingCollector extends
FilterCollector {
* @param maxDocsToCollect - the maximum number of documents to Collect
*/
public EarlyTerminatingCollector(Collector delegate, int maxDocsToCollect) {
- super(delegate);
- assert 0 < maxDocsToCollect;
Review Comment:
Why are the asserts removed?
##########
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##########
@@ -61,11 +69,25 @@ public LeafCollector getLeafCollector(LeafReaderContext
context) throws IOExcept
public void collect(int doc) throws IOException {
super.collect(doc);
numCollected++;
- if (maxDocsToCollect <= numCollected) {
+ terminatedEarly = maxDocsToCollect <= numCollected;
+ if (numCollected % chunkSize == 0) {
+ pendingDocsToCollect.add(chunkSize);
+ final long overallCollectedDocCount =
pendingDocsToCollect.intValue();
+ terminatedEarly = overallCollectedDocCount >= maxDocsToCollect;
+ }
+ if (terminatedEarly) {
throw new EarlyTerminatingCollectorException(
- numCollected, prevReaderCumulativeSize + (doc + 1));
+ maxDocsToCollect, prevReaderCumulativeSize + (doc + 1));
}
}
};
}
+
+ public boolean isTerminatedEarly() {
Review Comment:
My IDE says this method is unused... do we use it somewhere I've missed?
##########
solr/core/src/java/org/apache/solr/search/EarlyTerminatingCollector.java:
##########
@@ -29,11 +30,15 @@
*/
public class EarlyTerminatingCollector extends FilterCollector {
+ private final int chunkSize = 100; // Check across threads only at a chunk
size
+
private final int maxDocsToCollect;
private int numCollected = 0;
private int prevReaderCumulativeSize = 0;
private int currentReaderSize = 0;
+ private final LongAdder pendingDocsToCollect;
Review Comment:
LongAdder is a neat class I wasn't aware of, but my read of the javadocs for
LongAdder is that it's for rarely read values like statistics/metrics (frequent
update, infrequent read)... the way you are using it you read from it
immediately after every add. Maybe just AtomicLong? Happy to hear arguments to
the contrary, but everyone knows what an AtomicLong is so slightly simpler to
have that (and perhaps slightly less memory usage according to the javadocs).
##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -295,7 +295,7 @@ private Collector buildAndRunCollectorChain(
final boolean terminateEarly = cmd.getTerminateEarly();
if (terminateEarly) {
- collector = new EarlyTerminatingCollector(collector, cmd.getLen());
+ collector = new EarlyTerminatingCollector(collector,
cmd.getMaxHitsTerminateEarly());
Review Comment:
How is it we are replacing getLen() here... how does this not break
SOLR-3240
(https://github.com/apache/solr/commit/7141cb35a761eea1f0e6f3ad6abf84a4c972fedc#diff-99978700f1c69d6a5f6c2190f89c98cfe20c441161db5a183ec002e15cb1be28)
##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -329,12 +329,12 @@ private Collector buildAndRunCollectorChain(
if (collector instanceof DelegatingCollector) {
((DelegatingCollector) collector).complete();
}
- throw etce;
+ qr.setPartialResults(true);
Review Comment:
Does not throwing here effect SOLR-3240, (the original reason for ETCE) ?
##########
solr/core/src/java/org/apache/solr/search/QueryCommand.java:
##########
@@ -194,7 +195,8 @@ public QueryCommand setNeedDocSet(boolean needDocSet) {
}
public boolean getTerminateEarly() {
- return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0;
+ return (flags & SolrIndexSearcher.TERMINATE_EARLY) != 0
Review Comment:
And after looking deeper, there's something funny going on in flags. There
are two flags:
```
static final int SEGMENT_TERMINATE_EARLY = 0x08;
public static final int TERMINATE_EARLY = 0x04;
```
yet I see this code (ide says it's unused) in QueryCommand:
```
public QueryCommand setTerminateEarly(boolean segmentTerminateEarly) {
if (segmentTerminateEarly) {
return setFlags(SolrIndexSearcher.TERMINATE_EARLY);
} else {
return clearFlags(SolrIndexSearcher.TERMINATE_EARLY);
}
}
```
as well as:
```
public QueryCommand setSegmentTerminateEarly(boolean
segmentSegmentTerminateEarly) {
if (segmentSegmentTerminateEarly) {
return setFlags(SolrIndexSearcher.SEGMENT_TERMINATE_EARLY);
} else {
return clearFlags(SolrIndexSearcher.SEGMENT_TERMINATE_EARLY);
}
}
```
So there's something we should maybe understand and maybe clean up here...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]