javanna commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1686523892
##########
lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollectorManager.java:
##########
@@ -28,17 +31,77 @@
*/
public class TotalHitCountCollectorManager
implements CollectorManager<TotalHitCountCollector, Integer> {
+
+ /**
+ * Internal state shared across the different collectors that this collector
manager creates. It
+ * tracks leaves seen as an argument of {@link
Collector#getLeafCollector(LeafReaderContext)}
+ * calls, to ensure correctness: if the first partition of a segment early
terminates, count has
+ * been already retrieved for the entire segment hence subsequent partitions
of the same segment
+ * should also early terminate. If the first partition of a segment computes
hit counts,
+ * subsequent partitions of the same segment should do the same, to prevent
their counts from
+ * being retrieve from {@link LRUQueryCache} (which returns counts for the
entire segment)
+ */
+ private final Map<LeafReaderContext, Boolean> seenContexts = new HashMap<>();
+
@Override
public TotalHitCountCollector newCollector() throws IOException {
- return new TotalHitCountCollector();
+ return new LeafPartitionAwareTotalHitCountCollector(seenContexts);
}
@Override
public Integer reduce(Collection<TotalHitCountCollector> collectors) throws
IOException {
+ // TODO this makes the collector manager instance reusable across multiple
searches. It isn't a
+ // strict requirement
+ // but it is currently supported as collector managers normally don't hold
state, while
+ // collectors do.
+ // This currently works but would not allow to perform incremental
reductions in the future. It
+ // would be easy enough
+ // to expose an additional method to the CollectorManager interface to
explicitly clear state,
+ // which index searcher
+ // calls before starting a new search. That feels like overkill at the
moment because there is
+ // no real usecase for it.
+ // An alternative is to document the requirement to not reuse collector
managers across
+ // searches, that would be a
+ // breaking change. Perhaps not needed for now.
+ seenContexts.clear();
Review Comment:
Because the collector manager holds state, either we clean it in-between
searches (reduce is the current place where that can be done, but that may
change over time), or we declare that collector managers cannot be re-used
across multiple searches (potential breaking change).
I am unsure whether it is necessary to make a breaking change for this,
perhaps clearing state in reduce is not that bad.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]