benwtrent commented on code in PR #14191:
URL: https://github.com/apache/lucene/pull/14191#discussion_r1939664144
##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -19,11 +19,7 @@
import static org.apache.lucene.search.DocIdSetIterator.NO_MORE_DOCS;
import java.io.IOException;
-import java.util.ArrayList;
-import java.util.Arrays;
-import java.util.Comparator;
-import java.util.List;
-import java.util.Objects;
+import java.util.*;
Review Comment:
lets not use `*`
##########
lucene/core/src/java/org/apache/lucene/search/knn/TopKnnCollectorManager.java:
##########
@@ -55,8 +57,13 @@ public KnnCollector newCollector(
if (globalScoreQueue == null) {
return new TopKnnCollector(k, visitedLimit, searchStrategy);
} else {
- return new MultiLeafKnnCollector(
- k, globalScoreQueue, new TopKnnCollector(k, visitedLimit,
searchStrategy));
+ if (freeze.getAndSet(false)) {
+ return new MultiLeafKnnCollector(
+ k, globalScoreQueue, new TopKnnCollector(k, visitedLimit,
searchStrategy), false);
+ } else {
Review Comment:
I don't think you need to do this "freeze" thing. The only issue was sharing
information between threads. A single executor continuing to share information
is completely ok.
##########
lucene/core/src/java/org/apache/lucene/search/knn/TopKnnCollectorManager.java:
##########
@@ -55,8 +57,13 @@ public KnnCollector newCollector(
if (globalScoreQueue == null) {
return new TopKnnCollector(k, visitedLimit, searchStrategy);
} else {
- return new MultiLeafKnnCollector(
- k, globalScoreQueue, new TopKnnCollector(k, visitedLimit,
searchStrategy));
+ if (freeze.getAndSet(false)) {
+ return new MultiLeafKnnCollector(
+ k, globalScoreQueue, new TopKnnCollector(k, visitedLimit,
searchStrategy), false);
+ } else {
Review Comment:
For this change, we simply make `MultiLeafKnnCollector` not thread safe at
all. We can continue to have this `globalScoreQueue` that is unique per thread
and shared between all the collectors, but all the code assumes its within a
single thread.
This will likely speed things up significantly.
##########
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##########
@@ -88,11 +91,64 @@ public Query rewrite(IndexSearcher indexSearcher) throws
IOException {
getKnnCollectorManager(k, indexSearcher),
indexSearcher.getTimeout());
TaskExecutor taskExecutor = indexSearcher.getTaskExecutor();
List<LeafReaderContext> leafReaderContexts = reader.leaves();
+
List<Callable<TopDocs>> tasks = new ArrayList<>(leafReaderContexts.size());
- for (LeafReaderContext context : leafReaderContexts) {
- tasks.add(() -> searchLeaf(context, filterWeight, knnCollectorManager));
+
+ TopDocs[] perLeafResults;
+ if (leafReaderContexts.size() > 1) {
+ if (true) {
+ /* sort LRCs by segment size */
+ List<LeafReaderContext> sortedLeafReaderContexts =
leafReaderContexts.stream()
+ .sorted(Comparator.comparingInt(o ->
o.reader().numDocs())).toList();
+ int noLRCs = sortedLeafReaderContexts.size();
Review Comment:
I think this is OK for a POC, but we can easily sort actual number of
vectors for real data.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]