stefanvodita commented on code in PR #13542:
URL: https://github.com/apache/lucene/pull/13542#discussion_r1665869563
##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -328,42 +336,65 @@ protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
   /** Static method to segregate LeafReaderContexts amongst multiple slices */
   public static LeafSlice[] slices(
       List<LeafReaderContext> leaves, int maxDocsPerSlice, int maxSegmentsPerSlice) {
+
+    // TODO this is a temporary hack to force testing against multiple leaf reader context slices.
+    // It must be reverted before merging.
+    maxDocsPerSlice = 1;
+    maxSegmentsPerSlice = 1;
+    // end hack
+
     // Make a copy so we can sort:
     List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);
     // Sort by maxDoc, descending:
-    Collections.sort(
-        sortedLeaves, Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
+    sortedLeaves.sort(Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));
-    final List<List<LeafReaderContext>> groupedLeaves = new ArrayList<>();
-    long docSum = 0;
-    List<LeafReaderContext> group = null;
+    final List<List<LeafReaderContextPartition>> groupedLeafPartitions = new ArrayList<>();
+    int currentSliceNumDocs = 0;
+    List<LeafReaderContextPartition> group = null;
     for (LeafReaderContext ctx : sortedLeaves) {
       if (ctx.reader().maxDoc() > maxDocsPerSlice) {
         assert group == null;
-        groupedLeaves.add(Collections.singletonList(ctx));
+        // if the segment does not fit in a single slice, we split it in multiple partitions of
Review Comment:
Thank you for moving forward on this issue @javanna!
I had a different strategy in mind for slicing the index. With the current implementation, we deduce the number of slices from a given per-slice doc count. What if the number of slices were given instead? Each segment would not be divided into n partitions; rather, a slice could straddle a few segments. I think that gives us better control over the concurrency and leaves us with fewer slices per segment for the same level of concurrency, which is better because it reduces the fixed cost of search that we pay per slice per segment (see the sketch below). Are there challenges that make that hard to implement?
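To make the idea concrete, here is a rough sketch of what I have in mind. This is purely illustrative: `slicesByCount` and the greedy assignment are my own strawman, not existing Lucene API, and it glosses over building the actual `LeafSlice` instances:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.lucene.index.LeafReaderContext;

public final class SliceByCountSketch {

  /** Distribute leaves over a fixed number of slices; a slice may straddle several segments. */
  public static List<List<LeafReaderContext>> slicesByCount(
      List<LeafReaderContext> leaves, int numSlices) {
    // Make a copy and sort by maxDoc, descending, as IndexSearcher#slices already does:
    List<LeafReaderContext> sortedLeaves = new ArrayList<>(leaves);
    sortedLeaves.sort(
        Collections.reverseOrder(Comparator.comparingInt(l -> l.reader().maxDoc())));

    List<List<LeafReaderContext>> slices = new ArrayList<>(numSlices);
    long[] docCounts = new long[numSlices];
    for (int i = 0; i < numSlices; i++) {
      slices.add(new ArrayList<>());
    }
    // Greedy bin packing: assign each segment, largest first, to the slice
    // that currently holds the fewest docs.
    for (LeafReaderContext ctx : sortedLeaves) {
      int smallest = 0;
      for (int i = 1; i < numSlices; i++) {
        if (docCounts[i] < docCounts[smallest]) {
          smallest = i;
        }
      }
      slices.get(smallest).add(ctx);
      docCounts[smallest] += ctx.reader().maxDoc();
    }
    return slices;
  }
}
```

With a greedy assignment like this, the number of slices is capped up front and each segment lands in exactly one slice; intra-segment partitions would only be needed when a single segment is so large that it dominates the rest of the index.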
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]