Experience with a round robin or dynamic collection of docs from segments within a Slice

Gautam Worah Fri, 07 Jun 2024 12:45:03 -0700

Hi folks,

I was wondering if people had experimented with something other than the
default Lucene search logic of completing one segment within a Slice at a
time, and then going on to the next segment in sequential order.


Here is the current logic in IndexSearcher#search(List<LeafReaderContext>
leaves, Weight weight, Collector collector):

```java
for (LeafReaderContext ctx : leaves) {
  // ... (get the LeafCollector and BulkScorer for ctx)
  scorer.score(leafCollector, ctx.reader().getLiveDocs());
  // ...
}
```

For sorted indexes (e.g. time-sorted data) and a query that asks for the
top-k results, wouldn't it be better to round-robin among the leaves (or to
expand further into the leaf that has better values) within a Slice that is
searched by a single thread?
In Amazon's Product Search context, our index is sorted in descending order by
a custom Score.
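To make the intuition concrete, here is a toy model in plain Java (not Lucene code). It assumes every leaf is sorted by descending score, that every doc matches, and that the top-k ranking uses that same score; `LeafOrderDemo`, `sequentialScored`, and `bestFirstScored` are hypothetical names. It counts how many docs get scored when leaves are finished one at a time (with per-leaf early termination) versus always expanding the leaf whose next doc is best.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class LeafOrderDemo {
  /** Finishes each leaf before the next, stopping a leaf once its docs can no longer compete (k >= 1). */
  static int sequentialScored(float[][] segments, int k) {
    PriorityQueue<Float> topK = new PriorityQueue<>(); // min-heap of the best k scores so far
    int scored = 0;
    for (float[] segment : segments) {
      for (float s : segment) {
        scored++;
        if (topK.size() < k) {
          topK.add(s);
        } else if (s > topK.peek()) {
          topK.poll();
          topK.add(s);
        } else {
          break; // leaf is sorted descending: nothing later in it can compete
        }
      }
    }
    return scored;
  }

  /** Always scores the next doc of whichever leaf currently has the best unscored doc. */
  static int bestFirstScored(float[][] segments, int k, List<Float> topKOut) {
    // Each entry is {leaf, docInLeaf}; the entry with the highest score is polled first.
    PriorityQueue<int[]> heads = new PriorityQueue<>(
        (a, b) -> Float.compare(segments[b[0]][b[1]], segments[a[0]][a[1]]));
    int scored = 0;
    for (int leaf = 0; leaf < segments.length; leaf++) {
      if (segments[leaf].length > 0) {
        heads.add(new int[] {leaf, 0});
        scored++; // a leaf's head doc is scored when it enters the queue
      }
    }
    while (topKOut.size() < k && !heads.isEmpty()) {
      int[] head = heads.poll();
      topKOut.add(segments[head[0]][head[1]]);
      // Only expand this leaf further if we still need more hits.
      if (topKOut.size() < k && head[1] + 1 < segments[head[0]].length) {
        heads.add(new int[] {head[0], head[1] + 1});
        scored++;
      }
    }
    return scored;
  }

  public static void main(String[] args) {
    // Three leaves, each sorted by descending score; the best leaf happens to come last.
    float[][] segments = {{5, 4, 3, 2, 1}, {10, 9, 8, 7, 6}, {15, 14, 13, 12, 11}};
    List<Float> top = new ArrayList<>();
    System.out.println("sequential: " + sequentialScored(segments, 3)); // sequential: 12
    System.out.println("best-first: " + bestFirstScored(segments, 3, top) + " " + top); // best-first: 5 [15.0, 14.0, 13.0]
  }
}
```

With the best leaf placed last, the sequential strategy scores 12 docs and the best-first one scores 5 for the same top 3. If the leaves were listed best-first (`{15..11}`, `{10..6}`, `{5..1}`), the sequential strategy would score only 6, so in this model the gain depends heavily on leaf order.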

I wanted to experiment with something like:

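```java
while (isAnyCollectorNotTerminated) { // round robin over the leaves
  for (LeafReaderContext ctx : leaves) {
    // ... (skip finished leaves; get this leaf's collector, scorer, and [start, end) window)
    // BulkScorer#score(collector, acceptDocs, min, max) only scores docs in [min, max)
    scorer.score(leafCollector, ctx.reader().getLiveDocs(), start, end);
    // ... (advance this leaf's window; update isAnyCollectorNotTerminated)
  }
}
```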
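Below is a self-contained toy version of that loop in plain Java, assuming leaves sorted by descending score and a shared top-k min-heap. `RoundRobinDemo`, `Leaf`, and `Leaf#score(topK, k, min, max)` are hypothetical; `score` is loosely modeled on `BulkScorer#score(LeafCollector, Bits, int min, int max)`: it scores only docs in `[min, max)` and returns the next doc ID to score, or `NO_MORE_DOCS` once the leaf is exhausted or can no longer contribute.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class RoundRobinDemo {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Hypothetical per-leaf scorer over a leaf sorted by descending score. */
  static final class Leaf {
    final float[] scores; // doc ID i has score scores[i]
    int next = 0;         // next doc ID to score, or NO_MORE_DOCS
    int scored = 0;       // number of docs examined so far

    Leaf(float[] scores) {
      this.scores = scores;
    }

    /** Scores docs in [min, max) into topK (k >= 1); returns the next doc ID or NO_MORE_DOCS. */
    int score(PriorityQueue<Float> topK, int k, int min, int max) {
      int end = Math.min(max, scores.length);
      for (int doc = min; doc < end; doc++) {
        float s = scores[doc];
        scored++;
        if (topK.size() < k) {
          topK.add(s);
        } else if (s > topK.peek()) {
          topK.poll();
          topK.add(s);
        } else {
          return NO_MORE_DOCS; // leaf is sorted: no later doc in it can enter the top k
        }
      }
      return end < scores.length ? end : NO_MORE_DOCS;
    }
  }

  /** Round-robins fixed-size doc ID windows across leaves; returns the number of docs scored. */
  static int roundRobinScored(float[][] segments, int k, int windowSize, List<Float> topKOut) {
    List<Leaf> leaves = new ArrayList<>();
    for (float[] s : segments) {
      leaves.add(new Leaf(s));
    }
    PriorityQueue<Float> topK = new PriorityQueue<>(); // min-heap of the best k scores so far
    boolean anyActive = true; // plays the role of isAnyCollectorNotTerminated
    while (anyActive) {
      anyActive = false;
      for (Leaf leaf : leaves) {
        if (leaf.next == NO_MORE_DOCS) {
          continue; // this leaf is exhausted or early-terminated
        }
        int start = leaf.next;
        int end = (int) Math.min((long) start + windowSize, NO_MORE_DOCS);
        leaf.next = leaf.score(topK, k, start, end);
        anyActive |= leaf.next != NO_MORE_DOCS;
      }
    }
    int total = 0;
    for (Leaf leaf : leaves) {
      total += leaf.scored;
    }
    topKOut.addAll(topK);
    topKOut.sort(Collections.reverseOrder());
    return total;
  }

  public static void main(String[] args) {
    float[][] segments = {{5, 4, 3, 2, 1}, {10, 9, 8, 7, 6}, {15, 14, 13, 12, 11}};
    List<Float> top = new ArrayList<>();
    System.out.println(roundRobinScored(segments, 3, 2, top) + " " + top); // 10 [15.0, 14.0, 13.0]
  }
}
```

On the three leaves in `main` with `k = 3`, a window of 2 docs scores 10 docs, a window of 1 scores 9, and a window of `Integer.MAX_VALUE` degenerates to the current sequential order and scores 12. In this example smaller windows follow the best leaf more closely, at the price of more `score` calls.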

So far, I have run into two issues:
1. DrillSidewaysScorer only supports scoring the full doc ID range, i.e.
start=0 and end=Integer.MAX_VALUE.
2. FacetsCollector is designed to finish collecting one leaf before it moves
on to the next leaf, which does not work with round-robining.
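One possible workaround for the second issue is to key a collector's per-leaf state by leaf ordinal instead of finalizing it when the next leaf starts. The toy model below is plain Java, not Lucene's FacetsCollector: `FinalizeOnSwitch` and `PerLeaf` are hypothetical, and `java.util.BitSet` stands in for a per-leaf doc ID set. It contrasts a collector that assumes each leaf is visited exactly once with one that tolerates interleaved leaves.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PerLeafStateDemo {
  /** Finalizes the current leaf's matches whenever a different leaf starts. */
  static final class FinalizeOnSwitch {
    final List<Integer> finishedLeaves = new ArrayList<>(); // one entry per finalized chunk
    final List<BitSet> finishedDocs = new ArrayList<>();
    int currentLeaf = -1;
    BitSet current;

    void collect(int leafOrd, int doc) {
      if (leafOrd != currentLeaf) {
        finish(); // assumes the previous leaf will never be seen again
        currentLeaf = leafOrd;
        current = new BitSet();
      }
      current.set(doc);
    }

    void finish() {
      if (current != null) {
        finishedLeaves.add(currentLeaf);
        finishedDocs.add(current);
      }
      current = null;
    }
  }

  /** Keeps one doc set per leaf ordinal, so leaves may be collected in any interleaved order. */
  static final class PerLeaf {
    final Map<Integer, BitSet> docsByLeaf = new LinkedHashMap<>();

    void collect(int leafOrd, int doc) {
      docsByLeaf.computeIfAbsent(leafOrd, ord -> new BitSet()).set(doc);
    }
  }

  public static void main(String[] args) {
    // (leaf, doc) hits in round-robin order with a window of 2 docs per leaf.
    int[][] hits = {{0, 0}, {0, 1}, {1, 0}, {1, 1}, {0, 2}, {1, 2}};
    FinalizeOnSwitch a = new FinalizeOnSwitch();
    PerLeaf b = new PerLeaf();
    for (int[] h : hits) {
      a.collect(h[0], h[1]);
      b.collect(h[0], h[1]);
    }
    a.finish();
    System.out.println(a.finishedLeaves + " " + a.finishedDocs); // [0, 1, 0, 1] [{0, 1}, {0, 1}, {2}, {2}]
    System.out.println(b.docsByLeaf); // {0={0, 1, 2}, 1={0, 1, 2}}
  }
}
```

Fed the round-robin sequence, the first collector emits four fragments for two leaves, which downstream code expecting one entry per leaf would misread; the second ends with one complete set per leaf. The trade-off is keeping every leaf's state alive until collection ends.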

I am working through them right now.
Has anyone tried this approach in general? I would like to hear about other
problems that might come up, possible workarounds or alternative approaches,
performance numbers, or any other experiences.
Are there other Collectors with similar design constraints?

I see the current code has a comment like:

```
// TODO: should we make this
// threaded...? the Collector could be sync'd?
```

so I guess there were some ideas around making this logic smarter?

Thanks for the help!

-
Gautam Worah.
