msokolov commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2692810655
Also, I forgot about this comment:
> Would it make sense to cap perLeafTopK by the original k? I think k is
> doubled on every iteration, and perLeafTopK can theoretically go over the
> original k, which is excessive.
I added an additional cap on this, but then realized we are already
implicitly imposing such a limit here:
```
if (perLeaf.scoreDocs.length > 0
    && perLeaf.scoreDocs[perLeaf.scoreDocs.length - 1].score >= minTopKScore
    && perLeafTopKCalculation(kInLoop / 2, ctx.reader().maxDoc() / (float) reader.maxDoc())
        <= k + 1) {
```
By the way, another thing we might want to try is relaxing this reentry
check a bit by looking at the second- or third-worst per-leaf score, because
in theory `lambda` creates a buffer that should cause leaves to collect
deeper than the best top K. That could enable the per-leaf strategy to
outperform the global fanout. Anyway, it's easy to try.
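To make the relaxed check concrete, here is a minimal sketch; it is not code
from the PR. The class `RelaxedReentryCheck`, the method `leafIsCompetitive`,
and the `slack` parameter are hypothetical names, and the sketch assumes the
per-leaf scores are sorted best-first, as they are in `scoreDocs`. With
`slack = 0` it behaves like the first two clauses of the condition above;
larger values look at the second- or third-worst score instead.

```java
public final class RelaxedReentryCheck {
    private RelaxedReentryCheck() {}

    /**
     * Hypothetical helper: decides whether a leaf still looks competitive.
     *
     * @param leafScores   per-leaf scores sorted best-first (descending)
     * @param minTopKScore worst score in the merged global top-k
     * @param slack        0 compares the worst per-leaf score, 1 the
     *                     second-worst, 2 the third-worst, and so on
     */
    static boolean leafIsCompetitive(float[] leafScores, float minTopKScore, int slack) {
        if (leafScores.length == 0) {
            return false; // nothing collected in this leaf, nothing to re-enter
        }
        // Clamp so that a slack larger than the result size falls back to the best score.
        int idx = Math.max(0, leafScores.length - 1 - slack);
        return leafScores[idx] >= minTopKScore;
    }

    public static void main(String[] args) {
        float[] scores = {0.9f, 0.8f, 0.5f};
        System.out.println(leafIsCompetitive(scores, 0.7f, 0)); // worst score 0.5 vs 0.7
        System.out.println(leafIsCompetitive(scores, 0.7f, 1)); // second-worst 0.8 vs 0.7
    }
}
```

A larger `slack` makes re-entry more likely, so whether it actually helps
would need to be measured against the global fanout.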
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]