gsmiller commented on PR #15938: URL: https://github.com/apache/lucene/pull/15938#issuecomment-4206699682
> Given we know the number of matching documents and leaves upfront, I am wondering if it makes sense to keep existing logic for sparse cases? Great question! So I did play with this a bit in benchmarking. I forked the logic based on whether-or-not there were more leaves than docs (essentially handling the outlier cases where there are lots of leaves and very few docs with the current linear scan). I ultimately shelved that idea for two reasons: (1) I questioned the trade-off of having two separate implementations to handle outlier cases that I don't think are super likely, and (2) the tuning heuristic of when to use linear isn't quite as clean as checking which is larger, docs or leaves (it seemed to be hardware dependent). Also, for the most part, the cases where linear scan is more performant are already very fast anyway (because you need to have very few docs), so I wasn't sure it was worth further optimizing those cases with a forked implementation when the practical difference would likely be small anyway. One thing I like about the binary search approach is that it makes the "slow cases" (i.e., lots of docs) much faster. But stepping back, I'm not opposed to having two implementations if there's support for it. We do that sort of thing in other places. I was trying to start simple though and avoid feedback in the other direction (e.g., "why are you overcomplicating this with two implementations?" :) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
