krickert commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3930001585
You’re right, I mixed objectives. I’ll focus on recall next, specifically recall vs `efSearch` across three scenarios:

- single-shard baseline
- multi-shard independent
- multi-shard collaborative on the same shard graphs

I’ll treat work/latency as secondary and keep them out of the main conclusion for now.

Next I’ll test whether recall can be improved by adding shard-aware index-time context instead of relying on search-time coordination alone. I’ll prototype a lightweight global routing layer and cross-shard neighborhood metadata so shard traversal starts with better global priors.

I think the core issue is that each shard currently builds and searches its own local ANN neighborhood frontier. A single shard can look strong, but once we merge across many shard-local frontiers, recall drops much more sharply than I expected. That severity is exactly why I think index-time global awareness can help.

I’m looking through some papers for another round, and I’ll test a few more scenarios.

Thanks for being patient, by the way. I really want to push hard for high-K search to become the norm.
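To make the recall measurement concrete, here is a minimal sketch of recall@k after merging shard-local results. It is not Lucene code: `merge_shard_topk`, `recall_at_k`, and the toy hit lists are hypothetical names and data, and the sketch assumes each shard returns `(distance, doc_id)` pairs where a smaller distance is better, with an exact top-k available as ground truth.

```python
import heapq
from typing import Iterable, List, Sequence, Tuple

# Hypothetical hit shape: (distance, doc_id); smaller distance is better.
Hit = Tuple[float, int]


def merge_shard_topk(shard_hits: Iterable[Sequence[Hit]], k: int) -> List[int]:
    """Merge per-shard candidate lists into one global top-k by distance."""
    all_hits = [hit for hits in shard_hits for hit in hits]
    return [doc_id for _, doc_id in heapq.nsmallest(k, all_hits)]


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int]) -> float:
    """Fraction of the exact top-k neighbors present in the retrieved set."""
    if not ground_truth:
        return 0.0
    return len(set(retrieved) & set(ground_truth)) / len(ground_truth)


# Toy data (assumed): the exact top-4 neighbors are docs 1, 2, 3, 4.
exact_top4 = [1, 2, 3, 4]

# Shard A's local search missed doc 3 and surfaced doc 7 instead.
shard_a = [(0.10, 1), (0.30, 7)]
shard_b = [(0.15, 2), (0.25, 4)]

merged = merge_shard_topk([shard_a, shard_b], k=4)
print(merged, recall_at_k(merged, exact_top4))
```

In this toy case the merged list is `[1, 2, 4, 7]` and recall@4 is 0.75: one miss on a single shard carries straight through the merge. Running the same measurement per `efSearch` value for each of the three scenarios would give the recall curves to compare.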
