krickert commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3933936234
On this low-latency, 16-shard setup, collaborative and independent sit on top of each other: recall is the same and total lookups are the same, so the plot shows negligible impact from collaborative search here. The takeaway is that on a fast machine with a modest index size, we don’t see a recall or work tradeoff yet.

<img width="1627" height="884" alt="image" src="https://github.com/user-attachments/assets/e68c9afe-44d1-4b84-9ceb-3f2c9c2512a3" />

This chart shows that pruning does occur (lookups_saved > 0) and that the gains are larger when shards do more work (e.g. higher ef or a larger index). So the mechanism works; the effect in the first chart is small because total work per query is already small in this environment.

<img width="2077" height="994" alt="image" src="https://github.com/user-attachments/assets/eb38907f-4365-4a9a-9ed7-d46d32bcfb74" />

The Pareto plot tells the same story: collaborative and independent trace the same recall–latency and recall–work curves on this setup, so low latency (at the current index size) doesn’t hurt recall or performance; we just don’t see a visible gain yet. The expectation is that a much larger index and/or a higher-latency setting would show a clearer separation and larger compute savings from collaborative search.

<img width="2533" height="1187" alt="image" src="https://github.com/user-attachments/assets/7a18be99-1b71-4f5d-817e-0feb4b74df7a" />

Next steps (suggestions welcome and highly encouraged):

1. **Vastly increase index size** (e.g. 10–20×) so the system is actually stressed; we expect much larger collaborative gains, consistent with the savings seen on higher-latency (2.5 Gbit) setups.
2. **Test relevant vs. non‑relevant (but sane) queries** on that larger index to reveal any impact of data voids or query distribution.
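For anyone comparing runs, here is a minimal sketch of how the savings could be summarized. It assumes `lookups_saved` is simply independent lookups minus collaborative lookups for the same query set; the `RunStats` class, its field names, and the numbers are hypothetical and not taken from the benchmark code or its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunStats:
    """Aggregate lookup counters for one benchmark configuration (hypothetical shape)."""

    independent_lookups: int    # total lookups with shards searching independently
    collaborative_lookups: int  # total lookups with collaborative search enabled

    @property
    def lookups_saved(self) -> int:
        # Assumed definition: work avoided relative to the independent baseline.
        return self.independent_lookups - self.collaborative_lookups

    @property
    def saved_fraction(self) -> float:
        # Relative saving; guard against an empty baseline.
        if self.independent_lookups == 0:
            return 0.0
        return self.lookups_saved / self.independent_lookups


# Made-up numbers purely for illustration, not measured results.
small = RunStats(independent_lookups=10_000, collaborative_lookups=9_950)
large = RunStats(independent_lookups=200_000, collaborative_lookups=170_000)

for name, stats in (("small", small), ("large", large)):
    print(f"{name}: saved={stats.lookups_saved} ({stats.saved_fraction:.1%})")
```

Reporting the relative fraction alongside the absolute count makes it easier to compare configurations with very different per-query work, such as different ef values or index sizes.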
