krickert commented on PR #15676:
URL: https://github.com/apache/lucene/pull/15676#issuecomment-3933936234

   On this low-latency, 16-shard setup, the collaborative and independent 
curves sit on top of each other: recall is the same and total lookups are the 
same, so the plot shows negligible impact of collaborative search here. The 
takeaway is that on a fast machine with modest index size, we don’t see a 
recall or work tradeoff yet.
   
   <img width="1627" height="884" alt="image" 
src="https://github.com/user-attachments/assets/e68c9afe-44d1-4b84-9ceb-3f2c9c2512a3"
 />
   
   This chart shows that pruning does occur (lookups_saved > 0) and that the 
gains are larger when shards do more work (e.g. higher ef or larger index). So 
the mechanism works; the small effect in Chart 1 is because total work per 
query is already small in this environment.
   
   <img width="2077" height="994" alt="image" 
src="https://github.com/user-attachments/assets/eb38907f-4365-4a9a-9ed7-d46d32bcfb74"
 />
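
For readers who want to reproduce the two quantities discussed above, here is a minimal sketch of how lookups saved and recall could be computed per query. It is an illustration only, not the benchmark code from this PR: the function names `lookups_saved` and `recall_at_k` and the sample numbers are hypothetical, and it assumes "lookups saved" means the independent lookup count minus the collaborative lookup count.

```python
def lookups_saved(independent_lookups, collaborative_lookups):
    """Return (absolute, fractional) lookups saved by collaborative search.

    Assumption: savings are measured against the independent baseline.
    """
    saved = independent_lookups - collaborative_lookups
    # Guard against an empty baseline to avoid division by zero.
    fraction = saved / independent_lookups if independent_lookups else 0.0
    return saved, fraction


def recall_at_k(retrieved_ids, true_ids, k):
    """Fraction of the true top-k neighbors present in the retrieved top-k."""
    truth = set(true_ids[:k])
    found = set(retrieved_ids[:k])
    return len(found & truth) / len(truth) if truth else 0.0


# Made-up numbers, purely to show the call shape.
saved, fraction = lookups_saved(1000, 900)
recall = recall_at_k([1, 2, 3, 4], [1, 2, 5, 6], k=4)
```

Reporting the fractional saving alongside the absolute count makes it easier to compare runs with different ef values or index sizes, since the baseline work differs between them.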
   
   The Pareto plot shows the same story: collaborative and independent trace 
the same recall–latency and recall–work curves on this setup, so low latency 
(and the current index size) doesn’t hurt recall or performance; we just don’t 
see a visible gain yet. The expectation is that a much larger index and/or a 
higher-latency setting would show a clearer separation and larger compute 
savings from collaborative search.
   
   <img width="2533" height="1187" alt="image" 
src="https://github.com/user-attachments/assets/7a18be99-1b71-4f5d-817e-0feb4b74df7a"
 />
   
   Next steps (suggestions welcome, highly encouraged)
   
   1. **Vastly increase index size** (e.g. 10–20×) so the system is actually 
stressed; we expect much larger collaborative gains, consistent with the 
savings seen on higher-latency (2.5 Gbit) setups.
   
   2. **Test relevant vs. non‑relevant (but sane) queries** on that larger 
index to reveal any impact of data voids or query distribution.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


