krickert commented on PR #15676:
URL: https://github.com/apache/lucene/pull/15676#issuecomment-3875355847

   I've been digging into the recall issues from the distributed simulations 
(4, 8, and 16 shards, 1.47M 1024-dim vectors). Rerunning on clean, deduped data 
and instrumenting per-shard behavior has uncovered a few hard-to-diagnose 
problems:
   
   1. **Entry Point Protection - a lag threshold:** A high global bar arriving 
early in a search can prune a shard's HNSW entry point before it ever reaches 
its local high-similarity cluster. Current workaround: a constant-node guard 
(first 100 nodes) to let every shard get into its query neighborhood before 
global pruning kicks in. It works, but it's a blunt instrument and took away 
the speed improvement I had hoped for.
   
   2. **Tie-Break Paralysis:** The original docBase safety logic was too 
restrictive in multi-shard environments, effectively disabling pruning for 
shards with lower IDs. I've shifted to prioritizing pruning leverage with a 
safety slack (0.01f) for floating-point jitter, though I don't think this holds 
up.
   
   3. **Coherence Contention:** High-frequency volatile reads of the global bar 
in the HNSW hot loop were creating memory bus contention on the multi-core 
system I'm coding on. I changed HnswGraphSearcher to help, but that's far too 
invasive, and I will continue to avoid changing core classes.
   
   4. **Recall Recovery:** With some fixes, the K=100 recall now matches 
baseline (0.806 vs 0.796) on deduped data, and K=10 has recovered from 0.31 to 
0.66. Better, but still... bad.
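   
   To make items 1-3 concrete, here is a minimal sketch of how the three 
mitigations could fit together. It is not the code in this PR and not a Lucene 
API: `GlobalBarPruner`, `raiseBar`, `shouldPrune`, and the `refreshEvery` 
parameter are all hypothetical names, and the direction of the slack (prune only 
when a score falls below the bar minus the slack) is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical per-shard helper; illustrative only, not part of Lucene. */
final class GlobalBarPruner {
  // Shared across shards: float bits of the best-known global similarity bar.
  private final AtomicInteger globalBarBits;
  // Item 1: number of nodes each shard may visit before global pruning applies.
  private final int warmupNodes;
  // Item 2: tolerance for floating-point jitter near the bar (assumed direction).
  private final float slack;
  // Item 3: visits between volatile reads of the shared bar.
  private final int refreshEvery;

  private int visited;
  private float cachedBar = Float.NEGATIVE_INFINITY;

  GlobalBarPruner(AtomicInteger globalBarBits, int warmupNodes, float slack, int refreshEvery) {
    if (refreshEvery < 1) {
      throw new IllegalArgumentException("refreshEvery must be >= 1");
    }
    this.globalBarBits = globalBarBits;
    this.warmupNodes = warmupNodes;
    this.slack = slack;
    this.refreshEvery = refreshEvery;
  }

  /** Publishes a candidate bar; the shared value only ever moves upward. */
  static void raiseBar(AtomicInteger bits, float candidate) {
    bits.accumulateAndGet(
        Float.floatToIntBits(candidate),
        (a, b) -> Float.intBitsToFloat(a) >= Float.intBitsToFloat(b) ? a : b);
  }

  /** Returns true if a node with this similarity may be skipped. */
  boolean shouldPrune(float similarity) {
    visited++;
    if (visited <= warmupNodes) {
      return false; // still inside the constant-node guard
    }
    // Refresh the local copy on the first post-warmup visit, then every
    // refreshEvery visits, instead of a volatile read per node.
    if ((visited - warmupNodes - 1) % refreshEvery == 0) {
      cachedBar = Float.intBitsToFloat(globalBarBits.get());
    }
    return similarity < cachedBar - slack;
  }
}
```

   The trade-off in this sketch is staleness: between refreshes a shard prunes 
against an older, lower bar, which costs some pruning leverage but cannot hurt 
recall because the bar only increases.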
   
   The parameters that scale with K (especially for K >= 1000) aren't as 
straightforward as I had thought. Still working through it and open to ideas if 
anyone sees a cleaner approach.
   
   In the meantime, I will attempt a more realistic approach and create a 
per-index HTTP/2 service that serves up Lucene to see if real-network 
collaborative pruning can work.
   
   More to come... 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


