krickert commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3875355847
I've been digging into the recall issues from the distributed simulations (4, 8, and 16 shards; 1.47M 1024-dim vectors). Rerunning on clean, deduped data and instrumenting per-shard behavior has uncovered a few hard-to-diagnose problems:

1. **Entry Point Protection (a lag threshold):** A high global bar arriving early in a search can prune away a shard's HNSW entry point before the search ever reaches its local high-similarity cluster. Current workaround: a constant-node guard (the first 100 nodes) that lets every shard get into its query neighborhood before global pruning kicks in. It works, but it's a blunt instrument, and it took away the speed improvement I had hoped for.
2. **Tie-Break Paralysis:** The original docBase safety logic was too restrictive in multi-shard environments, effectively disabling pruning for shards with lower IDs. I've shifted to prioritizing pruning leverage, with a safety slack (0.01f) to absorb floating-point jitter, though I don't think this holds up.
3. **Coherence Contention:** High-frequency volatile reads of the global bar in the HNSW hot loop were creating memory bus contention on the multi-core system I'm coding on. I changed HnswGraphSearcher to help, but that's far too invasive, and I will continue to avoid changing core classes.
4. **Recall Recovery:** With some fixes, the K=100 recall now matches baseline (0.806 vs 0.796) on deduped data, and K=10 has recovered from 0.31 to 0.66. Better, but still bad.

The parameters that scale with K (especially for K >= 1000) aren't as straightforward as I had thought. I'm still working through it and open to ideas if anyone sees a cleaner approach.

In the meantime, I will attempt a more realistic approach and create a per-index HTTP/2 service that serves up Lucene, to see if real-network collaborative pruning can work.

More to come...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
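For readers following along, the ideas in items 1 through 3 of the comment above (a constant-node guard, a safety slack, and fewer volatile reads in the hot loop) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in the PR and not Lucene's API: `CollaborativePruningSketch`, `GlobalBar`, `ShardPruner`, and the `refreshInterval` parameter are all hypothetical names, and refreshing a locally cached copy of the bar every few nodes is one possible way to reduce contention rather than the change that was actually made to HnswGraphSearcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: not Lucene's API and not the PR's actual code. */
public class CollaborativePruningSketch {

  /** Shared minimum-competitive similarity published by all shards (hypothetical). */
  static final class GlobalBar {
    // Float bits are kept in an AtomicInteger so the bar can be raised with CAS.
    private final AtomicInteger bits =
        new AtomicInteger(Float.floatToIntBits(Float.NEGATIVE_INFINITY));

    /** Volatile read of the current bar. */
    float get() {
      return Float.intBitsToFloat(bits.get());
    }

    /** Raises the bar monotonically; candidates at or below the current bar are ignored. */
    void raiseTo(float candidate) {
      int prev;
      do {
        prev = bits.get();
        if (Float.intBitsToFloat(prev) >= candidate) {
          return;
        }
      } while (!bits.compareAndSet(prev, Float.floatToIntBits(candidate)));
    }
  }

  /** Per-shard view of the bar, meant to be used by a single search thread. */
  static final class ShardPruner {
    private final GlobalBar globalBar;
    private final int warmupNodes;      // e.g. 100, the constant-node guard from item 1
    private final float slack;          // e.g. 0.01f, the safety slack from item 2
    private final int refreshInterval;  // assumption: re-read the shared bar every N nodes
    private int visited = 0;
    private float cachedBar = Float.NEGATIVE_INFINITY;

    ShardPruner(GlobalBar globalBar, int warmupNodes, float slack, int refreshInterval) {
      this.globalBar = globalBar;
      this.warmupNodes = warmupNodes;
      this.slack = slack;
      this.refreshInterval = refreshInterval;
    }

    /** Returns true if a node with this similarity may be skipped. */
    boolean shouldPrune(float similarity) {
      visited++;
      if (visited <= warmupNodes) {
        return false; // entry-point protection: never prune during warm-up
      }
      // Refresh the cached bar on the first post-warm-up node and then every
      // refreshInterval nodes, instead of a volatile read on every node.
      if ((visited - warmupNodes - 1) % refreshInterval == 0) {
        cachedBar = globalBar.get();
      }
      // Prune only when the node is below the bar by more than the slack.
      return similarity < cachedBar - slack;
    }
  }
}
```

In this sketch a shard would construct one `ShardPruner` per query, for example `new ShardPruner(bar, 100, 0.01f, 16)`, and call `shouldPrune` once per visited node. The trade-off is that a cached bar can be stale for up to `refreshInterval` nodes, which costs some pruning leverage but cannot prune more aggressively than the shared bar allows, since the bar only ever rises.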
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
