krickert commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3890667776
# Distributed Validation Progress: Collaborative HNSW Pruning

Testing is now transitioning from single-JVM simulations to a physical 8-node cluster (2.5GbE, 1TB NVMe, 16GB RAM per node) to isolate variables like CPU contention and IO over-caching. Single-host simulations masked the "Coherence Tax" of distributed HNSW search: when the cluster is simulated on one machine, the distributed runs come out slightly slower, but without real network latency those numbers don't reflect what we'd see in production. Testing on real hardware over a LAN is the only way to measure the trade-off between coordination overhead and compute savings, so the cleanest way to confirm whether this implementation holds up is a distributed-search PoC running over an HTTP/2 stream. 100% of this work will be available in the [lucene-test-data](https://github.com/ai-pipestream/lucene-test-data/) repo, which can recreate everything up to the shard distribution.

---

## The Theory Revisited

The theory is still sound, but I have not yet been able to demonstrate the savings: either the earlier tests were flawed, or the coordination overhead genuinely outweighs the compute savings. In a standard sharded HNSW search, every shard performs a full graph traversal to find its local top-K, unaware that its candidates may be far below the global similarity threshold. This results in significant redundant compute, especially for high-K queries ($K \ge 1000$). We want to eliminate these redundant cycles.

**Collaborative Pruning** introduces a global minimum similarity bar synchronized across nodes via HTTP/2 streams. By injecting this bar into the `searchLevel` loops of the HNSW graph searcher, we can trigger early termination of graph traversals that have zero probability of entering the final global result set (see the sketch after the execution plan below).

---

## Benchmarking Dimensions & Stats

To prove this, here is the matrix of data I'll be collecting:

| Dimension | Values |
|---|---|
| **Result Set Size (K)** | 3, 50, 100, 200, 500, 1,000, 5,000 |
| **Shard Distribution** | 1, 2, 4, 8, 16 shards |
| **Data Density** | Diverse corpus (Wikipedia, Literature, Technical Papers) embedded via BGE-M3 |

### Baselines

- **Recall Baseline:** True KNN (brute-force exact search) per query, used to measure recall@K for both standard and collaborative modes.
- **Performance Baseline:** Standard Lucene HNSW search (no collaborative pruning) across the same K × shard matrix, providing the control for all latency and node-visit deltas.

### Key Metrics

| Metric | Purpose |
|---|---|
| **Recall@K vs. True KNN** | Ensuring collaborative pruning does not degrade result quality relative to exact search |
| **HNSW Node Visit Delta** | Reduction in total graph nodes explored per query; the "pure" algorithmic win |
| **Tail Latency (P99)** | Validating whether collaborative pruning mitigates "heavy" query outliers |
| **Standard Deviation of Work** | How effectively the global bar balances load in skewed data scenarios |

I'll try to infer the coordination overhead directly, but comparing against the standard baseline and measuring latency should be enough to demonstrate the effect.

---

## Execution Plan

1. **Staging:** Python-based embedding generation using BGE-M3, followed by a deduplication step to ensure a clean corpus.
2. **Indexing:** A 16-worker process generates 16 base shards, which are then merged into 8-, 4-, 2-, and 1-shard indices for comparative testing.
3. **Service:** A gRPC-based symmetric peer architecture utilizing ScaleCube for gossip-style node discovery.
4. **Simulation:** Nodes will be warmed (OS page cache) to ensure we are measuring the algorithm's efficiency rather than raw disk IO performance.
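To make the pruning mechanism concrete, here's a minimal, self-contained sketch of the idea; the class and method names are illustrative, not the PR's actual `CollaborativeKnnCollector`. The global bar is raised whenever a peer streams a higher k-th-best similarity, and the effective pruning threshold is the stricter of the local k-th best and that bar:

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Sketch only: a monotonically increasing similarity bar shared across shards.
 * Remote peers push their current k-th-best similarity over the bi-di stream;
 * the local graph traversal reads it before expanding further candidates.
 */
final class GlobalSimilarityBar {
  // Float bits stored in an AtomicInteger so updates are lock-free.
  private final AtomicInteger bits =
      new AtomicInteger(Float.floatToIntBits(Float.NEGATIVE_INFINITY));

  /** Raise the bar; lower values are ignored, so the bar only moves up. */
  void raiseTo(float candidate) {
    int cur;
    while (candidate > Float.intBitsToFloat(cur = bits.get())) {
      if (bits.compareAndSet(cur, Float.floatToIntBits(candidate))) {
        return;
      }
    }
  }

  float get() {
    return Float.intBitsToFloat(bits.get());
  }
}

/** Local top-K accumulator that folds the global bar into its pruning threshold. */
final class PruningTopK {
  record ScoredDoc(int doc, float score) {}

  private final int k;
  private final GlobalSimilarityBar bar;
  private final PriorityQueue<ScoredDoc> heap =
      new PriorityQueue<>((a, b) -> Float.compare(a.score(), b.score())); // min-heap on score

  PruningTopK(int k, GlobalSimilarityBar bar) {
    this.k = k;
    this.bar = bar;
  }

  void collect(int doc, float score) {
    // A real collector would also skip candidates already below minCompetitiveSimilarity().
    heap.offer(new ScoredDoc(doc, score));
    if (heap.size() > k) {
      heap.poll(); // evict the current worst
    }
  }

  /**
   * Effective pruning threshold: the stricter of the local k-th-best score and
   * the globally synchronized bar.
   */
  float minCompetitiveSimilarity() {
    float localKth = heap.size() < k ? Float.NEGATIVE_INFINITY : heap.peek().score();
    return Math.max(localKth, bar.get());
  }
}
```

Per the description above, the intent is that this combined threshold is what gets injected into the `searchLevel` loop, so traversal branches that cannot beat it are never expanded.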
---

## System Architecture

```mermaid
graph TD
    subgraph "Offline Preparation"
        A[Raw Text Data] -->|BGE-M3 Python| B[.vec Embeddings]
        B -->|Indexer Utility| C[16 Shards]
        C -->|Merge| D[8 / 4 / 2 / 1 Shard Indices]
    end

    subgraph "Distributed Cluster: 8 Nodes"
        E[Search Request] --> Node0((Node 0: Coordinator))

        subgraph "Symmetric Peer Node"
            direction TB
            N((Peer Node)) --> Local[Lucene HNSW Shard]
            Local <--> Collab[CollaborativeKnnCollector]
            Collab <--> Gossip{{ScaleCube / gRPC Stream}}
        end

        Node0 -->|gRPC Search| Node1(Node 1)
        Node0 -->|gRPC Search| Node2(Node 2)
        Node0 -->|gRPC Search| NodeN(...)
        Node1 <-->|Bi-Di Threshold Sync| Node0
        Node2 <-->|Bi-Di Threshold Sync| Node1
        NodeN <-->|Bi-Di Threshold Sync| Node0
    end

    subgraph "Merging"
        Node0 -->|Fan-in Results| Merge[Heap Merge & Sort]
        Merge --> Final[Top K Results]
    end
```
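The "Merging" stage above is a plain fan-in: each peer returns its local top-K and the coordinator folds them into the global top-K with a bounded heap. A rough sketch of that step (names are mine, not taken from the PR):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Coordinator-side fan-in: merge per-shard top-K lists into a global top-K. */
final class ShardResultMerger {
  record Hit(String shardId, int doc, float score) {}

  static List<Hit> mergeTopK(List<List<Hit>> perShardResults, int k) {
    // Size-k min-heap on score keeps memory at O(k) regardless of shard count.
    PriorityQueue<Hit> heap = new PriorityQueue<>(Comparator.comparingDouble(Hit::score));
    for (List<Hit> shardHits : perShardResults) {
      for (Hit hit : shardHits) {
        if (heap.size() < k) {
          heap.offer(hit);
        } else if (hit.score() > heap.peek().score()) {
          heap.poll();
          heap.offer(hit);
        }
      }
    }
    // Drain and sort descending so the best hit comes first.
    List<Hit> merged = new ArrayList<>(heap);
    merged.sort(Comparator.comparingDouble(Hit::score).reversed());
    return merged;
  }
}
```

With collaborative pruning, the hope is that each per-shard list arriving at this step was produced with far fewer graph node visits, which is exactly what the node-visit-delta metric measures.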

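One last note on the recall baseline: recall@K here is just the overlap between the approximate (HNSW) result set and the brute-force exact top-K for the same query, divided by K. Illustrative only, not the actual harness code in lucene-test-data:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Recall@K = |approxTopK ∩ exactTopK| / K for a single query. */
final class RecallAtK {
  static double recall(List<Integer> approxTopK, List<Integer> exactTopK, int k) {
    // Ground truth: the exact top-k doc ids from the brute-force baseline.
    Set<Integer> truth = new HashSet<>(exactTopK.subList(0, Math.min(k, exactTopK.size())));
    long hits = approxTopK.stream().limit(k).filter(truth::contains).count();
    return (double) hits / k;
  }
}
```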