krickert commented on PR #15676:
URL: https://github.com/apache/lucene/pull/15676#issuecomment-3890667776

   # Distributed Validation Progress: Collaborative HNSW Pruning
   
   Testing is now transitioning from single-JVM simulations to a physical 
8-node cluster (2.5GbE, 1TB NVMe, 16GB RAM per node) to isolate variables like 
CPU contention and IO over-caching. Single-host simulations masked the 
"Coherence Tax" of distributed HNSW search, so testing on real hardware over a 
LAN is the only way to measure the trade-off between coordination overhead and 
compute savings.
   
   Initial tests showed the distributed mode running slightly slower when 
simulated on a single machine, but a local simulation doesn't add the network 
latency we would see in a real deployment. To confirm whether this 
implementation is right, the simplest path is a distributed search PoC that 
runs over an HTTP/2 stream.
   
   100% of this work will be available in the 
[lucene-test-data](https://github.com/ai-pipestream/lucene-test-data/) repo, 
which can recreate everything up to the shard distribution.
   
   ---
   
   ## The Theory Revisited
   
   The theory is sound, but I have not yet been able to demonstrate the 
savings: either the tests were flawed, or the idea carries too much overhead 
to hold up in practice.
   
   In a standard sharded HNSW search, every shard performs a full graph 
traversal to find its local top-K, unaware that its candidates may be far below 
the global similarity threshold. This results in significant redundant compute, 
especially for high-K queries ($K \ge 1000$).
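
   As a concrete illustration: with 8 shards each returning a local top-K at 
$K = 1000$, the coordinator receives $8 \times 1000 = 8000$ candidates but 
keeps only $1000$, so $7/8$ of the candidates each shard worked to find (and 
the graph traversal behind them) never reach the final result set.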
   
   We want to eliminate these redundant cycles.
   
   **Collaborative Pruning** introduces a global minimum similarity bar 
synchronized across nodes via HTTP/2 streams. By injecting this bar into the 
`searchLevel` loops of the HNSW graph searcher, we can trigger early 
termination of graph traversals that have zero probability of entering the 
final global result set.
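
   As a minimal sketch of where the bar plugs in (the `CollaborativeKnnCollector` 
name matches the diagram below; the `globalBar` supplier and the exact wiring 
are assumptions on my part), a delegating `KnnCollector` only has to raise 
`minCompetitiveSimilarity()` to the cluster-wide floor. Depending on the branch 
there may be additional interface methods to delegate, but the shape is:

    ```java
    import java.util.function.Supplier;

    import org.apache.lucene.search.KnnCollector;
    import org.apache.lucene.search.TopDocs;

    /**
     * Sketch only: wraps any KnnCollector and raises the competitive bar to the
     * cluster-wide minimum similarity received over the gRPC stream.
     */
    final class CollaborativeKnnCollector implements KnnCollector {
      private final KnnCollector delegate;
      private final Supplier<Float> globalBar; // fed by the threshold-sync stream (placeholder)

      CollaborativeKnnCollector(KnnCollector delegate, Supplier<Float> globalBar) {
        this.delegate = delegate;
        this.globalBar = globalBar;
      }

      @Override
      public float minCompetitiveSimilarity() {
        // The graph searcher uses this value as its pruning floor, so returning
        // the max of the local and global bars is enough to trigger early exits.
        return Math.max(delegate.minCompetitiveSimilarity(), globalBar.get());
      }

      @Override public boolean collect(int docId, float similarity) { return delegate.collect(docId, similarity); }
      @Override public boolean earlyTerminated() { return delegate.earlyTerminated(); }
      @Override public void incVisitedCount(int count) { delegate.incVisitedCount(count); }
      @Override public long visitedCount() { return delegate.visitedCount(); }
      @Override public long visitLimit() { return delegate.visitLimit(); }
      @Override public int k() { return delegate.k(); }
      @Override public TopDocs topDocs() { return delegate.topDocs(); }
    }
    ```

   If I'm reading `HnswGraphSearcher` right, `searchLevel` already treats 
`minCompetitiveSimilarity()` as its acceptance floor, so a wrapper like this is 
the least invasive injection point.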
   
   ---
   
   ## Benchmarking Dimensions & Stats
   
   To test this, here are the dimensions I'll be sweeping and the data I'll be collecting:
   
   | Dimension | Values |
   |---|---|
   | **Result Set Size (K)** | 3, 50, 100, 200, 500, 1,000, 5,000 |
   | **Shard Distribution** | 1, 2, 4, 8, 16 shards |
   | **Data Density** | Diverse corpus (Wikipedia, Literature, Technical Papers) embedded via BGE-M3 |
   
   ### Baselines
   
   - **Recall:** True KNN (brute-force exact search) per query, used to measure 
recall@K for both standard and collaborative modes (a recall@K helper is 
sketched after this list).
   - **Performance Baseline:** Standard Lucene HNSW search (no collaborative 
pruning) across the same K x shard matrix, providing the control for all 
latency and node-visit deltas.
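
   The recall measurement itself is simple set overlap; this is the kind of 
helper I have in mind (hypothetical name, not yet in the harness), comparing 
each mode's approximate top-K doc ids against the brute-force exact top-K:

    ```java
    import java.util.HashSet;
    import java.util.List;
    import java.util.Set;

    final class RecallMetrics {

      /** recall@K = |approxTopK ∩ exactTopK| / K, averaged across the query set. */
      static double recallAtK(List<Integer> exactTopK, List<Integer> approxTopK, int k) {
        Set<Integer> truth = new HashSet<>(exactTopK.subList(0, Math.min(k, exactTopK.size())));
        long hits = approxTopK.stream().limit(k).filter(truth::contains).count();
        return (double) hits / k;
      }
    }
    ```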
   
   ### Key Metrics
   
   | Metric | Purpose |
   |---|---|
   | **Recall@K vs. True KNN** | Ensuring collaborative pruning does not degrade result quality relative to exact search |
   | **HNSW Node Visit Delta** | Reduction in total graph nodes explored per query, the "pure" algorithmic win |
   | **Tail Latency (P99)** | Validating if collaborative pruning mitigates "heavy" query outliers |
   | **Standard Deviation of Work** | How effectively the global bar balances load in "skewed" data scenarios |
   
   I'll try to infer the collaborative overhead itself, but comparing against 
the standard baseline and measuring latency should be enough to show whether 
the savings outweigh the coordination cost.
   
   ---
   
   ## Execution Plan
   
   1. **Staging:** Python-based embedding generation using BGE-M3, followed by 
a deduplication step to ensure a clean corpus.
   2. **Indexing:** A 16-worker process generates 16 base shards, which are 
then merged into 8, 4, and 2-shard indices for comparative testing (see the 
merge sketch after this list).
   3. **Service:** A gRPC-based symmetric peer architecture utilizing ScaleCube 
for gossip-style node discovery.
   4. **Simulation:** Nodes will be warmed (OS page cache) to ensure we are 
measuring the algorithm's efficiency rather than raw disk IO performance.
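
   For step 2, the shard consolidation is plain Lucene plumbing; the sketch 
below is roughly what I mean (paths, analyzer choice, and the `forceMerge` 
call are illustrative, not a prescription):

    ```java
    import java.nio.file.Path;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    final class ShardMerger {

      /** Merges several base-shard directories into a single target index. */
      static void merge(Path target, Path... sources) throws Exception {
        try (FSDirectory targetDir = FSDirectory.open(target);
            IndexWriter writer = new IndexWriter(targetDir, new IndexWriterConfig(new StandardAnalyzer()))) {
          FSDirectory[] sourceDirs = new FSDirectory[sources.length];
          for (int i = 0; i < sources.length; i++) {
            sourceDirs[i] = FSDirectory.open(sources[i]);
          }
          // addIndexes copies the source segments into the target index.
          writer.addIndexes(sourceDirs);
          // Optional: collapse segments so per-shard topology stays comparable across runs.
          writer.forceMerge(1);
          for (FSDirectory dir : sourceDirs) {
            dir.close();
          }
        }
      }
    }
    ```

   Merging the 16 base shards pairwise this way yields the 8, 4, and 2-shard 
variants without re-embedding or re-indexing the corpus.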
   
   ---
   
   ## System Architecture
   
   ```mermaid
   graph TD
       subgraph "Offline Preparation"
           A[Raw Text Data] -->|BGE-M3 Python| B[.vec Embeddings]
           B -->|Indexer Utility| C[16 Shards]
           C -->|Merge| D[8 / 4 / 2 / 1 Shard Indices]
       end
   
       subgraph "Distributed Cluster: 8 Nodes"
           E[Search Request] --> Node0((Node 0: Coordinator))
   
           subgraph "Symmetric Peer Node"
               direction TB
               N((Peer Node)) --> Local[Lucene HNSW Shard]
               Local <--> Collab[CollaborativeKnnCollector]
               Collab <--> Gossip{{ScaleCube / gRPC Stream}}
           end
   
           Node0 -->|gRPC Search| Node1(Node 1)
           Node0 -->|gRPC Search| Node2(Node 2)
           Node0 -->|gRPC Search| NodeN(...)
   
           Node1 <-->|Bi-Di Threshold Sync| Node0
           Node2 <-->|Bi-Di Threshold Sync| Node1
           NodeN <-->|Bi-Di Threshold Sync| Node0
       end
   
       subgraph "Merging"
           Node0 -->|Fan-in Results| Merge[Heap Merge & Sort]
           Merge --> Final[Top K Results]
       end
   ```
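
   On the "Bi-Di Threshold Sync" edges above, the only shared state each peer 
needs is a monotonically rising similarity floor. A lock-free holder along 
these lines (class name made up; the ScaleCube/gRPC plumbing that calls 
`offer(...)` is omitted) is one way to keep the bar cheap to read on the 
search hot path:

    ```java
    import java.util.concurrent.atomic.AtomicInteger;

    /**
     * Sketch: cluster-wide minimum similarity bar. Readers (the graph search
     * hot path) pay one volatile read; writers (the threshold-sync stream
     * handler) can only raise the bar, so a late update never loosens pruning.
     */
    final class GlobalSimilarityBar {
      // Float bits stored in an AtomicInteger so we can CAS on a float value.
      private final AtomicInteger bits = new AtomicInteger(Float.floatToIntBits(Float.NEGATIVE_INFINITY));

      float get() {
        return Float.intBitsToFloat(bits.get());
      }

      /** Raises the bar if the offered value is higher; lower offers are ignored. */
      void offer(float candidate) {
        int current = bits.get();
        while (candidate > Float.intBitsToFloat(current)) {
          if (bits.compareAndSet(current, Float.floatToIntBits(candidate))) {
            return;
          }
          current = bits.get();
        }
      }
    }
    ```

   The `CollaborativeKnnCollector` sketched earlier would read `get()` through 
its supplier, and each peer would `offer(...)` both its own local kth-best 
similarity and the values arriving over the stream.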
   

