Re: [PR] Feature/collaborative hnsw search [lucene]

via GitHub Tue, 17 Feb 2026 15:03:32 -0800


krickert commented on PR #15676:
URL: https://github.com/apache/lucene/pull/15676#issuecomment-3917523705


   @benwtrent @navneet1v - quick status update on a distributed KNN prototype.
   
   I implemented a gRPC (HTTP/2) streaming coordinator + shard model for collaborative HNSW search (outside OpenSearch REST for now), and ran initial benchmarks on 8 Raspberry Pi 5 nodes (NVMe) plus local runs.
   
   ### Early results
   
   | Metric | Observation |
   |--------|-------------|
   | **Recall** | Matched the standard Lucene JAR baseline in tested runs |
   | **Node visits** | Reduced by ~50% on typical queries |
   | **End-to-end latency** | Improved by ~40–50% in the same scenarios |
   
   In a heterogeneous setup, adding one higher-performance node improved global pruning and produced larger gains (up to ~65% vs the same cluster without that node in current tests).
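
To make "global pruning" concrete, here is a minimal sketch of one way a coordinator could share a score bound across shards. It is not the code from the PR or the linked repo: the names `GlobalTopK` and `search_shard` are hypothetical, scores are assumed to be similarities (higher is better), and each shard is simplified to a list of candidate scores visited in descending order rather than a real HNSW graph traversal.

```python
import heapq
import threading


class GlobalTopK:
    """Hypothetical coordinator-side tracker of the k best scores seen across shards."""

    def __init__(self, k: int) -> None:
        if k <= 0:
            raise ValueError("k must be positive")
        self._k = k
        self._heap: list[float] = []  # min-heap; root is the k-th best score so far
        self._lock = threading.Lock()

    def offer(self, score: float) -> None:
        """Record a score reported by any shard."""
        with self._lock:
            if len(self._heap) < self._k:
                heapq.heappush(self._heap, score)
            elif score > self._heap[0]:
                heapq.heapreplace(self._heap, score)

    def bound(self) -> float:
        """Score a candidate must beat to enter the global top-k (-inf until k are known)."""
        with self._lock:
            if len(self._heap) == self._k:
                return self._heap[0]
            return float("-inf")


def search_shard(candidates: list[float], tracker: GlobalTopK) -> int:
    """Visit candidates (assumed sorted by descending score) until the global bound
    rules out the rest. Returns the number of candidates visited."""
    visited = 0
    for score in candidates:
        if score <= tracker.bound():
            break  # nothing further on this shard can enter the global top-k
        visited += 1
        tracker.offer(score)
    return visited
```

In this simplified model, a shard that reports strong scores early raises the shared bound, so other shards can stop sooner. That is one plausible reading of why a faster node might help the whole cluster, but the sketch does not reproduce the measurements above.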
   
   ### Repo
   
   
   [ai-pipestream/distributed-search](https://github.com/ai-pipestream/distributed-search) - gRPC streaming service PoC
   
   ### Next steps
   
   - Preparing a fuller write-up with automated/reproducible runs
   - Current coverage includes K up to 5000
   - Next evaluation target: larger datasets (100+ GB) with large K values
   
   Note: this needed HTTP/2 to perform well; it will not be fast over HTTP/1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


