krickert commented on PR #15676: URL: https://github.com/apache/lucene/pull/15676#issuecomment-3917523705
@benwtrent @navneet1v, quick status update on a distributed KNN prototype. I implemented a gRPC (HTTP/2) streaming coordinator + shard model for collaborative HNSW search (outside the OpenSearch REST layer for now) and ran initial benchmarks on 8 Raspberry Pi 5 nodes with NVMe storage, plus local runs.

### Early results

| Metric | Observation |
|--------|-------------|
| **Recall** | Matched the standard Lucene JAR baseline in tested runs |
| **Node visits** | Reduced by ~50% on typical queries |
| **End-to-end latency** | Reduced by ~40–50% in the same scenarios |

In a heterogeneous setup, adding one higher-performance node improved global pruning and produced larger gains (up to ~65% vs. the same cluster without that node, in current tests).

### Repo

[ai-pipestream/distributed-search](https://github.com/ai-pipestream/distributed-search): gRPC streaming service PoC

### Next steps

- Preparing a fuller write-up with automated, reproducible runs
- Current coverage includes K up to 5000
- Next evaluation target: larger data sets (100+ GB) with large K values

Note: this depends on HTTP/2. It will not be fast over HTTP/1.x.
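The comment does not say how shards exchange pruning information during collaborative search. This sketch illustrates one plausible approach and is hedged accordingly: the coordinator holds a single global top-K, and each shard stops expanding candidates once they cannot beat the current k-th best score. `GlobalTopK`, `searchShard`, and `totalVisits` are hypothetical names and are not the distributed-search repo's API. The descending score arrays are a toy stand-in for an HNSW best-first traversal, and higher scores mean "more similar".

```java
import java.util.PriorityQueue;

/**
 * Hypothetical sketch (not the distributed-search API): a coordinator-held top-K
 * whose k-th best score acts as a pruning threshold shared by every shard.
 */
public class GlobalPruningSketch {

    /** Bounded min-heap holding the best k scores seen across all shards. */
    static final class GlobalTopK {
        private final int k;
        private final PriorityQueue<Float> heap = new PriorityQueue<>(); // smallest score at head

        GlobalTopK(int k) {
            if (k <= 0) throw new IllegalArgumentException("k must be positive");
            this.k = k;
        }

        /** Offers a score; returns true if it entered the global top-k. */
        synchronized boolean offer(float score) {
            if (heap.size() < k) {
                heap.add(score);
                return true;
            }
            if (score > heap.peek()) {
                heap.poll(); // evict the current k-th best
                heap.add(score);
                return true;
            }
            return false;
        }

        /** Scores at or below this value cannot enter the top-k (-inf until k results exist). */
        synchronized float minCompetitiveScore() {
            return heap.size() < k ? Float.NEGATIVE_INFINITY : heap.peek();
        }
    }

    /**
     * Toy stand-in for one shard's best-first graph search. Candidates arrive in
     * descending score order, and the shard stops as soon as the next candidate
     * cannot beat the threshold. Returns the number of candidates visited.
     */
    static int searchShard(float[] candidatesDescending, GlobalTopK topK) {
        int visited = 0;
        for (float score : candidatesDescending) {
            visited++;
            if (score <= topK.minCompetitiveScore()) {
                break; // nothing later in this shard can improve the result
            }
            topK.offer(score);
        }
        return visited;
    }

    /** Runs shards in sequence, either sharing one top-k or giving each shard its own. */
    static int totalVisits(float[][] shards, int k, boolean shareThreshold) {
        GlobalTopK shared = new GlobalTopK(k);
        int visits = 0;
        for (float[] shard : shards) {
            GlobalTopK topK = shareThreshold ? shared : new GlobalTopK(k);
            visits += searchShard(shard, topK);
        }
        return visits;
    }

    public static void main(String[] args) {
        float[][] shards = {
            {0.99f, 0.97f, 0.95f, 0.60f, 0.55f},
            {0.70f, 0.65f, 0.50f, 0.40f, 0.30f},
        };
        int k = 3;
        System.out.println("isolated visits: " + totalVisits(shards, k, false));
        System.out.println("shared visits:   " + totalVisits(shards, k, true));
    }
}
```

Running `main` prints 8 visits when shards prune in isolation and 5 when they share the threshold. In this toy model, the shard that raises the threshold first lets the others stop sooner.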
