We've recently been comparing Lucene's HNSW w/FAISS' and there is not a 2x difference there. FAISS does seem to be around 10-15% faster I think? The 2x difference is roughly what I was seeing in comparisons w/hnswlib prior to the dot-product improvements we made in Lucene.
On Thu, Jun 19, 2025 at 2:12 PM Adrien Grand <jpou...@gmail.com> wrote: > > Chris, > > FWIW I was looking at luceneknn > (https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/luceneknn/Dockerfile#L15) > which is on 9.7, though I don't know if it enabled the incubating vector API > at runtime? > > I hope that mentioning ANN benchmarks did not add noise to this thread, I was > mostly looking at whether I could find another benchmark that suggests that > Lucene is significantly slower in similar conditions. Does it align with > other people's experience that Lucene is 2x slower or more compared with > other good HNSW implementations? > > Adrien > > Le jeu. 19 juin 2025, 18:44, Chris Hegarty > <christopher.hega...@elastic.co.invalid> a écrit : >> >> Hi Adrien, >> >> > Even though it uses Elasticsearch to run the benchmark, it really >> > benchmarks Lucene functionality, >> >> Agreed. >> >> > This seems consistent with results from >> > https://ann-benchmarks.com/index.html though I don't know if the cause of >> > the performance difference is the same or not. >> >> On ann-benchmarks specifically. Unless I’m looking in the wrong place, then >> they’re using Elasticsearch 8.7.0 [1], which predates our usage of the >> Panama Vector API for vector search. We added support for that in Lucene >> 9.7.0 -> Elasticsearch 8.9.0. So those benchmarks are wildly out of date, >> no ? >> >> -Chris. >> >> [1] >> https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/elasticsearch/Dockerfile#L2 >> >> >> > On 19 Jun 2025, at 16:39, Adrien Grand <jpou...@gmail.com> wrote: >> > >> > Hello all, >> > >> > I have been looking at this benchmark against Vespa recently: >> > https://blog.vespa.ai/elasticsearch-vs-vespa-performance-comparison/. (The >> > report is behind an annoying email wall, but I'm copying relevant data >> > below, so hopefully you don't need to download the report.) Even though it >> > uses Elasticsearch to run the benchmark, it really benchmarks Lucene >> > functionality, I don't believe that Elasticsearch does anything that >> > meaningfully alters the results that you would get if you were to run >> > Lucene directly. >> > >> > The benchmark seems designed to highlight the benefits of Vespa's realtime >> > design, that's fair game I guess. But it also runs some queries in >> > read-only scenarios when I was expecting Lucene to perform better. >> > >> > One thing that got me curious is that it reports about 2x worse latency >> > and throughput for pure unfiltered vector search on a force-merged index >> > (so single segment/graph). Does anybody know why Lucene's HNSW may perform >> > slower than Vespa's HNSW? This seems consistent with results from >> > https://ann-benchmarks.com/index.html though I don't know if the cause of >> > the performance difference is the same or not. >> > >> > For reference, here are details that apply to both Lucene and Vespa's >> > vector search: >> > - HNSW, >> > - float32 vectors, no quantization, >> > - embeddings generated using Snowflake's Arctic-embed-xs model >> > - 1M docs >> > - 384 dimensions, >> > - dot product, >> > - m = 16, >> > - max connections = 200, >> > - search for top 10 hits, >> > - no filter, >> > - single client, so no search concurrency, >> > - purple column is force-merged, so single segment/graph like Vespa. >> > >> > I never seriously looked at Lucene's vector search performance, so I'm >> > very happy to be educated if I'm making naive assumptions! >> > >> > Somewhat related, is this the reason why I'm seeing many threads around >> > bringing 3rd party implementations into Lucene, including ones that are >> > very similar to Lucene on paper? To speed up vector search? >> > >> > -- >> > Adrien >> > <vespa-vs-es-screenshot.png> >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> > For additional commands, e-mail: dev-h...@lucene.apache.org >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >> For additional commands, e-mail: dev-h...@lucene.apache.org >> --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org