Chris, FWIW I was looking at luceneknn (https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/luceneknn/Dockerfile#L15), which is on 9.7, though I don't know whether it enables the incubating vector API at runtime.
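(In case it helps anyone checking: my understanding is that Lucene 9.7+ only switches to the Panama implementation when the JVM is started with the incubating module enabled, i.e. launched with something along the lines of

    java --add-modules jdk.incubator.vector ...

so the answer probably comes down to whether that image passes this flag when it runs the benchmark; I have not verified its launch command.)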
I hope that mentioning ann-benchmarks did not add noise to this thread; I was mostly looking at whether I could find another benchmark that suggests that Lucene is significantly slower under similar conditions. Does it align with other people's experience that Lucene is 2x slower or more compared with other good HNSW implementations? (For concreteness, a rough sketch of how that setup maps to plain Lucene APIs is appended below the quoted thread.)

Adrien

On Thu, Jun 19, 2025 at 18:44, Chris Hegarty <christopher.hega...@elastic.co.invalid> wrote:

> Hi Adrien,
>
> > Even though it uses Elasticsearch to run the benchmark, it really
> > benchmarks Lucene functionality,
>
> Agreed.
>
> > This seems consistent with results from
> > https://ann-benchmarks.com/index.html though I don't know if the cause
> > of the performance difference is the same or not.
>
> On ann-benchmarks specifically: unless I'm looking in the wrong place,
> they're using Elasticsearch 8.7.0 [1], which predates our usage of the
> Panama Vector API for vector search. We added support for that in Lucene
> 9.7.0 -> Elasticsearch 8.9.0. So those benchmarks are wildly out of date,
> no?
>
> -Chris.
>
> [1]
> https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/elasticsearch/Dockerfile#L2
>
>
> > On 19 Jun 2025, at 16:39, Adrien Grand <jpou...@gmail.com> wrote:
> >
> > Hello all,
> >
> > I have been looking at this benchmark against Vespa recently:
> > https://blog.vespa.ai/elasticsearch-vs-vespa-performance-comparison/.
> > (The report is behind an annoying email wall, but I'm copying relevant
> > data below, so hopefully you don't need to download the report.) Even
> > though it uses Elasticsearch to run the benchmark, it really benchmarks
> > Lucene functionality; I don't believe that Elasticsearch does anything
> > that meaningfully alters the results that you would get if you were to
> > run Lucene directly.
> >
> > The benchmark seems designed to highlight the benefits of Vespa's
> > realtime design; that's fair game, I guess. But it also runs some
> > queries in read-only scenarios where I was expecting Lucene to perform
> > better.
> >
> > One thing that got me curious is that it reports about 2x worse latency
> > and throughput for pure unfiltered vector search on a force-merged
> > index (so single segment/graph). Does anybody know why Lucene's HNSW
> > may perform slower than Vespa's HNSW? This seems consistent with
> > results from https://ann-benchmarks.com/index.html though I don't know
> > if the cause of the performance difference is the same or not.
> >
> > For reference, here are details that apply to both Lucene's and Vespa's
> > vector search:
> > - HNSW,
> > - float32 vectors, no quantization,
> > - embeddings generated using Snowflake's Arctic-embed-xs model,
> > - 1M docs,
> > - 384 dimensions,
> > - dot product,
> > - m = 16,
> > - max connections = 200,
> > - search for top 10 hits,
> > - no filter,
> > - single client, so no search concurrency,
> > - purple column is force-merged, so single segment/graph like Vespa.
> >
> > I never seriously looked at Lucene's vector search performance, so I'm
> > very happy to be educated if I'm making naive assumptions!
> >
> > Somewhat related, is this the reason why I'm seeing many threads around
> > bringing 3rd party implementations into Lucene, including ones that are
> > very similar to Lucene on paper? To speed up vector search?
> >
> > --
> > Adrien
> > <vespa-vs-es-screenshot.png>
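P.S. For anyone who wants to poke at the Lucene side directly, here is a rough, untested sketch of what I believe the quoted setup corresponds to in plain Lucene; it is my own approximation, not the benchmark's actual code. The field name, index path, and class name are made up, I am assuming "max connections = 200" maps to the HNSW beam width, and I left out the per-field codec override (returning new Lucene99HnswVectorsFormat(16, 200) from getKnnVectorsFormatForField) to keep the snippet version-agnostic.

// Rough sketch only: 384-dim float vectors, dot product, unfiltered top-10
// kNN search, force-merged to a single segment/graph.
import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.KnnFloatVectorField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.VectorSimilarityFunction;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.KnnFloatVectorQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class KnnBenchSketch {
  public static void main(String[] args) throws Exception {
    try (Directory dir = FSDirectory.open(Paths.get("/tmp/knn-sketch"))) {
      // m = 16 / beam width = 200 would be set by overriding the codec's
      // getKnnVectorsFormatForField to return a custom HNSW format; omitted.
      try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
        float[] vector = new float[384];
        vector[0] = 1f; // stand-in for a unit-length Arctic-embed-xs embedding
        Document doc = new Document();
        doc.add(new KnnFloatVectorField("embedding", vector,
            VectorSimilarityFunction.DOT_PRODUCT));
        writer.addDocument(doc); // the benchmark indexes 1M of these
        writer.forceMerge(1);    // single segment -> single HNSW graph
      }
      try (DirectoryReader reader = DirectoryReader.open(dir)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        float[] query = new float[384];
        query[0] = 1f;
        // Unfiltered top-10 vector search, single client, no concurrency.
        TopDocs hits =
            searcher.search(new KnnFloatVectorQuery("embedding", query, 10), 10);
        System.out.println("top hits: " + hits.scoreDocs.length);
      }
    }
  }
}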