Re: Do we know why Lucene's HNSW may be slower than other HNSW implementations?

Adrien Grand Thu, 19 Jun 2025 12:37:37 -0700

Thanks Mike, this is useful information. Then I'll try to reproduce this
benchmark to better understand what is happening.


On Thu, Jun 19, 2025 at 8:16 PM Michael Sokolov <[email protected]> wrote:

> We've recently been comparing Lucene's HNSW with FAISS's, and there is not
> a 2x difference there; FAISS seems to be around 10-15% faster, I
> think. The 2x difference is roughly what I was seeing in comparisons
> with hnswlib prior to the dot-product improvements we made in Lucene.
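
For context on what "dot-product improvements" can mean at the scalar level, here is a purely illustrative sketch; it is not necessarily what Lucene changed. It keeps four independent partial sums so consecutive multiply-adds do not all wait on one accumulator. The Panama Vector API path mentioned later in this thread goes further and uses SIMD lanes. The class and method names are hypothetical, not Lucene's `VectorUtil`, and only the JDK is assumed:

```java
// Hypothetical sketch, not Lucene code: compares a single-accumulator dot
// product with a version that keeps four independent partial sums.
final class DotProductSketch {
    private DotProductSketch() {}

    /** Straightforward single-accumulator dot product, used as a reference. */
    static float dotSimple(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Same value up to float rounding, using four partial sums. */
    static float dotUnrolled(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float s0 = 0f, s1 = 0f, s2 = 0f, s3 = 0f;
        int i = 0;
        int upper = a.length & ~3; // largest multiple of 4 <= length
        for (; i < upper; i += 4) {
            s0 += a[i] * b[i];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        for (; i < a.length; i++) { // leftover tail elements
            s0 += a[i] * b[i];
        }
        return (s0 + s1) + (s2 + s3);
    }

    public static void main(String[] args) {
        // 384 dimensions, matching the benchmark setup discussed in this thread.
        float[] a = new float[384];
        float[] b = new float[384];
        for (int i = 0; i < a.length; i++) {
            a[i] = (i % 7) * 0.25f;
            b[i] = (i % 5) * 0.5f;
        }
        System.out.println(dotSimple(a, b) + " vs " + dotUnrolled(a, b));
    }
}
```

Because the summation order differs, the two methods can disagree in the last bits for arbitrary inputs. Any real implementation has to accept that, or keep the original order.
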
>
> On Thu, Jun 19, 2025 at 2:12 PM Adrien Grand <[email protected]> wrote:
> >
> > Chris,
> >
> > FWIW I was looking at luceneknn
> > (https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/luceneknn/Dockerfile#L15),
> > which is on 9.7, though I don't know whether it enables the incubating
> > Vector API at runtime.
> >
> > I hope that mentioning ANN benchmarks did not add noise to this thread;
> > I was mostly looking for another benchmark suggesting that Lucene is
> > significantly slower under similar conditions. Does it match other
> > people's experience that Lucene is 2x slower or more than other good
> > HNSW implementations?
> >
> > Adrien
> >
> > On Thu, Jun 19, 2025, 18:44, Chris Hegarty <[email protected]> wrote:
> >>
> >> Hi Adrien,
> >>
> >> > Even though it uses Elasticsearch to run the benchmark, it really
> >> > benchmarks Lucene functionality,
> >>
> >> Agreed.
> >>
> >> > This seems consistent with results from
> >> > https://ann-benchmarks.com/index.html though I don't know if the cause of
> >> > the performance difference is the same or not.
> >>
> >> On ann-benchmarks specifically: unless I'm looking in the wrong place,
> >> they're using Elasticsearch 8.7.0 [1], which predates our use of the
> >> Panama Vector API for vector search. We added support for that in Lucene
> >> 9.7.0, which shipped in Elasticsearch 8.9.0. So those benchmarks are
> >> wildly out of date, no?
> >>
> >> -Chris.
> >>
> >> [1] https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/elasticsearch/Dockerfile#L2
> >>
> >>
> >> > On 19 Jun 2025, at 16:39, Adrien Grand <[email protected]> wrote:
> >> >
> >> > Hello all,
> >> >
> >> > I have been looking at this benchmark against Vespa recently:
> >> > https://blog.vespa.ai/elasticsearch-vs-vespa-performance-comparison/.
> >> > (The report is behind an annoying email wall, but I'm copying the
> >> > relevant data below, so hopefully you don't need to download the report.)
> >> > Even though it uses Elasticsearch to run the benchmark, it really
> >> > benchmarks Lucene functionality; I don't believe that Elasticsearch does
> >> > anything that meaningfully alters the results you would get if you ran
> >> > Lucene directly.
> >> >
> >> > The benchmark seems designed to highlight the benefits of Vespa's
> >> > realtime design, which is fair game, I guess. But it also runs some
> >> > queries in read-only scenarios, where I was expecting Lucene to perform
> >> > better.
> >> >
> >> > One thing that got me curious is that it reports about 2x worse
> >> > latency and throughput for pure unfiltered vector search on a
> >> > force-merged index (so a single segment/graph). Does anybody know why
> >> > Lucene's HNSW may be slower than Vespa's HNSW? This seems consistent with
> >> > results from https://ann-benchmarks.com/index.html though I don't know if
> >> > the cause of the performance difference is the same or not.
> >> >
> >> > For reference, here are details that apply to both Lucene's and Vespa's
> >> > vector search:
> >> >  - HNSW,
> >> >  - float32 vectors, no quantization,
> >> >  - embeddings generated using Snowflake's Arctic-embed-xs model,
> >> >  - 1M docs,
> >> >  - 384 dimensions,
> >> >  - dot product,
> >> >  - m = 16,
> >> >  - max connections = 200,
> >> >  - search for top 10 hits,
> >> >  - no filter,
> >> >  - single client, so no search concurrency,
> >> >  - purple column is force-merged, so single segment/graph like Vespa.
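
With these settings, the float32 vectors alone take 1,000,000 × 384 × 4 bytes = 1,536,000,000 bytes (about 1.5 GB). Comparing latency between engines is also only meaningful at matched recall, which is measured against the exact top 10 by dot product. Below is a minimal brute-force sketch of both computations. It assumes small random vectors as stand-ins for the Arctic-embed-xs embeddings. The class and method names are hypothetical, and this is not how either engine computes its ground truth:

```java
import java.util.Arrays;
import java.util.PriorityQueue;
import java.util.Random;

// Hypothetical brute-force sketch (not Lucene or Vespa code): exact top-k by
// dot product, plus the raw float32 storage size for a set of vectors.
final class ExactTopK {
    private ExactTopK() {}

    /** One scored document. */
    private static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Bytes needed to store numDocs float32 vectors with dims dimensions. */
    static long rawVectorBytes(long numDocs, int dims) {
        return numDocs * dims * (long) Float.BYTES;
    }

    static float dot(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Ids of the k docs with the highest dot product with query, best first. */
    static int[] topK(float[][] docs, float[] query, int k) {
        if (k < 1) {
            throw new IllegalArgumentException("k must be >= 1");
        }
        // Min-heap on score: the head is the weakest of the best k seen so far.
        PriorityQueue<Hit> heap = new PriorityQueue<>(k, (x, y) -> Float.compare(x.score, y.score));
        for (int doc = 0; doc < docs.length; doc++) {
            float score = dot(docs[doc], query);
            if (heap.size() < k) {
                heap.add(new Hit(doc, score));
            } else if (score > heap.peek().score) {
                heap.poll();
                heap.add(new Hit(doc, score));
            }
        }
        // Polling yields the weakest hit first, so fill the result from the back.
        int[] ids = new int[heap.size()];
        for (int i = ids.length - 1; i >= 0; i--) {
            ids[i] = heap.poll().doc;
        }
        return ids;
    }

    public static void main(String[] args) {
        // Storage for the setup above: 1M docs x 384 dims x 4 bytes per float.
        System.out.println("raw vector bytes: " + rawVectorBytes(1_000_000, 384));

        // Tiny random stand-in corpus; a real run would load actual embeddings.
        Random random = new Random(42);
        int dims = 384;
        float[][] docs = new float[1_000][dims];
        for (float[] doc : docs) {
            for (int i = 0; i < dims; i++) {
                doc[i] = random.nextFloat() - 0.5f;
            }
        }
        float[] query = docs[7].clone();
        System.out.println("top 10: " + Arrays.toString(topK(docs, query, 10)));
    }
}
```

A bounded min-heap keeps memory at O(k) while scanning all documents once. That is fine for computing ground truth offline, but it is exactly the linear scan HNSW exists to avoid at query time.
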
> >> >
> >> > I never seriously looked at Lucene's vector search performance, so
> >> > I'm very happy to be educated if I'm making naive assumptions!
> >> >
> >> > Somewhat related, is this the reason why I'm seeing many threads
> >> > about bringing third-party implementations into Lucene, including ones
> >> > that are very similar to Lucene's on paper? To speed up vector search?
> >> >
> >> > --
> >> > Adrien
> >> > <vespa-vs-es-screenshot.png>
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [email protected]
> >> > For additional commands, e-mail: [email protected]
> >>
>

-- 
Adrien
