Re: Do we know why Lucene's HNSW may be slower than other HNSW implementations?

Michael Sokolov Thu, 19 Jun 2025 11:16:48 -0700

We've recently been comparing Lucene's HNSW with FAISS's, and there is
not a 2x difference there. FAISS does seem to be around 10-15% faster,
I think. The 2x difference is roughly what I was seeing in comparisons
with hnswlib prior to the dot-product improvements we made in Lucene.


On Thu, Jun 19, 2025 at 2:12 PM Adrien Grand <[email protected]> wrote:
>
> Chris,
>
> FWIW I was looking at luceneknn 
> (https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/luceneknn/Dockerfile#L15)
>  which is on 9.7, though I don't know if it enabled the incubating vector API 
> at runtime?
>
> I hope that mentioning ANN benchmarks did not add noise to this thread; I was
> mostly looking at whether I could find another benchmark that suggests that
> Lucene is significantly slower in similar conditions. Does it align with
> other people's experience that Lucene is 2x slower or more compared with
> other good HNSW implementations?
>
> Adrien
>
> On Thu, 19 Jun 2025 at 18:44, Chris Hegarty
> <[email protected]> wrote:
>>
>> Hi Adrien,
>>
>> > Even though it uses Elasticsearch to run the benchmark, it really 
>> > benchmarks Lucene functionality,
>>
>> Agreed.
>>
>> > This seems consistent with results from 
>> > https://ann-benchmarks.com/index.html though I don't know if the cause of 
>> > the performance difference is the same or not.
>>
>> On ann-benchmarks specifically: unless I’m looking in the wrong place,
>> they’re using Elasticsearch 8.7.0 [1], which predates our usage of the
>> Panama Vector API for vector search. We added support for that in Lucene
>> 9.7.0 -> Elasticsearch 8.9.0. So those benchmarks are wildly out of date,
>> no?
>>
>> -Chris.
>>
>> [1] 
>> https://github.com/erikbern/ann-benchmarks/blob/f402b2cc17b980d7cd45241ab5a7a4cc0f965e55/ann_benchmarks/algorithms/elasticsearch/Dockerfile#L2
>>
>>
>> > On 19 Jun 2025, at 16:39, Adrien Grand <[email protected]> wrote:
>> >
>> > Hello all,
>> >
>> > I have been looking at this benchmark against Vespa recently: 
>> > https://blog.vespa.ai/elasticsearch-vs-vespa-performance-comparison/. (The 
>> > report is behind an annoying email wall, but I'm copying relevant data 
>> > below, so hopefully you don't need to download the report.) Even though it 
>> > uses Elasticsearch to run the benchmark, it really benchmarks Lucene 
>> > functionality; I don't believe that Elasticsearch does anything that
>> > meaningfully alters the results that you would get if you were to run 
>> > Lucene directly.
>> >
>> > The benchmark seems designed to highlight the benefits of Vespa's realtime
>> > design; that's fair game, I guess. But it also runs some queries in
>> > read-only scenarios where I was expecting Lucene to perform better.
>> >
>> > One thing that got me curious is that it reports about 2x worse latency 
>> > and throughput for pure unfiltered vector search on a force-merged index 
>> > (so single segment/graph). Does anybody know why Lucene's HNSW may perform 
>> > slower than Vespa's HNSW? This seems consistent with results from 
>> > https://ann-benchmarks.com/index.html though I don't know if the cause of 
>> > the performance difference is the same or not.
>> >
>> > For reference, here are details that apply to both Lucene and Vespa's 
>> > vector search:
>> >  - HNSW,
>> >  - float32 vectors, no quantization,
>> >  - embeddings generated using Snowflake's Arctic-embed-xs model,
>> >  - 1M docs,
>> >  - 384 dimensions,
>> >  - dot product,
>> >  - m = 16,
>> >  - max connections = 200,
>> >  - search for top 10 hits,
>> >  - no filter,
>> >  - single client, so no search concurrency,
>> >  - purple column is force-merged, so single segment/graph like Vespa.
>> >
>> > I never seriously looked at Lucene's vector search performance, so I'm 
>> > very happy to be educated if I'm making naive assumptions!
>> >
>> > Somewhat related, is this the reason why I'm seeing many threads around 
>> > bringing 3rd party implementations into Lucene, including ones that are 
>> > very similar to Lucene on paper? To speed up vector search?
>> >
>> > --
>> > Adrien
>> > <vespa-vs-es-screenshot.png>
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [email protected]
>> > For additional commands, e-mail: [email protected]
>>

