Hey,

Very nice article! It looks like a lot of manual work went into examining the
search results in those examples. Great work!

Do you have a DOI name for the article?

Uwe

On 1 September 2023 07:22:09 CEST, Kent Fitch <kent.fi...@gmail.com> wrote:
>My testing shows Lucene's HNSW in a very positive light. The ability to
>perform blended searches (vector/semantic and text) is valuable, even with
>high-quality embeddings, and helps when the searcher's intent is to search
>for specific words or phrases (such as a name or an exact concept) that get
>blurred out by semantic similarity. I discussed blended searching using
>Lucene in this Code4Lib article: https://journal.code4lib.org/articles/17443
>
>As for performance, I have benchmarked Lucene's HNSW (a snapshot from
>around January 2023) on a test index of 192 million vectors of 1536
>dimensions, each reduced by PQ coding to 512 bytes and stored in HNSW.
>Building this index was slow (lots of time spent merging...), but once
>built it fit entirely in memory (Core i7-9800X with 8 cores and 128 GB of
>DDR4 memory running at 2400 MT/s), so no I/O was required at search time.
>(I modified the Lucene similarity code to expand each of the 512 PQ byte
>codes back to 3 floats for the distance calculation.) I haven't updated
>this to take advantage of the latest SIMD capability, but even so, once
>the HNSW structure is in memory, a single search thread with topK=10
>achieves 2.4 queries/second. Two threads reach 4.9 q/s and 4 threads
>7.2 q/s, maxing out at 9.4 q/s with 8 threads. I suspect the sublinear
>scaling with threads is due to contention for memory bandwidth and cache.
>Curiously, I'm not getting nearly as good performance out of the box from
>Milvus 2.3's DiskANN, but I need to find out why before condemning it.
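Kent's modified similarity code isn't shown in the thread. As a rough illustration only, this Java sketch shows what expanding 512 one-byte PQ codes back to 3 floats each (512 × 3 = 1536 dimensions) might look like when scoring a full-precision query. The codebook layout (512 subspaces × 256 centroids × 3 floats), the class and method names, and the choice of squared Euclidean distance are all assumptions, not Lucene's API or Kent's code.

```java
import java.util.Random;

// Hypothetical sketch: score a full-precision query against a PQ-coded vector
// by expanding each byte code back to a 3-float centroid. Not Lucene's API.
final class PqDistanceSketch {
    static final int SUBSPACES = 512;  // one byte code per subspace
    static final int SUB_DIM = 3;      // 512 * 3 = 1536 dimensions
    static final int CENTROIDS = 256;  // a byte can select one of 256 centroids
    static final int DIM = SUBSPACES * SUB_DIM;

    // Expands the byte codes into a full 1536-float approximation of the vector.
    // codebook[m][c] holds the SUB_DIM floats of centroid c in subspace m.
    static float[] decode(float[][][] codebook, byte[] codes) {
        float[] out = new float[DIM];
        for (int m = 0; m < SUBSPACES; m++) {
            float[] centroid = codebook[m][codes[m] & 0xFF]; // read byte as unsigned 0..255
            System.arraycopy(centroid, 0, out, m * SUB_DIM, SUB_DIM);
        }
        return out;
    }

    // Squared Euclidean distance between the query and the decoded vector,
    // expanding codes on the fly instead of materializing the decoded array.
    static float squaredDistance(float[][][] codebook, float[] query, byte[] codes) {
        float sum = 0f;
        for (int m = 0; m < SUBSPACES; m++) {
            float[] centroid = codebook[m][codes[m] & 0xFF];
            int base = m * SUB_DIM;
            for (int d = 0; d < SUB_DIM; d++) {
                float diff = query[base + d] - centroid[d];
                sum += diff * diff;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Random codebook and codes stand in for a trained PQ model and a stored vector.
        float[][][] codebook = new float[SUBSPACES][CENTROIDS][SUB_DIM];
        for (float[][] sub : codebook)
            for (float[] centroid : sub)
                for (int d = 0; d < SUB_DIM; d++) centroid[d] = rnd.nextFloat();
        byte[] codes = new byte[SUBSPACES];
        rnd.nextBytes(codes);
        float[] query = new float[DIM];
        for (int i = 0; i < DIM; i++) query[i] = rnd.nextFloat();
        System.out.println("squared distance = " + squaredDistance(codebook, query, codes));
    }
}
```

This is an asymmetric distance: the query stays at full precision and only the stored vectors are quantized. A common refinement precomputes a per-query 512 × 256 table of partial distances, so each code then costs a single table lookup instead of three subtractions and multiplications.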
>
>Kent Fitch
>
>On Thu, Aug 31, 2023 at 7:53 PM Michael McCandless <
>luc...@mikemccandless.com> wrote:
>
>> Thanks Michael, very interesting!  I of course agree that Lucene is all
>> you need, heh ;)
>>
>> Jimmy Lin also tweeted about the strength of Lucene's HNSW:
>> https://twitter.com/lintool/status/1681333664431460353?s=20
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 31, 2023 at 3:31 AM Michael Wechner <michael.wech...@wyona.com>
>> wrote:
>>
>>> Hi all
>>>
>>> You might be interested in this paper:
>>>
>>> https://arxiv.org/abs/2308.14963
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
