Thanks for sharing, Michael.
But can't we say that vector DBs can utilize GPUs, which is hardly possible
with Lucene now?

On Fri, Sep 1, 2023 at 8:24 AM Kent Fitch <kent.fi...@gmail.com> wrote:

> My testing shows Lucene's HNSW in a very positive light.  The ability to
> perform blended searches (vector/semantic and text) is valuable, even with
> high quality embeddings, and helps when the searcher's intent is to search
> for specific words or phrases (such as a name, or exact concepts) which get
> blurred-out by semantics.   I discussed blended searching using Lucene in
> this Code4Lib article: https://journal.code4lib.org/articles/17443
>
> And regarding performance, I have benchmarked Lucene's HNSW (circa Jan 2023
> snapshot) on a test index of 192 million vectors of 1536 dimensions,
> reduced by PQ coding to 512 bytes and stored in HNSW.  Building this index
> was slow (lots of time merging...) but once it was built, it did fit
> entirely in memory (Core i7-9800X (8 cores) with 128 GB DDR4 memory running
> at 2400 MT/s) so no IO was required at search time.  (I modified the lucene
> similarity code to support expansion of each of the 512 PQ byte codes back
> to 3 floats for the distance calculation.)  I haven't updated this to take
> advantage of the latest SIMD capability, but even so, once the HNSW
> structure is in memory, a single-threaded topK=10 search thread achieves
> 2.4 queries/second.  Two threads: 4.9 q/s, 4 threads: 7.2 q/s, maxing out at
> 8 threads: 9.4 q/s.  I guess the non-linear scaling with threads is due to
> competition for memory bandwidth and cache.  Curiously, I'm not getting
> nearly as good performance out of the box using Milvus 2.3's DiskANN, but I
> need to find out why before condemning it.
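
The PQ expansion described above (512 byte codes per vector, each expanded
back to 3 floats, giving 512 x 3 = 1536 dimensions) can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
modified Lucene similarity code: the codebook layout (one table of 256
centroids per 3-float subspace), the random stand-in codebooks, the dot
product as the similarity, and all function names are hypothetical. The
footprint helper counts only the PQ codes; the HNSW graph links add to it.

```python
import random

DIMS = 1536                    # original dimensionality (from the post)
NUM_CODES = 512                # PQ byte codes per vector (from the post)
SUB_DIMS = DIMS // NUM_CODES   # 3 floats recovered per byte code


def make_codebooks(seed=0):
    """Hypothetical codebooks: 512 tables of 256 centroids, 3 floats each.

    Real codebooks would be trained (for example with k-means); random
    values stand in for them here.
    """
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(SUB_DIMS)] for _ in range(256)]
        for _ in range(NUM_CODES)
    ]


def encode(vector, codebooks):
    """Replace each 3-float slice by the index of its nearest centroid."""
    codes = bytearray(NUM_CODES)
    for i, table in enumerate(codebooks):
        sub = vector[i * SUB_DIMS:(i + 1) * SUB_DIMS]
        codes[i] = min(
            range(256),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, table[c])),
        )
    return bytes(codes)


def decode(codes, codebooks):
    """Expand 512 byte codes back to 1536 floats via table lookup."""
    out = []
    for i, code in enumerate(codes):
        out.extend(codebooks[i][code])
    return out


def dot(a, b):
    """Assumed similarity function; the post does not say which one is used."""
    return sum(x * y for x, y in zip(a, b))


def index_bytes(num_vectors, bytes_per_vector=NUM_CODES):
    """Size of the stored PQ codes only, excluding HNSW graph links."""
    return num_vectors * bytes_per_vector


codebooks = make_codebooks()
rng = random.Random(1)
query = [rng.uniform(-1.0, 1.0) for _ in range(DIMS)]
stored = encode([rng.uniform(-1.0, 1.0) for _ in range(DIMS)], codebooks)

# Distance calculation against a stored vector: decode, then compare.
score = dot(query, decode(stored, codebooks))

# 192 million vectors at 512 bytes each.
print(f"{index_bytes(192_000_000) / 1e9:.1f} GB")  # prints "98.3 GB"
```

At 512 bytes per vector the codes for 192 million vectors take about 98.3 GB,
which is consistent with the index fitting in 128 GB of memory.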
>
> Kent Fitch
>
> On Thu, Aug 31, 2023 at 7:53 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Thanks Michael, very interesting!  I of course agree that Lucene is all
>> you need, heh ;)
>>
>> Jimmy Lin also tweeted about the strength of Lucene's HNSW:
>> https://twitter.com/lintool/status/1681333664431460353?s=20
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 31, 2023 at 3:31 AM Michael Wechner <
>> michael.wech...@wyona.com> wrote:
>>
>>> Hi all
>>>
>>> You might be interested in this paper / article
>>>
>>> https://arxiv.org/abs/2308.14963
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>

-- 
Sincerely yours
Mikhail Khludnev
