Thanks for sharing, Michael. But can't we say that vector DBs can utilize GPUs, which is hardly possible with Lucene right now?
On Fri, Sep 1, 2023 at 8:24 AM Kent Fitch <kent.fi...@gmail.com> wrote:

> My testing shows Lucene's HNSW in a very positive light. The ability to
> perform blended searches (vector/semantic and text) is valuable, even with
> high-quality embeddings, and helps when the searcher's intent is to search
> for specific words or phrases (such as a name, or exact concepts) which get
> blurred out by semantics. I discussed blended searching using Lucene in
> this Code4Lib article: https://journal.code4lib.org/articles/17443
>
> And regarding performance, I have benchmarked Lucene's HNSW (circa Jan 2023
> snapshot) on a test index of 192 million vectors of 1536 dimensions,
> reduced by PQ coding to 512 bytes and stored in HNSW. Building this index
> was slow (lots of time merging...) but once it was built, it did fit
> entirely in memory (Core i7-9800X (8 cores) with 128 GB DDR4 memory running
> at 2400 MT/s), so no IO was required at search time. (I modified the Lucene
> similarity code to support expansion of each of the 512 PQ byte codes back
> to 3 floats for the distance calculation.) I haven't updated this to take
> advantage of the latest SIMD capability, but even so, once the HNSW
> structure is in memory, a single-threaded topK=10 search thread achieves
> 2.4 queries/second. Two threads: 4.9 q/s, 4 threads: 7.2 q/s, maxing out at
> 8 threads: 9.4 q/s. I guess the non-linear scaling with threads is due to
> competition for memory bandwidth and cache. Curiously, I'm not getting
> nearly as good performance out of the box using Milvus 2.3's DiskANN, but I
> need to find out why before condemning it.
>
> Kent Fitch
>
> On Thu, Aug 31, 2023 at 7:53 PM Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Thanks Michael, very interesting!
>> I of course agree that Lucene is all you need, heh ;)
>>
>> Jimmy Lin also tweeted about the strength of Lucene's HNSW:
>> https://twitter.com/lintool/status/1681333664431460353?s=20
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 31, 2023 at 3:31 AM Michael Wechner <
>> michael.wech...@wyona.com> wrote:
>>
>>> Hi all,
>>>
>>> You might be interested in this paper/article:
>>>
>>> https://arxiv.org/abs/2308.14963
>>>
>>> Thanks
>>>
>>> Michael

-- 
Sincerely yours
Mikhail Khludnev
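Kent's parenthetical about expanding each of the 512 PQ byte codes back to 3 floats for the distance calculation can be made concrete with a small sketch. It assumes plain product quantization with 512 sub-quantizers, each owning a 256-entry codebook of 3-float centroids (512 × 3 = 1536 dimensions), and a dot-product score. The class and method names are hypothetical; this is not Kent's actual Lucene modification and not a Lucene API.

```java
import java.util.Random;

/**
 * Hypothetical sketch of product-quantization (PQ) expansion for scoring.
 * Assumption: 512 sub-quantizers, each with a 256-entry codebook of 3-float
 * centroids, so 512 one-byte codes approximate a 1536-dimensional vector.
 */
public final class PqExpansionSketch {
    static final int NUM_SUBSPACES = 512; // one byte code per subspace
    static final int CENTROIDS = 256;     // a byte can address 256 centroids
    static final int SUB_DIM = 3;         // 512 * 3 = 1536 dimensions

    // codebooks[m][c] holds the 3 floats of centroid c in subspace m
    private final float[][][] codebooks;

    PqExpansionSketch(float[][][] codebooks) {
        this.codebooks = codebooks;
    }

    /** Dot product between a full-precision query and a PQ-coded vector (assumed metric). */
    float dotProduct(float[] query, byte[] codes) {
        float sum = 0f;
        for (int m = 0; m < NUM_SUBSPACES; m++) {
            // Java bytes are signed; mask to recover the 0..255 centroid index
            float[] centroid = codebooks[m][codes[m] & 0xFF];
            int offset = m * SUB_DIM;
            for (int d = 0; d < SUB_DIM; d++) {
                sum += query[offset + d] * centroid[d];
            }
        }
        return sum;
    }

    /** Expands the 512 byte codes back into an approximate 1536-float vector. */
    float[] decode(byte[] codes) {
        float[] out = new float[NUM_SUBSPACES * SUB_DIM];
        for (int m = 0; m < NUM_SUBSPACES; m++) {
            System.arraycopy(codebooks[m][codes[m] & 0xFF], 0, out, m * SUB_DIM, SUB_DIM);
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Random codebooks stand in for ones trained with k-means per subspace
        float[][][] books = new float[NUM_SUBSPACES][CENTROIDS][SUB_DIM];
        for (float[][] book : books) {
            for (float[] c : book) {
                for (int d = 0; d < SUB_DIM; d++) {
                    c[d] = rnd.nextFloat();
                }
            }
        }
        PqExpansionSketch pq = new PqExpansionSketch(books);
        byte[] codes = new byte[NUM_SUBSPACES];
        rnd.nextBytes(codes);
        float[] query = new float[NUM_SUBSPACES * SUB_DIM];
        for (int i = 0; i < query.length; i++) {
            query[i] = rnd.nextFloat();
        }
        System.out.println("dims=" + pq.decode(codes).length
                + " score=" + pq.dotProduct(query, codes));
    }
}
```

Storing one byte per 3-dimensional subspace is what lets an index like this fit in RAM: 192 million × 512 bytes is about 98 GB of codes (before HNSW graph overhead), versus about 1.18 TB for raw 1536-dimensional float32 vectors. A common alternative to decoding centroids per candidate is to precompute, once per query, a 512 × 256 table of partial scores so that each code becomes a single table lookup; whether that would beat the SIMD-friendly expansion here would need benchmarking.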