Hey,

Very nice article! It looks like reviewing the search results in those examples took a lot of manual work. Great work!
Do you have a DOI name for the article?

Uwe

On 1 September 2023 07:22:09 CEST, Kent Fitch <kent.fi...@gmail.com> wrote:
>My testing shows Lucene's HNSW in a very positive light. The ability to
>perform blended searches (vector/semantic and text) is valuable, even with
>high quality embeddings, and helps when the searcher's intent is to search
>for specific words or phrases (such as a name, or exact concepts) which get
>blurred-out by semantics. I discussed blended searching using Lucene in
>this Code4Lib article: https://journal.code4lib.org/articles/17443
>
>And regarding performance, I have benchmarked Lucene's HNSW (circa Jan 2023
>snapshot) on a test index of 192 million vectors of 1536 dimensions,
>reduced by PQ coding to 512 bytes and stored in HNSW. Building this index
>was slow (lots of time merging...) but once it was built, it did fit
>entirely in memory (Core i7-9800X (8 cores) with 128 GB DDR4 memory running
>at 2400 MT/s) so no IO was required at search time. (I modified the Lucene
>similarity code to support expansion of each of the 512 PQ byte codes back
>to 3 floats for the distance calculation.) I haven't updated this to take
>advantage of the latest SIMD capability, but even so, once the HNSW
>structure is in memory, a single-threaded topK=10 search thread achieves
>2.4 queries/second. Two threads: 4.9 q/s, 4 threads: 7.2 q/s, maxing out at
>8 threads: 9.4 q/s. I guess the non-linear scaling with threads is due to
>competition for memory bandwidth and cache. Curiously, I'm not getting
>nearly as good performance out of the box using Milvus 2.3's DiskANN, but I
>need to find out why before condemning it.
>
>Kent Fitch
>
>On Thu, Aug 31, 2023 at 7:53 PM Michael McCandless <
>luc...@mikemccandless.com> wrote:
>
>> Thanks Michael, very interesting!
>> I of course agree that Lucene is all
>> you need, heh ;)
>>
>> Jimmy Lin also tweeted about the strength of Lucene's HNSW:
>> https://twitter.com/lintool/status/1681333664431460353?s=20
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Aug 31, 2023 at 3:31 AM Michael Wechner <michael.wech...@wyona.com>
>> wrote:
>>
>>> Hi all
>>>
>>> You might be interested in this paper / article
>>>
>>> https://arxiv.org/abs/2308.14963
>>>
>>> Thanks
>>>
>>> Michael
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>
>>>

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
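
For readers following the thread: Kent's remark about expanding each of the 512 PQ byte codes back to 3 floats for the distance calculation can be illustrated with a rough sketch. This is not his Lucene modification (which is not shown in the thread). The codebook layout, the random centroids, the function names, and the choice of dot product as the similarity are all assumptions made for illustration; only the sizes (1536 dimensions, 512 byte codes, 3 floats per code) come from the message.

```python
import random

NUM_SUBSPACES = 512   # one byte code per subspace (from the message)
SUB_DIM = 3           # 1536 dimensions / 512 codes = 3 floats per code
NUM_CENTROIDS = 256   # a single byte can index at most 256 centroids


def make_random_codebooks(seed=0):
    """Hypothetical stand-in for trained codebooks.

    codebooks[s][c] is the 3-float centroid for code c in subspace s.
    Real codebooks would be learned (e.g. by k-means), not random.
    """
    rng = random.Random(seed)
    return [
        [tuple(rng.uniform(-1.0, 1.0) for _ in range(SUB_DIM))
         for _ in range(NUM_CENTROIDS)]
        for _ in range(NUM_SUBSPACES)
    ]


def encode(vector, codebooks):
    """Compress a 1536-float vector to 512 bytes: nearest centroid per subspace."""
    codes = bytearray()
    for s in range(NUM_SUBSPACES):
        chunk = vector[s * SUB_DIM:(s + 1) * SUB_DIM]
        best = min(
            range(NUM_CENTROIDS),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(chunk, codebooks[s][c])),
        )
        codes.append(best)
    return bytes(codes)


def decode(codes, codebooks):
    """Expand 512 byte codes back to an approximate 1536-float vector."""
    if len(codes) != NUM_SUBSPACES:
        raise ValueError("expected %d codes, got %d" % (NUM_SUBSPACES, len(codes)))
    vector = []
    for s, c in enumerate(codes):
        vector.extend(codebooks[s][c])
    return vector


def dot_product(query, codes, codebooks):
    """Score a full-precision query against a PQ-coded vector.

    Each byte is expanded to its 3 floats on the fly, so the decoded
    vector is never materialised. Dot product is an assumed similarity.
    """
    total = 0.0
    for s, c in enumerate(codes):
        centroid = codebooks[s][c]
        base = s * SUB_DIM
        for j in range(SUB_DIM):
            total += query[base + j] * centroid[j]
    return total
```

On the sizes quoted in the message, 192 million vectors at 512 bytes each come to about 98.3 GB of codes, before counting the HNSW graph links, which is consistent with the index fitting in 128 GB of memory. The per-byte table lookup in `dot_product` also touches memory on every code, which fits Kent's guess that memory bandwidth and cache limit scaling across threads.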