Re: HNSW questions

Michael Sokolov Wed, 19 Apr 2023 13:37:35 -0700

That class (HnswGraphBuilder) is intended for use by the Lucene index
writer; it's not designed as a general-purpose class for reuse outside
that context. And IndexWriter writes documents to disk in bulk.


On Wed, Apr 19, 2023 at 3:54 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> Thanks, Michael!
>
> Looking at the paper by Malkov and Yashunin, it looks like the algorithm
> allows for building the HNSW graph incrementally.  Why does our
> implementation require specifying all the vectors up front to
> HnswGraphBuilder.create?
>
> On Wed, Apr 19, 2023 at 3:04 AM Michael Sokolov <msoko...@gmail.com> wrote:
>>
>> These vector values have internal buffers they use to return the vectors. In
>> order to compare two vectors we need two independent sources, so that
>> fetching the second vector doesn't overwrite the buffer still holding the first.
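
To illustrate the buffer-reuse point above, here is a minimal sketch. It is not
Lucene code: BufferedVectors, vectorValue and copy are hypothetical stand-ins
for a vector source that returns its own internal buffer on every read, and the
two-element data set is made up for the example.

```java
public class SharedBufferDemo {

    /** Hypothetical vector source that reuses one internal buffer for every read. */
    static final class BufferedVectors {
        private final float[][] data;  // stand-in for the underlying vector storage
        private final float[] scratch; // single buffer reused across calls

        BufferedVectors(float[][] data) {
            this.data = data;
            this.scratch = new float[data[0].length];
        }

        /** Returns the internal buffer; the contents are only valid until the next call. */
        float[] vectorValue(int ord) {
            System.arraycopy(data[ord], 0, scratch, 0, scratch.length);
            return scratch;
        }

        /** Independent reader over the same data, with its own buffer. */
        BufferedVectors copy() {
            return new BufferedVectors(data);
        }
    }

    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Both vectors come from one source, so the second read clobbers the first. */
    static float distanceWithOneSource(float[][] data) {
        BufferedVectors a = new BufferedVectors(data);
        float[] v0 = a.vectorValue(0);
        float[] v1 = a.vectorValue(1); // v0 and v1 are now the same array
        return squaredDistance(v0, v1);
    }

    /** Each vector comes from its own source, so both buffers stay intact. */
    static float distanceWithTwoSources(float[][] data) {
        BufferedVectors a = new BufferedVectors(data);
        BufferedVectors b = a.copy();
        return squaredDistance(a.vectorValue(0), b.vectorValue(1));
    }

    public static void main(String[] args) {
        float[][] data = {{0f, 0f}, {3f, 4f}};
        System.out.println(distanceWithOneSource(data));  // prints 0.0
        System.out.println(distanceWithTwoSources(data)); // prints 25.0
    }
}
```

With a single source the comparison silently measures a vector against itself;
with two sources the expected distance comes back. A source that hands out a
fresh array on each read would not have this problem, at the cost of an
allocation per read.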
>>
>> Sorry I forgot the second question and can't see it on my phone. Brb
>>
>> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>>>
>>> Hi all, a couple of questions on how HNSW works:
>>>
>>> 1. What is driving the requirement for two copies of the input vectors?  It 
>>> looks like the RAVV implementations do shallow copies, so the vector from A 
>>> is the same that would be returned by B.  What am I missing?
>>>
>>> 2. What is the intended behavior when adding identical vectors to an HNSW
>>> graph?  It looks like when I supply 10 identical vectors, they all get added
>>> to the graph, but when I search for the nearest neighbors, I only get one of
>>> them in the result set.
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced
>
>
>
> --
> Jonathan Ellis
> co-founder, http://www.datastax.com
> @spyced

