Yes, it's up to the application. And it is definitely a pathological
case when it happens; see https://github.com/apache/lucene/issues/11626
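
A minimal sketch of what "up to the application" could look like in
practice, assuming you only care about exact (bitwise) duplicates.
VectorDeduplicator and its methods are hypothetical names, not part of
Lucene. The idea is to map each distinct vector to one ordinal before
anything reaches the graph:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical application-side helper; NOT a Lucene class. */
class VectorDeduplicator {

    /** Wraps a float[] so it can be a hash key with value semantics. */
    private static final class Key {
        private final float[] values;

        Key(float[] values) {
            this.values = values;
        }

        @Override
        public boolean equals(Object other) {
            // Arrays.equals compares floats bitwise: NaN equals NaN, 0.0f != -0.0f.
            return other instanceof Key && Arrays.equals(values, ((Key) other).values);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(values);
        }
    }

    private final Map<Key, Integer> ordinals = new HashMap<>();
    private final List<float[]> unique = new ArrayList<>();

    /**
     * Returns the ordinal for this vector. An identical vector seen earlier
     * gets its existing ordinal back instead of a new entry.
     */
    int add(float[] vector) {
        // Defensive copy so later mutation by the caller cannot corrupt the key.
        Key key = new Key(vector.clone());
        Integer existing = ordinals.get(key);
        if (existing != null) {
            return existing;
        }
        int ordinal = unique.size();
        unique.add(key.values);
        ordinals.put(key, ordinal);
        return ordinal;
    }

    /** Number of distinct vectors seen so far. */
    int uniqueCount() {
        return unique.size();
    }

    /** The distinct vectors, in first-seen order; these are what would be indexed. */
    List<float[]> uniqueVectors() {
        return unique;
    }
}
```

The application would then index only uniqueVectors() and keep its own
mapping from each ordinal to the documents that share that vector.
Near-duplicates are not caught by this; that would need a distance
threshold instead of exact equality.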

On Tue, May 9, 2023 at 1:30 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>
> I don't see anything to make sure vectors are unique in IndexingChain down to 
> FieldWriter, is that handled somewhere else?  Or is it just up to the user to 
> make sure no documents end up with duplicate vectors?
>
> On Wed, Apr 19, 2023 at 5:07 AM Michael Sokolov <msoko...@gmail.com> wrote:
>>
>> Oh, identical vectors. Those are basically unsupported. If you create a large
>> index filled with identical vectors, it leads to pathological behavior. Seems
>> to be a weakness in the algorithm. If you have any idea how to improve that,
>> it would be welcome. But in real-world scenarios, it doesn't seem to arise?
>>
>> On Tue, Apr 18, 2023, 10:55 PM Jonathan Ellis <jbel...@gmail.com> wrote:
>>>
>>> Hi all, a couple of questions on how HNSW works:
>>>
>>> 1. What is driving the requirement for two copies of the input vectors? It
>>> looks like the RAVV (RandomAccessVectorValues) implementations do shallow
>>> copies, so the vector from A is the same one that would be returned by B.
>>> What am I missing?
>>>
>>> 2. What is the intended behavior when adding identical vectors to an HNSW
>>> graph? It looks like when I supply 10 identical vectors, they all get added
>>> to the graph, but when I search for the nearest neighbors, I only get one of
>>> them in the result set.
>>>
>>> --
>>> Jonathan Ellis
>>> co-founder, http://www.datastax.com
>>> @spyced
