Re: HNSW search with threshold

Michael Sokolov Fri, 11 Nov 2022 06:57:06 -0800

I think it's fine to warn about this, but in general large values of k
will increase cost, with or without thresholding, so this is not a new
thing to warn about.
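A minimal sketch of the combined k + threshold idea discussed below, assuming a higher score means a closer vector. `ThresholdTopK`, `collect` and `results` are hypothetical names, not a Lucene API. A candidate is kept only if its score reaches the threshold, and at most k candidates are kept, with the weakest evicted when the collector is full. The heap is not pre-sized to k, so k = Integer.MAX_VALUE simply means "keep everything within the threshold". Note that the collector itself does nothing to bound how many candidates the graph traversal has to score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical collector, not a Lucene API: keeps at most k hits with score >= threshold. */
public class ThresholdTopK {
    /** A scored doc; a higher score is assumed to mean a closer vector. */
    public record Hit(int doc, float score) {}

    private static final Comparator<Hit> BY_SCORE = Comparator.comparingDouble(Hit::score);

    private final int k;
    private final float threshold;
    // Min-heap on score: the weakest kept hit sits at the head, ready to be evicted.
    // Not pre-sized to k, so k = Integer.MAX_VALUE does not allocate a huge array.
    private final PriorityQueue<Hit> heap = new PriorityQueue<>(BY_SCORE);

    public ThresholdTopK(int k, float threshold) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        this.k = k;
        this.threshold = threshold;
    }

    /** Offers a candidate and returns true if it is (currently) kept. */
    public boolean collect(int doc, float score) {
        if (score < threshold) {
            return false; // outside the threshold: rejected regardless of k
        }
        if (heap.size() < k) {
            heap.add(new Hit(doc, score));
            return true;
        }
        if (score > heap.peek().score()) {
            heap.poll(); // evict the weakest kept hit
            heap.add(new Hit(doc, score));
            return true;
        }
        return false;
    }

    /** Kept hits, best first. */
    public List<Hit> results() {
        List<Hit> out = new ArrayList<>(heap);
        out.sort(BY_SCORE.reversed());
        return out;
    }

    public static void main(String[] args) {
        float[] scores = {0.95f, 0.40f, 0.80f, 0.70f, 0.99f};
        ThresholdTopK bounded = new ThresholdTopK(2, 0.5f);
        ThresholdTopK unbounded = new ThresholdTopK(Integer.MAX_VALUE, 0.5f);
        for (int doc = 0; doc < scores.length; doc++) {
            bounded.collect(doc, scores[doc]);
            unbounded.collect(doc, scores[doc]);
        }
        System.out.println(bounded.results());   // docs 4 and 0
        System.out.println(unbounded.results()); // docs 4, 0, 2 and 3
    }
}
```

With k = 2 the threshold drops doc 1 and the cap then drops doc 2 and doc 3. With k = Integer.MAX_VALUE only the threshold applies.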


On Thu, Nov 10, 2022 at 5:50 AM Adrien Grand <jpou...@gmail.com> wrote:
>
> That would work for me, though this is something that I would like to be 
> documented as not recommended.
>
> On Thu, Nov 10, 2022 at 2:33 PM Alexey Gorlenko <agorlen...@gmail.com> wrote:
>>
>> I think we can support both parameters: k and threshold. And if we need to 
>> get all docs within the threshold, we will just set k = Integer.MAX_VALUE.
>>
>> On Thu, Nov 10, 2022 at 12:43, Adrien Grand <jpou...@gmail.com> wrote:
>>>
>>> I wonder if it would actually be a good idea to support filtering _only_ 
>>> based on distance. In the worst case, this may require traversing 
>>> the whole HNSW graph and would run in time linear in the number of 
>>> vectors, with a high constant factor since we'd need to compute a distance 
>>> for every vector? I imagine that this would only make sense for low values 
>>> of the radius, so that few vectors would match, but it looks to me like 
>>> it would be hard to predict whether a given radius would actually match a 
>>> small set of vectors. Should the query still require a `k` value in 
>>> addition to the radius to make sure it doesn't go wild?
>>>
>>> On Tue, Nov 8, 2022 at 7:26 AM Alexey Gorlenko <agorlen...@gmail.com> wrote:
>>>>
>>>> Thanks, Michael!
>>>> Yes, I will try.
>>>>
>>>> On Tue, Nov 8, 2022 at 03:31, Michael Sokolov <msoko...@gmail.com> wrote:
>>>>>
>>>>> +1 to adding a scoring threshold. I think it could be another
>>>>> parameter to KnnVectorQuery. Do you want to have a try at adding this?
>>>>> If so, please feel free to open a PR and I will be happy to guide you.
>>>>>
>>>>> On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko <agorlen...@gmail.com> 
>>>>> wrote:
>>>>> >
>>>>> > Hi!
>>>>> >
>>>>> > There are some use cases where we need to find vectors whose 
>>>>> > distance (by some metric) to a given vector V is less than a given 
>>>>> > threshold T. That task is very similar to the kNN problem, but in this 
>>>>> > case we don't have a fixed number k of nearest neighbours.
>>>>> >
>>>>> > As far as I can see, the current kNN implementation doesn't provide such 
>>>>> > functionality. But at first glance it doesn't seem very difficult to 
>>>>> > modify the search method of HnswGraph to implement that feature (do not 
>>>>> > limit the result size, and discard candidates that exceed the threshold).
>>>>> >
>>>>> > But maybe the idea has some non-obvious problems that I haven't 
>>>>> > noticed, and in reality implementing it would run into 
>>>>> > fundamental difficulties?
>>>>> >
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>>>>
>>>
>>>
>>> --
>>> Adrien
>
>
>
> --
> Adrien
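To make the proposed HnswGraph change and the worst-case cost concern concrete, here is a rough sketch of a threshold ("radius") search over a single proximity-graph layer, assuming Euclidean distance and plain adjacency arrays. It is not Lucene's HNSW search code; `RadiusGraphSearch`, `search`, `distanceCalls` and the chain-graph helpers are hypothetical. The result list is unbounded, and expansion stops once the closest queued candidate lies outside the threshold. The counter shows the linear worst case: with a wide enough radius, every vector gets a distance computation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;

/** Hypothetical sketch, not Lucene code: unbounded radius search over one graph layer. */
public class RadiusGraphSearch {
    /** Distance computations performed by the most recent search (not thread-safe). */
    public static int distanceCalls;

    private record Candidate(float dist, int node) {}

    static float distance(float[] a, float[] b) {
        distanceCalls++;
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }

    /** Returns the reached nodes whose Euclidean distance to the query is <= threshold. */
    public static List<Integer> search(float[][] vectors, int[][] neighbors,
                                       int entryPoint, float[] query, float threshold) {
        distanceCalls = 0;
        List<Integer> results = new ArrayList<>(); // no size limit: everything in the radius
        Set<Integer> visited = new HashSet<>();
        // Min-heap: the closest unexplored candidate is expanded first.
        PriorityQueue<Candidate> candidates =
                new PriorityQueue<>(Comparator.comparingDouble(Candidate::dist));
        visited.add(entryPoint);
        candidates.add(new Candidate(distance(query, vectors[entryPoint]), entryPoint));
        while (!candidates.isEmpty()) {
            Candidate c = candidates.poll();
            if (c.dist() > threshold) {
                // Every queued candidate is at least this far away, so stop once we
                // have hits; before the first hit, keep walking towards the query.
                if (!results.isEmpty()) break;
            } else {
                results.add(c.node());
            }
            for (int nb : neighbors[c.node()]) {
                if (visited.add(nb)) { // each node is scored at most once
                    candidates.add(new Candidate(distance(query, vectors[nb]), nb));
                }
            }
        }
        return results;
    }

    /** Demo data: n points on a line, vector i = {i}. */
    public static float[][] lineVectors(int n) {
        float[][] vectors = new float[n][];
        for (int i = 0; i < n; i++) vectors[i] = new float[] {i};
        return vectors;
    }

    /** Demo graph: node i is linked to i - 1 and i + 1. */
    public static int[][] chainNeighbors(int n) {
        int[][] neighbors = new int[n][];
        for (int i = 0; i < n; i++) {
            List<Integer> links = new ArrayList<>();
            if (i > 0) links.add(i - 1);
            if (i < n - 1) links.add(i + 1);
            neighbors[i] = links.stream().mapToInt(Integer::intValue).toArray();
        }
        return neighbors;
    }

    public static void main(String[] args) {
        int n = 1000;
        float[][] vectors = lineVectors(n);
        int[][] neighbors = chainNeighbors(n);
        float[] query = {0f};
        List<Integer> near = search(vectors, neighbors, 0, query, 1.5f);
        System.out.println(near + " after " + distanceCalls + " distance calls");
        List<Integer> all = search(vectors, neighbors, 0, query, Float.MAX_VALUE);
        System.out.println(all.size() + " hits after " + distanceCalls + " distance calls");
    }
}
```

On this 1,000-node chain with the query at node 0, a threshold of 1.5 returns [0, 1] after 3 distance computations, while Float.MAX_VALUE scores all 1,000 nodes. Like HNSW itself, the sketch is approximate: in-radius nodes reachable only through out-of-radius nodes can be missed. Continuing the walk until the first hit is a simplifying assumption standing in for HNSW's upper-layer descent.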
