I wonder whether it would actually be a good idea to support filtering
based _only_ on distance. In the worst case, this may require traversing
the whole HNSW graph, which would run in time linear in the number of
vectors, with a high constant factor since we'd need to compute a distance
for every vector. I imagine this would only make sense for small radii
that match few vectors, but it looks to me like it would be hard to
predict whether a given radius will actually match a small set of
vectors. Should the query still require a `k` value in addition to the
radius, to make sure it doesn't go wild?
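
To make the concern concrete, here is a rough sketch of what the worst case
degenerates to: a linear scan that computes one distance per vector, with `k`
acting as a cap on how many matches are kept. It is only an illustration under
my own assumptions: the class, the method names, and the choice of squared
Euclidean distance are hypothetical, and none of it is the KnnVectorQuery or
HnswGraph API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch only; not Lucene's KnnVectorQuery or HnswGraph API. */
final class RadiusSearchSketch {

  /** Index of a matching vector and its squared Euclidean distance to the query. */
  static final class Hit {
    final int ord;
    final float squaredDistance;

    Hit(int ord, float squaredDistance) {
      this.ord = ord;
      this.squaredDistance = squaredDistance;
    }
  }

  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /**
   * Visits every vector and computes one distance per vector, which is what a
   * radius-only search falls back to in the worst case. At most k hits are
   * kept, so memory stays bounded even if the radius matches everything.
   */
  static List<Hit> search(float[][] vectors, float[] query, float radius, int k) {
    float squaredRadius = radius * radius;
    // Max-heap on distance: when the cap is exceeded, the farthest hit is evicted.
    PriorityQueue<Hit> heap =
        new PriorityQueue<>(
            Comparator.comparingDouble((Hit h) -> h.squaredDistance).reversed());
    for (int ord = 0; ord < vectors.length; ord++) {
      float d = squaredDistance(vectors[ord], query);
      if (d > squaredRadius) {
        continue; // outside the radius
      }
      heap.add(new Hit(ord, d));
      if (heap.size() > k) {
        heap.poll();
      }
    }
    // Return the kept hits ordered from nearest to farthest.
    List<Hit> hits = new ArrayList<>(heap);
    hits.sort(Comparator.comparingDouble((Hit h) -> h.squaredDistance));
    return hits;
  }
}
```

Note that the cap only bounds the size of the result: the loop above still
touches every vector, so the running time stays linear whenever the radius is
large.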

On Tue, Nov 8, 2022 at 7:26 AM Alexey Gorlenko <agorlen...@gmail.com> wrote:

> Thanks, Michael!
> Yes, I will try.
>
> On Tue, Nov 8, 2022 at 03:31, Michael Sokolov <msoko...@gmail.com> wrote:
>
>> +1 to adding a scoring threshold. I think it could be another
>> parameter to KnnVectorQuery. Do you want to have a try at adding this?
>> If so, please feel free to open a PR and I will be happy to guide you.
>>
>> On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko <agorlen...@gmail.com>
>> wrote:
>> >
>> > Hi!
>> >
>> > There are some use cases where we need to find vectors whose
>> > distance (by some metric) to a given vector V is less than a given
>> > threshold T. That task is very similar to the knn problem, but in
>> > this case we don't have a number of nearest neighbours k.
>> >
>> > As far as I can see, the current implementation of knn doesn't
>> > provide such functionality. But at first glance it is not very
>> > difficult to modify the search method of HnswGraph to implement that
>> > feature (do not limit the result size, and discard candidates that
>> > exceed the threshold).
>> >
>> > But maybe that idea has some non-obvious problems which I haven't
>> > noticed, and in reality an implementation of it would have
>> > fundamental difficulties?
>> >
>>

-- 
Adrien
