That would work for me, though this is something that I would like to be documented as not recommended.
On Thu, Nov 10, 2022 at 2:33 PM Alexey Gorlenko <agorlen...@gmail.com> wrote: > I think we can support both parameters: k and threshold. And if we need to > get all docs by the threshold, we just will set k == Integer.MAX_VALUE. > > чт, 10 нояб. 2022 г. в 12:43, Adrien Grand <jpou...@gmail.com>: > >> I wonder if it would actually be a good idea to support filtering _only_ >> based on distance. In the worst case scenario, this may require traversing >> the whole HNSW graph and would run in linear time with the number of >> vectors, with a high constant factor since we'd need to compute a distance >> for every vector? I imagine that this would only make sense for low values >> of the radius, so that few vectors would match, but this looks to me like >> it would be hard to predict whether a given radius would actually match a >> small set of vectors. Should the query still require a `k` value in >> addition to the radius to make sure it doesn't go wild? >> >> On Tue, Nov 8, 2022 at 7:26 AM Alexey Gorlenko <agorlen...@gmail.com> >> wrote: >> >>> Thanks, Michael! >>> Yes, I will try. >>> >>> вт, 8 нояб. 2022 г. в 03:31, Michael Sokolov <msoko...@gmail.com>: >>> >>>> +1 to adding a scoring threshold. I think it could be another >>>> parameter to KnnVectorQuery. Do you want to have a try at adding this? >>>> If so, please feel free to open a PR and I will be happy to guide you. >>>> >>>> On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko <agorlen...@gmail.com> >>>> wrote: >>>> > >>>> > Hi! >>>> > >>>> > There are some use cases where we need to find vectors with the >>>> distance (by some metric) to the given vector V less than the given >>>> threshold T. That task is very similar to the knn problem, but in this case >>>> we don't have a quantity of the nearest neighbours k. >>>> > >>>> > As I see, the current implementation of knn doesn't provide such >>>> functionality. But at the first glance it is not very difficult to modify >>>> the method search of HnswGraph to implement that feature (do not limit >>>> result size and get rid of candidates which exceed threshold). >>>> > >>>> > But maybe that idea has some not obvious problems which I haven't >>>> noticed, and in reality an implementation of that idea would have >>>> fundamental difficulties? >>>> > >>>> >>>> --------------------------------------------------------------------- >>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>>> For additional commands, e-mail: dev-h...@lucene.apache.org >>>> >>>> >> >> -- >> Adrien >> > -- Adrien