I think it's fine to warn about this, but in general large values of K will increase cost, with or without thresholding, so this is not a new thing to warn about
On Thu, Nov 10, 2022 at 5:50 AM Adrien Grand <jpou...@gmail.com> wrote: > > That would work for me, though this is something that I would like to be > documented as not recommended. > > On Thu, Nov 10, 2022 at 2:33 PM Alexey Gorlenko <agorlen...@gmail.com> wrote: >> >> I think we can support both parameters: k and threshold. And if we need to >> get all docs by the threshold, we just will set k == Integer.MAX_VALUE. >> >> чт, 10 нояб. 2022 г. в 12:43, Adrien Grand <jpou...@gmail.com>: >>> >>> I wonder if it would actually be a good idea to support filtering _only_ >>> based on distance. In the worst case scenario, this may require traversing >>> the whole HNSW graph and would run in linear time with the number of >>> vectors, with a high constant factor since we'd need to compute a distance >>> for every vector? I imagine that this would only make sense for low values >>> of the radius, so that few vectors would match, but this looks to me like >>> it would be hard to predict whether a given radius would actually match a >>> small set of vectors. Should the query still require a `k` value in >>> addition to the radius to make sure it doesn't go wild? >>> >>> On Tue, Nov 8, 2022 at 7:26 AM Alexey Gorlenko <agorlen...@gmail.com> wrote: >>>> >>>> Thanks, Michael! >>>> Yes, I will try. >>>> >>>> вт, 8 нояб. 2022 г. в 03:31, Michael Sokolov <msoko...@gmail.com>: >>>>> >>>>> +1 to adding a scoring threshold. I think it could be another >>>>> parameter to KnnVectorQuery. Do you want to have a try at adding this? >>>>> If so, please feel free to open a PR and I will be happy to guide you. >>>>> >>>>> On Mon, Nov 7, 2022 at 6:38 AM Alexey Gorlenko <agorlen...@gmail.com> >>>>> wrote: >>>>> > >>>>> > Hi! >>>>> > >>>>> > There are some use cases where we need to find vectors with the >>>>> > distance (by some metric) to the given vector V less than the given >>>>> > threshold T. That task is very similar to the knn problem, but in this >>>>> > case we don't have a quantity of the nearest neighbours k. >>>>> > >>>>> > As I see, the current implementation of knn doesn't provide such >>>>> > functionality. But at the first glance it is not very difficult to >>>>> > modify the method search of HnswGraph to implement that feature (do not >>>>> > limit result size and get rid of candidates which exceed threshold). >>>>> > >>>>> > But maybe that idea has some not obvious problems which I haven't >>>>> > noticed, and in reality an implementation of that idea would have >>>>> > fundamental difficulties? >>>>> > >>>>> >>>>> --------------------------------------------------------------------- >>>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org >>>>> For additional commands, e-mail: dev-h...@lucene.apache.org >>>>> >>> >>> >>> -- >>> Adrien > > > > -- > Adrien --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org