[ https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236344#comment-17236344 ]
Michael Sokolov commented on LUCENE-9614:
-----------------------------------------

{quote}Today the Query API assumes that you can figure out whether a document matches in isolation, regardless of other matches in the index/segment{quote}

I'm not sure what you mean there: BooleanQuery, for example, relies on matches from other queries. Do you mean because of {{Weight.matches}}? Something else? I guess we could just return MATCH_WITH_NO_TERMS in such cases?

There is a similar API in {{FloatPointNearestNeighbor.nearest(IndexSearcher ...)}}, although that does not accept a {{Query}}. Maybe that can be a model.

OTOH it's easy enough to write a query that captures the closest K hits up front and presents them as an iterator, so if someone wants a Query for convenience they can do something like the [thing|https://github.com/mikemccand/luceneutil/pull/87/files?file-filters%5B%5D=.java&file-filters%5B%5D=.py&file-filters%5B%5D=.tasks#diff-a391217caf024ed0ad7cc1b95d62ce7d679e582b850dab887d6c42fe69ed5045] I posted for benchmarking purposes. I don't see the harm in offering such a thing?

We could equally well have a query like {{KnnVectorQuery(int target, int speedAccuracyTradeoff, Query filter)}}? I'm not sure what the pluses and minuses of that approach would be versus the APIs that accept (or are implemented by) IndexSearcher.

> Implement KNN Query
> -------------------
>
> Key: LUCENE-9614
> URL: https://issues.apache.org/jira/browse/LUCENE-9614
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Michael Sokolov
> Priority: Major
>
> Now we have a vector index format, and one vector indexing/KNN search
> implementation, but the interface is low-level: you can search across a
> single segment only. We would like to expose a Query implementation.
> Initially, we want to support a usage where the KnnVectorQuery selects the
> k-nearest neighbors without regard to any other constraints, and these can
> then be filtered as part of an enclosing Boolean or other query.
>
> Later we will want to explore some kind of filtering *while* performing
> vector search, or a re-entrant search process that can yield further results.
>
> Because of the nature of knn search (all documents having any vector value
> match), it is more like a ranking than a filtering operation, and it doesn't
> really make sense to provide an iterator interface that can be merged in the
> usual way, in docid order, skipping ahead. It's not yet clear how to satisfy
> a query that is "k nearest neighbors satisfying some arbitrary Query", at
> least not without realizing a complete bitset for the Query. But this is for
> a later issue; *this* issue is just about performing the knn search in
> isolation, computing a set of (some given) K nearest neighbors, and providing
> an iterator over those.
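
For illustration only, the idea discussed in this message (collect the K nearest hits up front, then expose them through an iterator in docid order) might be sketched roughly as below. This is plain Java with no Lucene dependency, not the code from the linked pull request or any Lucene API: the class {{TopKDocIdSketch}}, its methods, the brute-force squared-Euclidean scan, and the {{NO_MORE_DOCS}} sentinel are all hypothetical names and assumptions chosen to keep the example self-contained. A real implementation would search the vector index per segment instead of scanning every vector.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

// Hypothetical sketch: select the K nearest vectors up front, then iterate
// over the selected doc ids in ascending order. Not a Lucene API.
public class TopKDocIdSketch {

    // Sentinel returned once the iterator is exhausted (assumed convention).
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Squared Euclidean distance between two vectors of equal length.
    static float squaredDistance(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Brute-force scan: returns the doc ids (array indexes) of the k vectors
    // closest to target, sorted in ascending doc id order.
    static int[] topKDocIds(float[][] vectors, float[] target, int k) {
        if (k <= 0) {
            return new int[0];
        }
        float[] dist = new float[vectors.length];
        // Max-heap on distance: the head is the farthest hit kept so far.
        PriorityQueue<Integer> heap =
            new PriorityQueue<>((x, y) -> Float.compare(dist[y], dist[x]));
        for (int doc = 0; doc < vectors.length; doc++) {
            dist[doc] = squaredDistance(vectors[doc], target);
            heap.add(doc);
            if (heap.size() > k) {
                heap.poll(); // evict the current farthest hit
            }
        }
        int[] ids = new int[heap.size()];
        int i = 0;
        for (int doc : heap) {
            ids[i++] = doc;
        }
        Arrays.sort(ids); // present hits in doc id order, not distance order
        return ids;
    }

    // Iterator over a precomputed, sorted array of doc ids.
    static final class SortedDocIdIterator {
        private final int[] ids;
        private int pos = -1;

        SortedDocIdIterator(int[] sortedIds) {
            this.ids = sortedIds;
        }

        // Moves to the next doc id, or NO_MORE_DOCS when exhausted.
        int nextDoc() {
            pos++;
            return pos < ids.length ? ids[pos] : NO_MORE_DOCS;
        }

        // Moves forward to the first doc id >= target, or NO_MORE_DOCS.
        int advance(int target) {
            int doc;
            do {
                doc = nextDoc();
            } while (doc < target);
            return doc;
        }
    }

    public static void main(String[] args) {
        float[][] vectors = { {0f, 0f}, {10f, 10f}, {1f, 1f}, {5f, 5f} };
        int[] ids = topKDocIds(vectors, new float[] {0f, 0f}, 2);
        SortedDocIdIterator it = new SortedDocIdIterator(ids);
        for (int doc = it.nextDoc(); doc != NO_MORE_DOCS; doc = it.nextDoc()) {
            System.out.println("doc=" + doc);
        }
    }
}
```

Because the hits are sorted by doc id after selection, such an iterator could in principle be intersected with other clauses in the usual way; the trade-off, as the description notes, is that the K hits are chosen before any other constraint is applied.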