[ 
https://issues.apache.org/jira/browse/LUCENE-9614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236344#comment-17236344
 ] 

Michael Sokolov commented on LUCENE-9614:
-----------------------------------------

{quote}Today the Query API assumes that you can figure out whether a document 
matches in isolation, regardless of other matches in the index/segment
{quote}
I'm not sure what you mean there - BooleanQuery for example relies on matches 
from other queries. Do you mean because of {{Weight.matches}}? Something else? 
I guess we could just return MATCH_WITH_NO_TERMS in such cases?

There is a similar API in {{FloatPointNearestNeighbor.nearest(IndexSearcher 
...)}}, although that does not accept a {{Query}}. Maybe that can be a model.

OTOH it's easy enough to write a query that captures the closest K hits up front 
and presents them as an iterator, so if someone wants a Query for convenience 
they can do something like the 
[thing|https://github.com/mikemccand/luceneutil/pull/87/files?file-filters%5B%5D=.java&file-filters%5B%5D=.py&file-filters%5B%5D=.tasks#diff-a391217caf024ed0ad7cc1b95d62ce7d679e582b850dab887d6c42fe69ed5045]
 I posted for benchmarking purposes. I don't see the harm in offering 
such a thing?
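
As a rough illustration of "capture the closest K hits up front and present 
them as an iterator", here is a minimal, self-contained sketch. It is not the 
linked benchmark code and not Lucene API: the class {{TopKDocIds}} and its 
methods are hypothetical, the search is a brute-force scan over an in-memory 
array (assumed to be indexed by doc id), and squared Euclidean distance is an 
assumed metric.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.PriorityQueue;

/** Hypothetical stand-in (not a Lucene class): selects the K nearest vectors eagerly. */
final class TopKDocIds {

  /** Squared Euclidean distance between two vectors of equal length (assumed metric). */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /**
   * Brute-force scan: returns the doc ids of the k vectors closest to target,
   * sorted in increasing doc id order. vectors[docId] is that document's vector.
   */
  static int[] nearest(float[][] vectors, float[] target, int k) {
    float[] dist = new float[vectors.length];
    for (int doc = 0; doc < vectors.length; doc++) {
      dist[doc] = squaredDistance(vectors[doc], target);
    }
    // Max-heap on distance: the farthest of the current candidates sits on top.
    PriorityQueue<Integer> heap =
        new PriorityQueue<>((a, b) -> Float.compare(dist[b], dist[a]));
    for (int doc = 0; doc < vectors.length; doc++) {
      heap.add(doc);
      if (heap.size() > k) {
        heap.poll(); // evict the farthest candidate
      }
    }
    int[] docs = heap.stream().mapToInt(Integer::intValue).toArray();
    Arrays.sort(docs); // iterators over matches are expected to advance in doc id order
    return docs;
  }

  /** Presents the precomputed hits as an iterator over doc ids. */
  static PrimitiveIterator.OfInt iterator(int[] sortedDocs) {
    return Arrays.stream(sortedDocs).iterator();
  }

  public static void main(String[] args) {
    float[][] vectors = {{0, 0}, {5, 5}, {1, 0}, {0, 1}, {9, 9}};
    PrimitiveIterator.OfInt it = iterator(nearest(vectors, new float[] {0, 0}, 3));
    while (it.hasNext()) {
      System.out.println(it.nextInt());
    }
  }
}
```

The point of the sketch is that all of the ranking work happens before 
iteration starts; the iterator only replays a fixed set of doc ids, so it 
cannot produce further results on demand.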

We could equally well have a query like {{KnnVectorQuery(int target, int 
speedAccuracyTradeoff, Query filter)}}? I'm not sure what the pluses and 
minuses of this approach would be versus the APIs that accept (or are 
implemented by) IndexSearcher.
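
To make the role of a {{filter}} argument concrete, the sketch below contrasts 
filtering after the KNN search with restricting the search to documents 
accepted by the filter. Everything in it is an assumption for illustration: 
the class {{KnnFilterSketch}} is hypothetical, the filter is modeled as a 
fully realized {{java.util.BitSet}} of accepted doc ids, and the search is a 
brute-force scan rather than Lucene's vector index.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration (not Lucene API) of two ways to combine KNN with a filter. */
final class KnnFilterSketch {

  /** Squared Euclidean distance between two vectors of equal length (assumed metric). */
  static float squaredDistance(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      float d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  /** The k docs closest to target among those set in candidates, in doc id order. */
  static List<Integer> nearest(float[][] vectors, float[] target, int k, BitSet candidates) {
    List<Integer> docs = new ArrayList<>();
    for (int doc = candidates.nextSetBit(0);
        doc >= 0 && doc < vectors.length;
        doc = candidates.nextSetBit(doc + 1)) {
      docs.add(doc);
    }
    // Rank candidates by distance, keep the best k, then restore doc id order.
    docs.sort(Comparator.comparingDouble((Integer doc) -> squaredDistance(vectors[doc], target)));
    List<Integer> top = new ArrayList<>(docs.subList(0, Math.min(k, docs.size())));
    top.sort(Comparator.naturalOrder());
    return top;
  }

  /** KNN over all docs first, filter second: may return fewer than k hits. */
  static List<Integer> postFilter(float[][] vectors, float[] target, int k, BitSet filter) {
    BitSet all = new BitSet(vectors.length);
    all.set(0, vectors.length);
    List<Integer> top = nearest(vectors, target, k, all);
    top.removeIf(doc -> !filter.get(doc));
    return top;
  }

  /** KNN restricted to filtered docs: returns k hits whenever k docs pass the filter. */
  static List<Integer> preFilter(float[][] vectors, float[] target, int k, BitSet filter) {
    return nearest(vectors, target, k, filter);
  }

  public static void main(String[] args) {
    float[][] vectors = {{0, 0}, {1, 0}, {2, 0}, {3, 0}, {4, 0}};
    BitSet filter = new BitSet();
    filter.set(1);
    filter.set(3);
    filter.set(4);
    System.out.println(postFilter(vectors, new float[] {0, 0}, 2, filter));
    System.out.println(preFilter(vectors, new float[] {0, 0}, 2, filter));
  }
}
```

With K = 2 and a filter that rejects the single nearest document, the 
post-filtered variant is left with one hit while the filter-aware variant 
still returns two.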

> Implement KNN Query
> -------------------
>
>                 Key: LUCENE-9614
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9614
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Michael Sokolov
>            Priority: Major
>
> Now we have a vector index format, and one vector indexing/KNN search 
> implementation, but the interface is low-level: you can search across a 
> single segment only. We would like to expose a Query implementation. 
> Initially, we want to support a usage where the KnnVectorQuery selects the 
> k-nearest neighbors without regard to any other constraints, and these can 
> then be filtered as part of an enclosing Boolean or other query.
> Later we will want to explore some kind of filtering *while* performing 
> vector search, or a re-entrant search process that can yield further results. 
> Because of the nature of knn search (all documents having any vector value 
> match), it is more like a ranking than a filtering operation, and it doesn't 
> really make sense to provide an iterator interface that can be merged in the 
> usual way, in docid order, skipping ahead. It's not yet clear how to satisfy 
> a query that is "k nearest neighbors satisfying some arbitrary Query", at 
> least not without realizing a complete bitset for the Query. But this is for 
> a later issue; *this* issue is just about performing the knn search in 
> isolation, computing a set of (some given) K nearest neighbors, and providing 
> an iterator over those.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

