kaivalnp commented on issue #15379: URL: https://github.com/apache/lucene/issues/15379#issuecomment-3733582514
Caching results for `Knn[Byte|Float]VectorQuery` can be tricky, because the query's contract is to find the K highest-scoring hits at the index level, while results are cached at the segment level.

With document deletes / updates, the segment-level results of a query can change. For example, if a document is deleted, then every cached segment-level result that contains that document is invalidated, because the query now needs the next-highest-scoring doc instead. Even without deletes, if a new segment indexes a vector that is closer to the query, then the segment-level result containing the lowest-scoring document is no longer valid.

At the very least, we'd need to change the way cached results are used, allowing for re-computation if some segment-level result is invalidated?

The problem with KNN queries is that whether a document is a hit cannot be determined _independently_ of the other documents in the index.

> tweaking it for knn queries helps in caching those?

I don't think this will work, because KNN queries are marked as "not cacheable" for the reasons above.
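To make the two invalidation cases above concrete, here is a minimal sketch in plain Python. It does not use Lucene's API: the segment dictionaries, the scores, and the helper names `segment_top_k` and `index_top_k` are all hypothetical, and the sketch assumes the index-level top K is formed by merging per-segment candidates.

```python
import heapq

def segment_top_k(segment, k, deleted=frozenset()):
    """Return the k best (score, doc) pairs among the live docs of one segment."""
    live = [(score, doc) for doc, score in segment.items() if doc not in deleted]
    return heapq.nlargest(k, live)

def index_top_k(segments, k, deleted=frozenset()):
    """Merge per-segment candidates into the index-level top k.

    Returns {segment_name: [docs]}: the hits each segment contributes,
    i.e. what a hypothetical segment-level cache would store for the query.
    """
    merged = []
    for name, segment in segments.items():
        for score, doc in segment_top_k(segment, k, deleted):
            merged.append((score, doc, name))
    hits = {name: [] for name in segments}
    for score, doc, name in heapq.nlargest(k, merged):
        hits[name].append(doc)
    return hits

# Hypothetical similarity scores of each doc against one fixed query vector.
segments = {
    "s0": {"a": 0.9, "b": 0.8, "c": 0.5},
    "s1": {"d": 0.7, "e": 0.6},
}

before = index_top_k(segments, k=3)

# Case 1: delete "b" from s0. The hits contributed by s1 change too,
# although nothing in s1 was touched.
after_delete = index_top_k(segments, k=3, deleted=frozenset({"b"}))

# Case 2: a new segment holds a closer vector. The hit cached for s1
# ("d", the lowest-scoring one) is no longer part of the top k.
with_new_segment = dict(segments, s2={"f": 0.95})
after_new_segment = index_top_k(with_new_segment, k=3)

print(before)
print(after_delete)
print(after_new_segment)
```

In both cases the set of hits attributed to `s1` changes while `s1` itself is unchanged, which is why a cache keyed only on the segment cannot tell that its entry has gone stale.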
