kaivalnp commented on PR #12590: URL: https://github.com/apache/lucene/pull/12590#issuecomment-1738933241
> My main concern is that we are adding yet another extension point for "partial control" when we already have that with the rewrite or something even more complex with the collector While we technically can do so, copy-pasting [`#rewrite`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L64-L93) seems very repetitive to me because we would also have to copy-paste [`#sequentialSearch`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L95-L102), [`#parallelSearch`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L104-L112), [`#searchLeaf`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L114-L122), [`#getLeafResults`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L124-L154), [`#createBitSet`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L156-L172), [` #createRewrittenQuery`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219-L234) as well as [`DocAndScoreQuery`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L297-L452) (almost the entire file, all of which are `private`) to do so.. Doing this via the collector has an overhead of finding out the index-level `topK` from multiple segment-level results *twice* (which is done [here](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L88) anyways) > It would be very easy to do the wrong thing by allowing sub-classes to override this method for yet another avenue of customization We also seem to have [`#exactSearch`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L180-L181) as `protected` only for checking its execution from tests (that could have the same issue).. We could maybe add some javadocs to [`#createRewrittenQuery`](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L219-L234), so that users know when to override it / do so more carefully? > It seems to me we have enough extension points no? I feel that giving access to the final `topK` results to implementers is a good extension point to have, allowing them to post-process the final results or even rewrite into some custom `Query` *if needed* (in most cases, just delegate to the parent function) Also open to other suggestions (like maybe a specific `postProcess` function that receives the `topK` and returns the results after reading / modifying, so that the API is more specific) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
