kaivalnp commented on PR #15784: URL: https://github.com/apache/lucene/pull/15784#issuecomment-3999099527
One potential drawback: the `traversalSimilarity` parameter is deprecated here, so the user loses the ability to tune the algorithm for a particular recall or latency.

Perhaps the speed at which `minCompetitiveSimilarity` moves towards `resultSimilarity` can be made configurable. Right now, it is set to the midpoint of its current value and the score of a low-scoring node encountered during graph search, and this exponential decay could use a factor other than `0.5`. The user can still clone the `VectorSimilarityCollector` and override the decay factor themselves (somewhat in line with the [`LAMBDA`](https://github.com/apache/lucene/blob/daf2378eaa49c0e26b51f1fbedf8343081c700bb/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L55-L57) parameter in a KNN query).

Also noting that the new algorithm produced a better recall / latency curve for all datasets I tested it on: Cohere v3 shared above + a few internal ones.
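To make the configurable-decay idea concrete, here is a minimal standalone sketch. It is not the PR's code and does not use Lucene's API: the class `DecaySketch`, its constructor arguments, and the exact update rule (a weighted average, which reduces to the midpoint when the factor is `0.5`) are assumptions for illustration only. The starting value of the threshold and the rule for which nodes trigger an update are likewise assumed.

```java
/**
 * Hypothetical sketch of a configurable decay factor; NOT Lucene's
 * VectorSimilarityCollector. All names and the update rule are assumptions.
 */
final class DecaySketch {
  private final double resultSimilarity;
  private final double decayFactor; // weight kept on the current threshold
  private double minCompetitiveSimilarity;

  DecaySketch(double initialSimilarity, double resultSimilarity, double decayFactor) {
    if (decayFactor < 0.0 || decayFactor >= 1.0) {
      throw new IllegalArgumentException("decayFactor must be in [0, 1)");
    }
    this.minCompetitiveSimilarity = initialSimilarity;
    this.resultSimilarity = resultSimilarity;
    this.decayFactor = decayFactor;
  }

  /** Called for a visited node whose score is below resultSimilarity. */
  void onLowScoringNode(double nodeScore) {
    // Assumption: only scores between the current threshold and
    // resultSimilarity raise the threshold; everything else is ignored.
    if (nodeScore >= resultSimilarity || nodeScore <= minCompetitiveSimilarity) {
      return;
    }
    // Weighted average; with decayFactor = 0.5 this is exactly the midpoint.
    minCompetitiveSimilarity =
        decayFactor * minCompetitiveSimilarity + (1.0 - decayFactor) * nodeScore;
  }

  double minCompetitiveSimilarity() {
    return minCompetitiveSimilarity;
  }

  public static void main(String[] args) {
    // Compare how quickly the threshold rises for two factors.
    for (double factor : new double[] {0.5, 0.25}) {
      DecaySketch sketch = new DecaySketch(0.0, 0.8, factor);
      for (double score : new double[] {0.4, 0.6, 0.7}) {
        sketch.onLowScoringNode(score);
      }
      System.out.println("factor=" + factor + " threshold=" + sketch.minCompetitiveSimilarity());
    }
  }
}
```

Under this rule, a smaller factor moves the threshold towards the observed scores faster (presumably pruning more aggressively), while a factor closer to `1` keeps the search more exploratory, which is the recall / latency trade-off that `traversalSimilarity` used to expose.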
