Pulkitg64 commented on code in PR #15784:
URL: https://github.com/apache/lucene/pull/15784#discussion_r2880483219
##########
lucene/core/src/java/org/apache/lucene/search/VectorSimilarityCollector.java:
##########
@@ -25,44 +25,39 @@
* @lucene.experimental
*/
class VectorSimilarityCollector extends AbstractKnnCollector {
- private final float traversalSimilarity, resultSimilarity;
- private float maxSimilarity;
+ private final float resultSimilarity;
private final List<ScoreDoc> scoreDocList;
+ private float minCompetitiveSimilarity;
/**
- * Perform a similarity-based graph search. The graph is traversed till
better scoring nodes are
- * available, or the best candidate is below {@link #traversalSimilarity}.
All traversed nodes
- * above {@link #resultSimilarity} are collected.
+ * Perform a similarity-based graph search. The buffer for graph traversal
is adaptive: starts
+ * with a high value, and exponentially decays towards scores of nodes
traversed, but not
+ * collected during graph search.
*
- * @param traversalSimilarity (lower) similarity score for graph traversal.
- * @param resultSimilarity (higher) similarity score for result collection.
+ * @param resultSimilarity similarity score for result collection.
* @param visitLimit limit on number of nodes to visit.
*/
- public VectorSimilarityCollector(
- float traversalSimilarity, float resultSimilarity, long visitLimit) {
+ public VectorSimilarityCollector(float resultSimilarity, long visitLimit) {
// TODO: add search strategy support
super(1, visitLimit, AbstractVectorSimilarityQuery.DEFAULT_STRATEGY);
- if (traversalSimilarity > resultSimilarity) {
- throw new IllegalArgumentException("traversalSimilarity should be <=
resultSimilarity");
- }
- this.traversalSimilarity = traversalSimilarity;
this.resultSimilarity = resultSimilarity;
- this.maxSimilarity = Float.NEGATIVE_INFINITY;
this.scoreDocList = new ArrayList<>();
+ this.minCompetitiveSimilarity = -Float.MAX_VALUE;
}
@Override
public boolean collect(int docId, float similarity) {
- maxSimilarity = Math.max(maxSimilarity, similarity);
if (similarity >= resultSimilarity) {
scoreDocList.add(new ScoreDoc(docId, similarity));
+ return false; // do not update minCompetitiveSimilarity
Review Comment:
Alright, in that case I think we should add java docs explaining this
reasoning. In generic case, we may assume that returning false means we didn't
collect the doc but we did and it could make debugging less obvious?
`collect` function in KnnCollector
[here](https://github.com/apache/lucene/blob/02014a6245d329ae93898f281b52eb3203d3b343/lucene/core/src/java/org/apache/lucene/search/KnnCollector.java#L68)
is defined in this way only i.e return true when vector is collected. Since it
is a public function, users can use the API directly, and this would mess up
the user expectation i.e we collected doc but user would assume we didn't.
I wonder if there is better way to do it but atleast we should explain this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]