Re: [I] KnnFloatVectorQuery misses highest-ranking results that FloatVectorSimilarityQuery retrieves [lucene]

via GitHub Fri, 26 Sep 2025 04:20:41 -0700


benwtrent commented on issue #13611:
URL: https://github.com/apache/lucene/issues/13611#issuecomment-3338207599

@msokolov I am assuming David is using this:
`return new Lucene99HnswVectorsFormat(16, 250);`

> Generally speaking increasing [maxConn] should yield improved recall.

Agreed. Generally, when vectors are tightly clustered, increasing
`maxConn` is the next step.

We really need to adjust our algorithm to dynamically handle very similar
vectors :(. I think we should be able to keep extra connections (instead of
applying diversity pruning) when we detect that things are tightly clustered.
I haven't been able to dig into improving this recently.
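
To make the diversity-pruning concern concrete, here is a simplified sketch of an HNSW-style neighbor selection heuristic in plain Java. It is not Lucene's code: the class `DiversityPruningSketch`, the method `selectDiverse`, and the use of Euclidean distance are assumptions made purely for illustration. A candidate is kept only if it is closer to the base node than to every neighbor already selected, so a tight cluster tends to collapse to a single link no matter how large `maxConn` is.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class DiversityPruningSketch {

  /** Euclidean distance between two vectors of equal length. */
  public static double dist(float[] a, float[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  /**
   * Hypothetical, simplified diversity heuristic: visit candidates nearest-first
   * and keep one only if it is strictly closer to the base node than to every
   * neighbor that has already been kept.
   */
  public static List<float[]> selectDiverse(float[] base, List<float[]> candidates, int maxConn) {
    List<float[]> sorted = new ArrayList<>(candidates);
    sorted.sort(Comparator.comparingDouble(c -> dist(base, c)));
    List<float[]> selected = new ArrayList<>();
    for (float[] c : sorted) {
      if (selected.size() >= maxConn) {
        break;
      }
      double toBase = dist(base, c);
      boolean diverse = true;
      for (float[] n : selected) {
        if (dist(c, n) <= toBase) {
          diverse = false; // an existing neighbor already "covers" this candidate
          break;
        }
      }
      if (diverse) {
        selected.add(c);
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    float[] base = {0f, 0f};
    // Four nearly identical vectors plus one that lies in a different direction.
    List<float[]> candidates = List.of(
        new float[] {1.00f, 0.00f},
        new float[] {1.01f, 0.00f},
        new float[] {1.00f, 0.01f},
        new float[] {0.99f, 0.01f},
        new float[] {0.00f, -2.00f});
    List<float[]> kept = selectDiverse(base, candidates, 16);
    System.out.println("kept " + kept.size() + " of " + candidates.size()); // kept 2 of 5
  }
}
```

Even with room for 16 connections, only one member of the tight cluster survives in this toy setup, which is the situation where keeping extra connections could plausibly help.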

One other thing to think about, @david-sitsky: you could do a multi-pass
algorithm.

It SEEMS like you are getting the right parent doc, but the child score
returned isn't always that of the best child. You could do a secondary
reranking pass after gathering the vector values, rescoring so that each
parent gets the true best score from its child docs. But I am not 100% sure
this would help you in all cases.
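
As a rough illustration of that second pass, independent of any Lucene API, the sketch below recomputes each candidate parent's score as the exact maximum similarity over all of its child vectors and then re-sorts the parents. The class and method names, the in-memory map of child vectors, and the dot-product similarity are all assumptions for the example; in a real index the child vectors would have to be read back from the vector field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BestChildRescoreSketch {

  /** Dot-product similarity (assumed here; substitute the field's real similarity). */
  public static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Exact best child score for one parent, found by scanning all of its children. */
  public static float bestChildScore(float[] query, List<float[]> children) {
    float best = Float.NEGATIVE_INFINITY;
    for (float[] child : children) {
      best = Math.max(best, dot(query, child));
    }
    return best;
  }

  /** Second pass: rescore every candidate parent exactly, then sort best-first. */
  public static List<String> rerank(float[] query, Map<String, List<float[]>> childrenByParent) {
    Map<String, Float> exact = new HashMap<>();
    for (Map.Entry<String, List<float[]>> e : childrenByParent.entrySet()) {
      exact.put(e.getKey(), bestChildScore(query, e.getValue()));
    }
    List<String> parents = new ArrayList<>(childrenByParent.keySet());
    parents.sort((a, b) -> Float.compare(exact.get(b), exact.get(a)));
    return parents;
  }

  public static void main(String[] args) {
    float[] query = {1f, 0f};
    // Hypothetical first-pass candidates. Suppose the approximate search reached
    // parentA only through its weaker child (score 0.5) and ranked parentB first.
    Map<String, List<float[]>> candidates = new LinkedHashMap<>();
    candidates.put("parentB", List.of(new float[] {0.8f, 0.2f}));
    candidates.put("parentA", List.of(new float[] {0.5f, 0.5f}, new float[] {0.9f, 0.1f}));
    System.out.println(rerank(query, candidates)); // [parentA, parentB]
  }
}
```

This only corrects scores for parents that the first pass already found, so it cannot recover a parent that the approximate search missed entirely.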

You would use:
https://github.com/apache/lucene/blob/e706267b893576cd334a783e6dfa8b4008cdc7b2/lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java#L265C1-L268C4

with `FunctionScoreQuery`

I imagine for the vectors, you would need to join to the parent, then join
to the child blocks (`ToChildBlockJoinQuery`) and then join back up to the
parent again? Yeah, seems like a ton of things...

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

