benwtrent commented on issue #13611: URL: https://github.com/apache/lucene/issues/13611#issuecomment-3338207599
@msokolov I am assuming David is using this: `return new Lucene99HnswVectorsFormat(16, 250);`

> Generally speaking increasing [maxConn] should yield improved recall.

Agreed. Generally, when there are tightly clustered vectors, increasing `maxConn` is the next step.

We really need to adjust our algorithm to dynamically handle very similar vectors :(. I think we should be able to keep extra connections (instead of diversity pruning) when we detect that things are tightly clustered. I haven't been able to dig into improving this recently.

One other thing to think about, @david-sitsky: you could do a multi-pass algorithm. It SEEMS like you are getting the right parent doc, but the child score returned isn't always that of the best child. You could do a secondary reranking pass after gathering the vector values, rescoring to apply the true best score from the children docs. But I am not 100% sure this would help you in all cases.

You would use: https://github.com/apache/lucene/blob/e706267b893576cd334a783e6dfa8b4008cdc7b2/lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java#L265C1-L268C4 with `FunctionScoreQuery`.

I imagine for the vectors, you would need to join to the parent, then join to the child blocks (`ToChildBlockJoinQuery`), and then join back up to the parent again? Yeah, seems like a ton of things...
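
To make the second-pass idea concrete, here is a minimal sketch in plain Java. It does not use any Lucene API: `BestChildRescorer`, `Candidate`, and `Rescored` are hypothetical names invented for illustration, and it assumes the child vectors of each candidate parent have already been gathered and are unit-length, so a dot product serves as the similarity. It only shows the rescoring logic: for each parent returned by the approximate search, compute the exact score of every child, keep the best one, and re-sort the parents.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Lucene): exact second-pass rescoring of parent hits. */
public class BestChildRescorer {

  /** A first-pass hit: a parent id plus the already-gathered vectors of its children. */
  public static final class Candidate {
    final int parentId;
    final List<float[]> childVectors;

    public Candidate(int parentId, List<float[]> childVectors) {
      this.parentId = parentId;
      this.childVectors = childVectors;
    }
  }

  /** A rescored parent: index and exact score of its best child. */
  public static final class Rescored {
    public final int parentId;
    public final int bestChild; // -1 if the parent had no children
    public final float score;

    Rescored(int parentId, int bestChild, float score) {
      this.parentId = parentId;
      this.bestChild = bestChild;
      this.score = score;
    }
  }

  /** Dot product; assumes both vectors have the same length and are normalized. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Scores every child exactly, keeps the best per parent, sorts parents by score descending. */
  public static List<Rescored> rescore(float[] query, List<Candidate> candidates) {
    List<Rescored> out = new ArrayList<>();
    for (Candidate c : candidates) {
      int best = -1;
      float bestScore = Float.NEGATIVE_INFINITY;
      for (int i = 0; i < c.childVectors.size(); i++) {
        float s = dot(query, c.childVectors.get(i));
        if (s > bestScore) {
          bestScore = s;
          best = i;
        }
      }
      out.add(new Rescored(c.parentId, best, bestScore));
    }
    out.sort((x, y) -> Float.compare(y.score, x.score));
    return out;
  }
}
```

In an actual Lucene setup the same step would presumably be expressed through the `DoubleValuesSource` and `FunctionScoreQuery` pieces mentioned above rather than with in-memory lists; the sketch only illustrates why an exact pass over the children can correct a parent whose reported child score was not its best one.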
