benwtrent commented on issue #13611:
URL: https://github.com/apache/lucene/issues/13611#issuecomment-3338207599

   @msokolov I am assuming David is using this (`maxConn` = 16, `beamWidth` = 250): `return new Lucene99HnswVectorsFormat(16, 250);`
   
   > Generally speaking increasing [maxConn] should yield improved recall.
   
   Agreed: generally, when vectors are tightly clustered, increasing `maxConn` is the next step.
   
   
   We really need to adjust our algorithm to dynamically handle very similar vectors :(. I think we should be able to keep extra connections (instead of applying diversity pruning) when we detect that things are tightly clustered. I haven't been able to dig into improving this recently.
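A rough, Lucene-free sketch of the idea above, for illustration only. It applies the standard HNSW neighbor-selection heuristic (keep a candidate only if it is closer to the node than to any neighbor already kept) and, when too few candidates survive, backfills with the closest pruned ones instead of discarding them. That is roughly analogous to the `keepPrunedConnections` option in the original HNSW paper. The `minDiverse` threshold used as the "tightly clustered" signal, the Euclidean distance (Lucene works with similarity scores), and all names here are hypothetical assumptions, not Lucene's actual graph-building code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical illustration; not Lucene's HNSW graph-building code. */
public class NeighborSelectionSketch {

    /** Euclidean distance (assumption; Lucene scores by similarity instead). */
    static double distance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Picks up to maxConn neighbors for `node` from `candidates`.
     * Diversity rule: keep a candidate only if it is closer to `node`
     * than to every neighbor selected so far.
     * Hypothetical clustering signal: if fewer than minDiverse candidates
     * survive pruning, backfill with the closest pruned candidates.
     */
    static List<float[]> selectNeighbors(float[] node, List<float[]> candidates,
                                         int maxConn, int minDiverse) {
        List<float[]> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(c -> distance(node, c)));

        List<float[]> selected = new ArrayList<>();
        List<float[]> pruned = new ArrayList<>(); // stays in closest-first order
        for (float[] c : sorted) {
            if (selected.size() >= maxConn) {
                break;
            }
            double toNode = distance(node, c);
            boolean diverse = true;
            for (float[] s : selected) {
                if (distance(c, s) < toNode) { // c is "covered" by an existing neighbor
                    diverse = false;
                    break;
                }
            }
            if (diverse) {
                selected.add(c);
            } else {
                pruned.add(c);
            }
        }

        // Looks tightly clustered: keep extra connections instead of dropping them.
        if (selected.size() < minDiverse) {
            for (float[] p : pruned) {
                if (selected.size() >= maxConn) {
                    break;
                }
                selected.add(p);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        float[] node = {0f, 0f};
        // Candidates packed along one direction: the closest one "covers" all others.
        List<float[]> candidates = List.of(
            new float[] {1.0f, 0f}, new float[] {1.1f, 0f},
            new float[] {1.2f, 0f}, new float[] {1.3f, 0f});
        System.out.println("pure diversity pruning: "
            + selectNeighbors(node, candidates, 4, 1).size() + " neighbor(s)");
        System.out.println("with backfill:          "
            + selectNeighbors(node, candidates, 4, 2).size() + " neighbor(s)");
    }
}
```

With clustered candidates, pure diversity pruning keeps a single edge, which is exactly the situation where graph connectivity (and recall) suffers; the backfill keeps the node reachable through more paths at the cost of a less "diverse" neighborhood.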
   
   
   
   One other thing to think about, @david-sitsky: you could use a multi-pass algorithm.
   
   It SEEMS like you are getting the right parent doc, but the child score returned isn't always the best child's score. You could do a secondary reranking pass: after gathering the vector values, rescore so that each parent gets the true best score among its child docs. But I am not 100% sure this would help in all cases.
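A minimal, Lucene-free sketch of that second pass, under stated assumptions: the first (approximate) pass has already produced candidate parent IDs, and the child vectors for each parent have been gathered into a map. It assumes unit-length vectors, so the dot product equals cosine similarity. The names `rerank`, `RankedParent`, and `childVectorsByParent` are hypothetical; in Lucene, fetching the child vectors and applying the scores would go through the query machinery discussed here rather than plain collections.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of exact per-child rescoring after an approximate search. */
public class ChildRerankSketch {

    /** A parent hit with its recomputed score (hypothetical type). */
    record RankedParent(String parentId, double score) {}

    /** Dot product; equals cosine similarity if vectors are unit length (assumption). */
    static double dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    /**
     * Second pass: score every child of each candidate parent exactly,
     * give the parent its best child score, and re-sort the parents.
     */
    static List<RankedParent> rerank(float[] query, List<String> candidateParents,
                                     Map<String, List<float[]>> childVectorsByParent) {
        List<RankedParent> out = new ArrayList<>();
        for (String parent : candidateParents) {
            double best = Double.NEGATIVE_INFINITY; // stays -inf if a parent has no children
            for (float[] child : childVectorsByParent.getOrDefault(parent, List.of())) {
                best = Math.max(best, dot(query, child));
            }
            out.add(new RankedParent(parent, best));
        }
        out.sort(Comparator.comparingDouble(RankedParent::score).reversed());
        return out;
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f};
        Map<String, List<float[]>> children = Map.of(
            "p1", List.of(new float[] {0f, 1f}, new float[] {0.6f, 0.8f}),
            "p2", List.of(new float[] {0.8f, 0.6f}));
        // Suppose the approximate first pass ranked p1 above p2.
        System.out.println(rerank(query, List.of("p1", "p2"), children));
    }
}
```

The rerank only reorders the parents the first pass found, so it can fix wrong child scores but cannot recover a parent the approximate search missed entirely.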
   
   You would use this, together with `FunctionScoreQuery`:
   https://github.com/apache/lucene/blob/e706267b893576cd334a783e6dfa8b4008cdc7b2/lucene/core/src/java/org/apache/lucene/search/DoubleValuesSource.java#L265C1-L268C4
   
   For the vectors, I imagine you would need to join to the parent, then join down to the child blocks (`ToChildBlockJoinQuery`), and then join back up to the parent again (e.g. with `ToParentBlockJoinQuery`)? Yeah, that seems like a lot of moving parts...
   
   

