mayya-sharipova commented on PR #992: URL: https://github.com/apache/lucene/pull/992#issuecomment-1191493127
@jtibshirani Thanks for the review.

> It's a bit confusing that the baseline slows down so much from 533s to 654s, which is almost 2 minutes slower. Do you have a sense for why this is? I wonder if graph building time can vary a lot based on what order the vectors are processed.

I have not done a detailed analysis, so I can only speculate. The processing order could be the reason, but `SortingVectorValues` may also contribute to the slowdown, since it needs to do extra lookups.

> I just realized that we're doing a cast which is pretty tricky/fragile. The check `visited.length() < capacity` is only true if we are building the graph (not searching), and `HnswGraphBuilder` happens to always use `FixedBitSet`. As a follow-up maybe we should consider [LUCENE-10404](https://issues.apache.org/jira/browse/LUCENE-10404) or something similar, which chooses a better 'visited' data structure and doesn't require us to do this cast + resize.

Good point. I agree that the current solution is fragile, and +1 for investigating a better data structure for `visited`.