mayya-sharipova commented on PR #992:
URL: https://github.com/apache/lucene/pull/992#issuecomment-1191493127

   @jtibshirani Thanks for the review. 
   
   > It's a bit confusing that the baseline slows down so much from 533s to 
654s, which is almost 2 minutes slower. Do you have a sense for why this is? I 
wonder if graph building time can vary a lot based on what order the vectors 
are processed.
   
   I have not done a detailed analysis, so I can only speculate. Processing 
order could be the reason, but `SortingVectorValues` may also contribute to the 
slowdown, since it needs to do extra lookups. 
   
   > I just realized that we're doing a cast which is pretty tricky/fragile. 
The check `visited.length() < capacity` is only true if we are building the graph 
(not searching), and `HnswGraphBuilder` happens to always use `FixedBitSet`.
   As a follow-up maybe we should consider 
[LUCENE-10404](https://issues.apache.org/jira/browse/LUCENE-10404) or something 
similar, which chooses a better 'visited' data structure and doesn't require us 
to do this cast + resize.
   
   Good point. I agree that the current solution is fragile, and +1 for 
investigating a better data structure for `visited`.
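
   To make the idea concrete, here is a minimal sketch of a `visited` 
structure that grows on its own, so callers never need a cast or an explicit 
resize. It is only an illustration under assumptions: it uses 
`java.util.BitSet` rather than Lucene's `FixedBitSet`, the class name 
`GrowableVisitedSet` and its methods are hypothetical, and it is not taken from 
LUCENE-10404 or from this PR.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of a "visited" set that grows as new node ids appear.
 * Not part of Lucene; names and API are assumptions for illustration only.
 */
final class GrowableVisitedSet {
  // java.util.BitSet expands automatically when a bit beyond its size is set.
  private final BitSet bits;

  GrowableVisitedSet(int initialCapacity) {
    this.bits = new BitSet(initialCapacity);
  }

  /** Marks the node as visited; returns true if it had not been visited before. */
  boolean markVisited(int node) {
    boolean wasVisited = bits.get(node); // reading past the current size returns false
    bits.set(node);                      // grows the backing storage if needed
    return !wasVisited;
  }

  /** Returns true if the node was marked since the last clear(). */
  boolean isVisited(int node) {
    return bits.get(node);
  }

  /** Resets the set so it can be reused for the next search. */
  void clear() {
    bits.clear();
  }
}
```

   With something along these lines, graph building could keep adding nodes 
without checking `visited.length() < capacity` first. Whether the automatic 
growth is cheap enough for the search path would still need benchmarking.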


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
