michaeljmarshall commented on PR #16145: URL: https://github.com/apache/lucene/pull/16145#issuecomment-4596847498
I don't have benchmark numbers yet, but I can work on getting those. Is this the recommended benchmark to run to get HNSW numbers? https://github.com/mikemccand/luceneutil/blob/main/README.md#running-the-knn-benchmark

As for the current state of this PR, I am concerned that the implementation has a latent bug in the state tracking logic. Because we allow cloning of the `MemorySegmentIndexInput`, we could get invalid state by:

1. Init `MemorySegmentIndexInput` with `RANDOM` read advice.
2. Clone the instance.
3. Update the read advice on the clone, which holds references to the same blocks of OS memory, to something other than `RANDOM`.

After step 3, the original instance still records `RANDOM` even though the advice on the underlying memory has changed. This problem also exists for the `slice` method, which could clone a subset of the `segments[]`.

One solution could be to store the advice per segment in a synchronized bitmap (in a way similar to the `sharedPrefetchCounter`), but that introduces more overhead to track state.

My primary concern with this (latent) bug is that it is very nuanced and would likely lead to unpredictable behavior.
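To make the three steps concrete, here is a minimal, self-contained sketch of the failure mode. It is not the PR's code and does not use any Lucene API: `AdviceStateDemo`, `SharedBlock`, `ToyInput`, `duplicate`, and `updateAdvice` are hypothetical names, and the sketch assumes each instance caches the last advice it applied and skips the update when the requested advice matches that cache.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

public class AdviceStateDemo {
  enum Advice { NORMAL, RANDOM }

  /** Hypothetical stand-in for a block of mapped OS memory and the advice last given for it. */
  static final class SharedBlock {
    final AtomicReference<Advice> osAdvice = new AtomicReference<>(Advice.NORMAL);
  }

  /** Toy input that tracks advice per instance (the assumption this sketch illustrates). */
  static final class ToyInput {
    final SharedBlock[] segments;
    Advice cachedAdvice;

    private ToyInput(SharedBlock[] segments, Advice cachedAdvice) {
      this.segments = segments;
      this.cachedAdvice = cachedAdvice;
    }

    /** Creates fresh blocks and applies the initial advice to each of them. */
    static ToyInput open(int blockCount, Advice advice) {
      SharedBlock[] blocks = new SharedBlock[blockCount];
      for (int i = 0; i < blockCount; i++) {
        blocks[i] = new SharedBlock();
        blocks[i].osAdvice.set(advice);
      }
      return new ToyInput(blocks, advice);
    }

    /** Clone: shares the same blocks but copies the cached advice. */
    ToyInput duplicate() {
      return new ToyInput(segments, cachedAdvice);
    }

    /** Slice: a new array, but the elements are the same shared blocks. */
    ToyInput slice(int from, int to) {
      return new ToyInput(Arrays.copyOfRange(segments, from, to), cachedAdvice);
    }

    /** Applies advice only if it differs from this instance's cache; returns true if applied. */
    boolean updateAdvice(Advice advice) {
      if (advice == cachedAdvice) {
        return false; // skipped based on per-instance state, which may be stale
      }
      for (SharedBlock block : segments) {
        block.osAdvice.set(advice);
      }
      cachedAdvice = advice;
      return true;
    }

    /** True if every block's actual advice matches what this instance believes. */
    boolean isConsistent() {
      for (SharedBlock block : segments) {
        if (block.osAdvice.get() != cachedAdvice) {
          return false;
        }
      }
      return true;
    }
  }

  public static void main(String[] args) {
    ToyInput original = ToyInput.open(2, Advice.RANDOM); // step 1
    ToyInput clone = original.duplicate();               // step 2
    clone.updateAdvice(Advice.NORMAL);                   // step 3

    System.out.println("original believes: " + original.cachedAdvice);
    System.out.println("block 0 actually has: " + original.segments[0].osAdvice.get());
    System.out.println("original consistent: " + original.isConsistent());
    System.out.println("re-applying RANDOM on original applied: "
        + original.updateAdvice(Advice.RANDOM));
  }
}
```

In this toy model the original ends up believing `RANDOM` while the shared blocks carry `NORMAL`, and a later attempt to restore `RANDOM` through the original is skipped. Keeping the advice on the shared block instead of on each instance, which is roughly what a shared per-segment bitmap would provide, removes the stale copy at the cost of extra shared state.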
