Re: [PR] MemorySegmentIndexInput: always prefetch on RANDOM mode [lucene]

via GitHub Mon, 01 Jun 2026 15:00:28 -0700


michaeljmarshall commented on PR #16145:
URL: https://github.com/apache/lucene/pull/16145#issuecomment-4596847498


   I don't have benchmark numbers yet, but I can work on getting those. Is this 
the recommended benchmark to run to get HNSW numbers? 
https://github.com/mikemccand/luceneutil/blob/main/README.md#running-the-knn-benchmark
   
   As for the current state of this PR, I am concerned that the implementation 
has a latent bug in its state-tracking logic. Because `MemorySegmentIndexInput` 
can be cloned, we could end up in an invalid state by:
   
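   1. Initialize a `MemorySegmentIndexInput` with `RANDOM` read advice.
   2. Clone the instance.
   3. Change the read advice on the clone, which holds references to the same 
blocks of OS memory, to anything other than `RANDOM`. The original instance 
still records `RANDOM`, even though the OS now applies the clone's advice to 
that shared memory.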
   
   The same problem exists for the `slice` method, which can create a new input 
over a subset of `segments[]`.
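To make the clone and slice scenarios concrete, here is a minimal, self-contained Java model of them. Assumptions: it does not use Lucene at all; `ModelInput`, `OsRegion`, and `cachedAdvice` are hypothetical stand-ins for `MemorySegmentIndexInput`, a mapped block of OS memory, and a per-instance advice field, and `setAdvice` only models what an `madvise` call would change.

```java
import java.util.Arrays;

/** Hypothetical model of the stale-advice problem; NOT Lucene's implementation. */
public class StaleAdviceDemo {

  enum Advice { NORMAL, RANDOM, SEQUENTIAL }

  /** Stand-in for a mapped block of OS memory shared by clones and slices. */
  static final class OsRegion {
    Advice osAdvice = Advice.NORMAL; // what the OS was last told for this region
  }

  static final class ModelInput implements Cloneable {
    OsRegion[] segments;
    Advice cachedAdvice; // per-instance copy of the advice: the source of the bug

    ModelInput(OsRegion[] segments, Advice advice) {
      this.segments = segments;
      setAdvice(advice);
    }

    void setAdvice(Advice advice) {
      for (OsRegion r : segments) {
        r.osAdvice = advice; // models an madvise call on the shared memory
      }
      cachedAdvice = advice;
    }

    /** A prefetch decision based only on this instance's cached field. */
    boolean thinksRandom() {
      return cachedAdvice == Advice.RANDOM;
    }

    @Override
    public ModelInput clone() {
      try {
        return (ModelInput) super.clone(); // shallow copy: shares the regions
      } catch (CloneNotSupportedException e) {
        throw new AssertionError(e);
      }
    }

    /** Slice over segments [from, to); still points at the same regions. */
    ModelInput slice(int from, int to) {
      ModelInput s = clone();
      s.segments = Arrays.copyOfRange(segments, from, to);
      return s;
    }
  }

  public static void main(String[] args) {
    OsRegion[] regions = { new OsRegion(), new OsRegion() };
    ModelInput original = new ModelInput(regions, Advice.RANDOM); // step 1
    ModelInput copy = original.clone();                           // step 2
    copy.setAdvice(Advice.NORMAL);                                // step 3
    System.out.println(original.thinksRandom()); // true (stale)
    System.out.println(regions[0].osAdvice);     // NORMAL

    // A slice over only the second region causes a partial mismatch.
    ModelInput fresh = new ModelInput(regions, Advice.RANDOM);
    fresh.slice(1, 2).setAdvice(Advice.SEQUENTIAL);
    System.out.println(regions[0].osAdvice + " " + regions[1].osAdvice); // RANDOM SEQUENTIAL
  }
}
```

The shallow `clone()` is the key detail in this model: both instances reference the same regions, but each keeps its own copy of the advice, so neither can tell when the other has changed it.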
   
   One solution would be to store the advice per segment in a synchronized 
bitmap shared across clones and slices (similar to the `sharedPrefetchCounter`), 
but that adds more state-tracking overhead.
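A rough sketch of the per-segment alternative, again in plain Java with hypothetical names (`SharedAdviceState`, `View`) that are not Lucene APIs. It assumes one shared state object per mapped file and slices that start on segment boundaries; it only loosely follows the `sharedPrefetchCounter` idea and does not reproduce how that field is actually shared.

```java
import java.util.BitSet;

/** Hypothetical sketch: per-segment advice in one shared, synchronized bitmap. */
public class SharedAdviceDemo {

  /** One instance per mapped file, shared by every clone and slice. */
  static final class SharedAdviceState {
    private final BitSet randomSegments = new BitSet(); // bit i set => segment i is RANDOM

    synchronized void setRandom(int fromSegment, int toSegment, boolean random) {
      randomSegments.set(fromSegment, toSegment, random);
    }

    synchronized boolean isRandom(int segment) {
      return randomSegments.get(segment);
    }
  }

  /** A view over segments [base, base + count) of the shared file. */
  static final class View {
    final SharedAdviceState state;
    final int base;
    final int count;

    View(SharedAdviceState state, int base, int count) {
      this.state = state;
      this.base = base;
      this.count = count;
    }

    View cloneView() {
      return new View(state, base, count); // shares the same state object
    }

    /** Slice bounds are relative to this view, like indices into segments[]. */
    View slice(int from, int to) {
      return new View(state, base + from, to - from);
    }

    void setRandom(boolean random) {
      // A real implementation would also issue madvise for these segments.
      state.setRandom(base, base + count, random);
    }

    /** Reads the shared bitmap instead of a per-instance cached field. */
    boolean shouldPrefetch(int localSegment) {
      return state.isRandom(base + localSegment);
    }
  }

  public static void main(String[] args) {
    SharedAdviceState state = new SharedAdviceState();
    View original = new View(state, 0, 2);
    original.setRandom(true);

    original.cloneView().setRandom(false);
    System.out.println(original.shouldPrefetch(0)); // false: sees the clone's change

    original.setRandom(true);
    original.slice(1, 2).setRandom(false);
    System.out.println(original.shouldPrefetch(0) + " " + original.shouldPrefetch(1)); // true false
  }
}
```

In this sketch every prefetch decision takes a lock and an extra indirection into the shared bitmap, which is the kind of tracking overhead mentioned above.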
   
   My primary concern with this (latent) bug is that it is very subtle and 
would likely lead to unpredictable behavior.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
