michaeljmarshall opened a new pull request, #16145: URL: https://github.com/apache/lucene/pull/16145
### Description

The power-of-two backoff in MemorySegmentIndexInput.prefetch() assumes that consecutive page-cache hits mean the file is warming up, so madvise overhead can be skipped until the next power-of-two call. That assumption breaks for ReadAdvice.RANDOM, where each request accesses unpredictable pages: a page that was warm in the past says nothing about whether the next page is warm (e.g. HNSW graph traversal touches a different path per query).

Add a `volatile boolean isRandom` field, set by updateReadAdvice() and also by slice(String, long, long, IOContext), the primary entry point for HNSW vector files. When it is true, prefetch() skips the backoff and also skips resetting the `sharedPrefetchCounter`.

The one drawback: for RANDOM-advised files that are fully warm in the page cache, isLoaded() is now called on every prefetch rather than being throttled. In practice this seems acceptable because isLoaded() on warm pages should be cheap, and a fully warm RANDOM file means prefetch will now return a more accurate answer than before.

On x86 (TSO), a volatile load before LOCK XADD requires no hardware fence and costs nothing measurable. On ARM, the ldar before ldadd adds a small ordering cost, still dwarfed by the atomic itself. The miss-reset lambda returns to a plain set(0) with no special-casing.

I used Claude to generate the tests and then manually reviewed them.

I also considered encoding RANDOM mode as a negative sentinel in the counter itself (Integer.MIN_VALUE) to avoid adding a new field. Rejected: the counter walks toward zero on every getAndIncrement(), losing RANDOM mode after ~2B calls without a miss; preserving it through misses required a CAS loop (getAndUpdate) in the miss handler, adding complexity that doesn't seem necessary.
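For readers who want the control flow in one place, here is a minimal standalone sketch of the backoff and the RANDOM bypass described above. It is not the code in this PR: the class `PrefetchBackoffSketch`, the `BooleanSupplier` standing in for isLoaded(), and the `residencyChecks` / `adviseCalls` counters are hypothetical names added for illustration, and it assumes the counter is left untouched while `isRandom` is true.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the Lucene MemorySegmentIndexInput code.
final class PrefetchBackoffSketch {
  private final AtomicInteger sharedPrefetchCounter = new AtomicInteger();
  private volatile boolean isRandom;

  // Demo-only counters (not part of the PR) so the effect can be observed.
  public int residencyChecks; // how often the isLoaded() stand-in was consulted
  public int adviseCalls;     // how often a madvise-style call would be issued

  public void setRandom(boolean random) {
    this.isRandom = random;
  }

  // True for 0, 1, 2, 4, 8, ... (same bit trick as a zero-or-power-of-two check).
  static boolean isZeroOrPowerOfTwo(int x) {
    return (x & (x - 1)) == 0;
  }

  // isLoaded is a stand-in for the page-residency check on the mapped segment.
  public void prefetch(BooleanSupplier isLoaded) {
    boolean random = isRandom; // single volatile read per call
    if (random == false
        && isZeroOrPowerOfTwo(sharedPrefetchCounter.getAndIncrement()) == false) {
      // Enough consecutive hits: skip the residency check until the next power of two.
      return;
    }
    residencyChecks++;
    if (isLoaded.getAsBoolean() == false) {
      if (random == false) {
        // Cache miss: restart the backoff (assumed to be skipped in RANDOM mode).
        sharedPrefetchCounter.set(0);
      }
      adviseCalls++;
    }
  }

  public static void main(String[] args) {
    PrefetchBackoffSketch sequential = new PrefetchBackoffSketch();
    PrefetchBackoffSketch random = new PrefetchBackoffSketch();
    random.setRandom(true);
    for (int i = 0; i < 16; i++) {
      sequential.prefetch(() -> true); // fully warm file
      random.prefetch(() -> true);
    }
    System.out.println("throttled checks: " + sequential.residencyChecks);
    System.out.println("random checks:    " + random.residencyChecks);
  }
}
```

With a fully warm file and 16 calls, the throttled instance consults the residency check only on counter values 0, 1, 2, 4 and 8, while the RANDOM instance consults it on every call, which is the drawback noted above.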
