[PR] MemorySegmentIndexInput: always prefetch on RANDOM mode [lucene]

via GitHub Thu, 28 May 2026 11:45:57 -0700


michaeljmarshall opened a new pull request, #16145:
URL: https://github.com/apache/lucene/pull/16145


   ### Description
   
   The power-of-two backoff in MemorySegmentIndexInput.prefetch() assumes that 
consecutive page-cache hits mean the file is warming up, so the madvise overhead 
can be skipped until the next power-of-two call. That assumption breaks for 
ReadAdvice.RANDOM, where each request touches unpredictable pages: a page that 
was warm in the past says nothing about whether the next page is warm (e.g. 
HNSW graph traversal follows a different path per query).
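
A minimal sketch of the power-of-two backoff described above, for readers who do not have the code open. `BackoffSketch`, `residencyChecks`, and the `pageIsLoaded` supplier are hypothetical names used only for illustration; this is not the actual Lucene implementation, and the real method works on memory segments rather than a boolean supplier.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for the backoff logic; not the actual Lucene class. */
final class BackoffSketch {
  private final AtomicInteger counter = new AtomicInteger();
  // Counts how often the (comparatively expensive) residency check ran.
  // Plain int: this sketch is only meant to be driven from one thread.
  private int residencyChecks;

  static boolean isZeroOrPowerOfTwo(int x) {
    return (x & (x - 1)) == 0;
  }

  /** Returns true if this call checked residency, false if it was throttled. */
  boolean prefetch(BooleanSupplier pageIsLoaded) {
    if (!isZeroOrPowerOfTwo(counter.getAndIncrement())) {
      return false; // throttled: wait for the next power-of-two call
    }
    residencyChecks++;
    if (!pageIsLoaded.getAsBoolean()) {
      counter.set(0); // cache miss: stop backing off
      // The real code would issue the madvise call here.
    }
    return true;
  }

  int residencyChecks() {
    return residencyChecks;
  }
}
```

With this sketch, 16 consecutive warm calls perform the residency check only on calls 0, 1, 2, 4, and 8, which is exactly the throttling that hides cold pages under a random access pattern.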
   
   Add a `volatile boolean isRandom` field, set by updateReadAdvice() and by 
slice(String, long, long, IOContext), the primary entry point for HNSW vector 
files. When it is true, prefetch() skips the backoff and also skips resetting 
the `sharedPrefetchCounter`.
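
The change can be pictured roughly as follows. This is a sketch under assumptions, not the patch itself: `RandomAwarePrefetchSketch`, `setRandom`, and the counters used for observation are hypothetical, and the sketch follows the paragraph above by skipping both the backoff and the counter reset when the flag is set.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical illustration of an isRandom flag bypassing the backoff. */
final class RandomAwarePrefetchSketch {
  private final AtomicInteger sharedPrefetchCounter = new AtomicInteger();
  private volatile boolean isRandom;
  // Observation counters for the sketch only (single-threaded use assumed).
  private int residencyChecks;
  private int adviseCalls;

  void setRandom(boolean random) {
    isRandom = random;
  }

  void prefetch(BooleanSupplier pageIsLoaded) {
    final boolean random = isRandom; // one volatile read per call
    if (!random) {
      int n = sharedPrefetchCounter.getAndIncrement();
      if ((n & (n - 1)) != 0) {
        return; // not zero or a power of two: throttled
      }
    }
    residencyChecks++;
    if (!pageIsLoaded.getAsBoolean()) {
      if (!random) {
        sharedPrefetchCounter.set(0); // only meaningful when backing off
      }
      adviseCalls++; // stands in for the madvise call
    }
  }

  int residencyChecks() { return residencyChecks; }
  int adviseCalls() { return adviseCalls; }
  int counterValue() { return sharedPrefetchCounter.get(); }
}
```

In this sketch the counter is never touched in RANDOM mode, so switching the flag off later resumes the backoff from wherever the counter was left.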
   
   The one drawback: for RANDOM-advised files that are fully warm in the page 
cache, isLoaded() is now called on every prefetch rather than being throttled. 
In practice this seems acceptable because isLoaded() on warm pages should be 
cheap, and a fully warm RANDOM file means prefetch will now return a more 
accurate answer than before.
   
   On x86 (TSO) a volatile load before LOCK XADD requires no hardware fence and 
costs nothing measurable. On ARM the ldar before ldadd adds a small ordering 
cost, still dwarfed by the atomic itself. The miss-reset lambda returns to a 
plain set(0) with no special-casing.
   
   I used Claude to generate the tests and then reviewed them manually.
   
   I also considered encoding RANDOM mode as a negative sentinel in the counter 
itself (Integer.MIN_VALUE) to avoid any new field. Rejected: the counter walks 
toward zero on every getAndIncrement(), losing RANDOM mode after ~2B calls 
without a miss; preserving it through misses required a CAS loop (getAndUpdate) 
in the miss handler, adding complexity that doesn't seem necessary.
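
To make the rejected alternative concrete, here is a small sketch of the sentinel idea and why it decays. `SentinelSketch`, `looksRandom`, and `onMiss` are hypothetical names; the sketch only illustrates the arithmetic described above and is not code from the patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical illustration of encoding RANDOM mode as a negative counter. */
final class SentinelSketch {
  static final int RANDOM_SENTINEL = Integer.MIN_VALUE;

  /** In this encoding, any negative counter value would mean RANDOM mode. */
  static boolean looksRandom(int counterValue) {
    return counterValue < 0;
  }

  /** Increments needed before the counter reaches 0 and the flag is lost. */
  static long callsUntilFlagLost() {
    return -(long) RANDOM_SENTINEL; // 2^31, roughly 2 billion
  }

  /** Miss handler that keeps the flag; needs a CAS loop instead of set(0). */
  static void onMiss(AtomicInteger counter) {
    counter.getAndUpdate(v -> v < 0 ? RANDOM_SENTINEL : 0);
  }
}
```

A plain `set(0)` in the miss handler would clear the sentinel on the first miss, which is why this encoding needs `getAndUpdate`, and the flag would still be lost once enough increments accumulate without a miss.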
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


