mikemccand commented on PR #16145:
URL: https://github.com/apache/lucene/pull/16145#issuecomment-4593386777

   I like the overall idea here, to turn off the exponential backoff behavior 
for the `MADV_WILLNEED` call when we `MADV_RANDOM`'d this slice.  But we need 
to be a bit careful -- `isLoaded()` is a surprisingly costly API, at least on 
Linux.  It `malloc()`s a `byte[]` (one byte per page), then calls `mincore`, 
which is a walk through kernel's virtual address mapping / tree (likely costly, 
not sure).  Then a for loop checking if there are any `0` in the returned 
`byte[]`, then free it and return.
   
   > The one drawback: for RANDOM-advised files that are fully warm in the page 
cache, isLoaded() is now called on every prefetch rather than being throttled. 
In practice this seems acceptable because isLoaded() on warm pages should be 
cheap, and a fully warm RANDOM file means prefetch will now return a more 
accurate answer than before.
   
   Strangely, it's the opposite: warm / hot pages are more costly than the cold 
case, because it cannot early-break that for loop when all bytes are `1`.
   
   I think we can turn off all the `isLoaded()` checking entirely when 
`MADV_RANDOM`?  (I think that's also @jimczi comment above?).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to