westonpace commented on pull request #11588:
URL: https://github.com/apache/arrow/pull/11588#issuecomment-959954827


   > The whole point of our madvise usage is to let the kernel prefetch data. 
If we decide we don't need prefetching, then we should just remove calls to 
madvise (which is a system call, so is not without overhead).
   
   The problem seems to be the default prefetching (readahead).  _The fadvise documentation is actually more complete here than the madvise documentation, but I don't know for certain that the parameters mean the same thing:_
   
   > Under Linux, POSIX_FADV_NORMAL sets the readahead window to the default 
size for the backing device; POSIX_FADV_SEQUENTIAL doubles this size, and 
POSIX_FADV_RANDOM disables file readahead entirely. These changes affect the 
entire file, not just the specified region (but other open file handles to the 
same file are unaffected). 
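
   In call form (a sketch, not Arrow code; `fd` stands in for an already-open file descriptor), that advice looks like:

   ```cpp
   #include <fcntl.h>

   // On Linux the readahead change is file-wide, regardless of the
   // offset/len arguments (0, 0 nominally means "the whole file").
   void SetReadaheadMode(int fd) {
     posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);  // double the default window
     // POSIX_FADV_RANDOM would disable readahead entirely;
     // POSIX_FADV_NORMAL restores the device default.
   }
   ```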
   
   For Linux the parameter of interest is [read_ahead_kb](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-storage_and_file_systems-configuration_tools), which is the default readahead size referred to in the fadvise documentation.  On my system it is set to 128 KiB (and I think this is the default).
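
   As a quick check, a minimal sketch for reading that setting from sysfs (assuming the backing device is `sda`; adjust the path for your system):

   ```cpp
   #include <fstream>
   #include <iostream>
   #include <string>

   int main() {
     // read_ahead_kb is the per-device default readahead window, in KiB.
     // "sda" is an assumption; substitute whatever device backs the file.
     std::ifstream f("/sys/block/sda/queue/read_ahead_kb");
     std::string kb;
     if (f >> kb) {
       std::cout << "default readahead: " << kb << " KiB\n";
     } else {
       std::cerr << "could not read read_ahead_kb\n";
     }
     return 0;
   }
   ```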
   
   While WILLNEED and DONTNEED are specific prefetching instructions (and apply only to a range of data), the advice values RANDOM, SEQUENTIAL, and NORMAL are file-wide instructions that control the default readahead behavior.
   
   Since we would only need to call it once per file, the overhead shouldn't be too great.
   
   > That sounds contradictory, so I'm not sure how to resolve it.
   
   Another way to phrase it would be: "can we modify the file-wide default and still provide instructions for specific regions?"
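
   To make that concrete, a minimal sketch of the combination (not the actual PR code; the file name, offset, and length are placeholders, and this assumes the fadvise and madvise semantics line up as described above):

   ```cpp
   #include <fcntl.h>
   #include <sys/mman.h>
   #include <sys/stat.h>
   #include <unistd.h>
   #include <cstdio>

   int main() {
     int fd = open("data.arrow", O_RDONLY);  // placeholder file name
     if (fd < 0) { perror("open"); return 1; }

     struct stat st;
     if (fstat(fd, &st) != 0) { perror("fstat"); close(fd); return 1; }

     // File-wide: turn off the default readahead for this file.
     posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

     void* addr = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
     if (addr == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

     // Region-specific: explicitly prefetch just the range we know we need.
     // (A real caller would use a page-aligned column/page range here.)
     size_t offset = 0;
     size_t length = static_cast<size_t>(st.st_size);
     madvise(static_cast<char*>(addr) + offset, length, MADV_WILLNEED);

     // ... read the mapped data ...

     munmap(addr, st.st_size);
     close(fd);
     return 0;
   }
   ```

   If that works as hoped, the RANDOM call happens once per file while WILLNEED/DONTNEED remain per-range hints.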
   
   
   
   

