westonpace commented on pull request #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-959954827
> The whole point of our madvise usage is to let the kernel prefetch data. If we decide we don't need prefetching, then we should just remove calls to madvise (which is a system call, so is not without overhead).

The problem seems to be the default prefetch. _The fadvise documentation here is actually more complete than the madvise documentation, but I don't know for certain that the parameters mean the same thing:_

> Under Linux, POSIX_FADV_NORMAL sets the readahead window to the default size for the backing device; POSIX_FADV_SEQUENTIAL doubles this size, and POSIX_FADV_RANDOM disables file readahead entirely. These changes affect the entire file, not just the specified region (but other open file handles to the same file are unaffected).

For Linux, the parameter of interest is [read_ahead_kb](https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/performance_tuning_guide/sect-red_hat_enterprise_linux-performance_tuning_guide-storage_and_file_systems-configuration_tools), which is the default readahead referred to in the fadvise documentation. On my system it is set to 128 (and I think this is the default).

While WILLNEED and DONTNEED are specific prefetching instructions (and only apply to a range of data), the advice values RANDOM, SEQUENTIAL, and NORMAL are file-wide instructions that control the default behavior. Since we only need to set the file-wide advice once per file, the overhead shouldn't be too great.

> That sounds contradictory, so I'm not sure how to resolve it.

Another way to phrase it would be "can we modify the file-wide default and still provide instructions for specific regions?"
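
To make that question concrete, here is a minimal sketch of the call sequence, written in Python rather than Arrow's C++ only for brevity. It assumes Linux, where `os.posix_fadvise` is available. The helper name `advise_random_then_prefetch` is hypothetical and not part of Arrow. The sketch only shows that both calls can be issued on the same descriptor; it does not prove that the kernel honors the range-specific request once default readahead is disabled.

```python
import os


def advise_random_then_prefetch(path, offset, length):
    """Set file-wide RANDOM advice, then request one region with WILLNEED.

    Hypothetical helper for illustration; returns the bytes of the region.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # File-wide advice: per the man page quoted above, on Linux this
        # affects the whole file. A length of 0 means "to end of file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_RANDOM)
        # Range-specific advice: ask the kernel to start reading this region.
        os.posix_fadvise(fd, offset, length, os.POSIX_FADV_WILLNEED)
        # Read the region; this succeeds regardless of what the kernel
        # actually did with the advice.
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)
```

With mmap the analogous calls would be `madvise` with `MADV_RANDOM` followed by `MADV_WILLNEED`, but `madvise` applies to an address range rather than a file descriptor, so the file-wide semantics described for fadvise may not carry over.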
