On 05/02/2017 04:34 PM, Prakash Sangappa wrote:
> Similarly, a madvise() option also requires additional system call by every
> process mapping the file, this is considered a overhead for the database.

How long-lived are these processes?  For a database, I'd assume that
this would happen a single time, or a single time per mmap() at process
startup time.  Such a syscall would be doing something on the order of
taking mmap_sem, walking the VMA tree, setting a bit per VMA, and
unlocking.  That's a pretty cheap one-time cost...

> If we do consider a new madvise() option, will it be acceptable
> since this will be specifically for hugetlbfs file mappings?

Ideally, it would be something that is *not* specifically for
hugetlbfs.  MADV_NOAUTOFILL, for instance, could be defined to SIGSEGV
whenever memory is touched that was not populated with MADV_WILLNEED,
mlock(), etc...

> If so,
> would a new flag to mmap() call itself be acceptable, which would
> define the proposed behavior?. That way no additional system calls
> need to be made.

I don't feel super strongly about it, but I guess an mmap() flag could
work too.
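
To make the "single time per mmap() at process startup" cost concrete,
here is a rough userspace sketch of what each process would do: one
madvise() call per mapping, made once after the mappings are set up.
This is an illustration under stated assumptions, not the proposed
interface.  MADV_NOAUTOFILL does not exist in any kernel, so the caller
passes an existing advice value as a stand-in.  The names struct mapping,
map_region() and advise_all() are hypothetical, and anonymous memory is
used in place of a hugetlbfs file so the sketch runs anywhere.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for madvise() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bookkeeping for one mapping set up at process startup. */
struct mapping {
	void *addr;
	size_t len;
};

/*
 * Map an anonymous, private region.  A database would map a hugetlbfs
 * file here instead; anonymous memory keeps the sketch self-contained.
 * Returns NULL on failure.
 */
void *map_region(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/*
 * Issue one madvise() per mapping.  'advice' stands in for the
 * hypothetical MADV_NOAUTOFILL discussed above; any existing MADV_*
 * value shows the same one-syscall-per-mapping cost.
 * Returns the number of mappings that were advised successfully.
 */
size_t advise_all(const struct mapping *maps, size_t n, int advice)
{
	size_t ok = 0;

	for (size_t i = 0; i < n; i++) {
		if (madvise(maps[i].addr, maps[i].len, advice) == 0)
			ok++;
	}
	return ok;
}
```

A process would call advise_all() once after creating its mappings, so
the extra work scales with the number of mappings rather than with the
number of later memory accesses.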