On Thu, Apr 19, 2018 at 01:26:29PM +0100, Anatoly Burakov wrote:
> The original implementation used flock() locks, but was later
> switched to using fcntl() locks for page locking, because
> fcntl() locks allow locking parts of a file, which is useful
> for single-file segments mode, where locking the entire file
> isn't as useful because we still need to grow and shrink it.
>
> However, according to fcntl()'s Ubuntu manpage [1], semantics of
> fcntl() locks have a giant oversight:
>
>   This interface follows the completely stupid semantics of System
>   V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
>   locks associated with a file for a given process are removed
>   when any file descriptor for that file is closed by that process.
>   This semantic means that applications must be aware of any files
>   that a subroutine library may access.
>
> Basically, closing *any* fd with an fcntl() lock (which we do because
> we don't want to leak fd's) will drop the lock completely.
>
> So, in this commit, we will be reverting back to using flock() locks
> everywhere. However, that still leaves the problem of locking parts
> of a memseg list file in single file segments mode, and we will be
> solving it with creating separate lock files per each page, and
> tracking those with flock().
>
> We will also be removing all of this tailq business and replacing it
> with a simple array - saving a few bytes is not worth the extra
> hassle of dealing with pointers and potential memory allocation
> failures. Also, remove the tailq lock since it is not needed - these
> fd lists are per-process, and within a given process, it is always
> only one thread handling access to hugetlbfs.
>
> So, first one to allocate a segment will create a lockfile, and put
> a shared lock on it. When we're shrinking the page file, we will be
> trying to take out a write lock on that lockfile, which would fail if
> any other process is holding onto the lockfile as well. This way, we
> can know if we can shrink the segment file. Also, if no other locks
> are found in the lock list for a given memseg list, the memseg list
> fd is automatically closed.
>
> One other thing to note is, according to flock() Ubuntu manpage [2],
> upgrading the lock from shared to exclusive is implemented by dropping
> and reacquiring the lock, which is not atomic and thus would have
> created race conditions. So, on attempting to perform operations in
> hugetlbfs, we will take out a writelock on hugetlbfs directory, so
> that only one process could perform hugetlbfs operations concurrently.
>
> [1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
> [2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html
>
> Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
> Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
> Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
> Fixes: 2a04139f66b4 ("eal: add single file segments option")
> Cc: anatoly.bura...@intel.com
>
> Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
> ---

While the memory subsystem in DPDK has changed a lot since I last looked
at it, thereby preventing me from doing an in-depth review, this change
makes sense to me. So for this and any future versions:

Acked-by: Bruce Richardson <bruce.richard...@intel.com>
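
For readers following the thread, the per-page lockfile scheme described in the commit message can be sketched roughly as below. This is only an illustration of the flock() pattern, not the code from the patch: the helper names lockfile_acquire_shared() and lockfile_is_unused() are made up for the example, and it assumes Linux flock() semantics, where a lock belongs to the open file description, so a second open() of the same path competes with the first even within one process.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a DPDK function): open or create a per-page
 * lockfile and place a shared lock on it. Returns the fd, or -1 on error.
 * The lock is held for as long as the returned fd stays open.
 */
static int lockfile_acquire_shared(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_SH | LOCK_NB) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/*
 * Hypothetical helper (not a DPDK function): check whether anyone still
 * holds a lock on the lockfile by trying a non-blocking exclusive lock.
 * Returns 1 if the page is unused, 0 if still in use, -1 on error.
 */
static int lockfile_is_unused(const char *path)
{
    assert(path != NULL);

    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return -1;

    int ret;
    if (flock(fd, LOCK_EX | LOCK_NB) == 0) {
        /* Nobody else holds a shared lock; release ours again. */
        flock(fd, LOCK_UN);
        ret = 1;
    } else if (errno == EWOULDBLOCK) {
        /* Some other open file description holds a lock. */
        ret = 0;
    } else {
        ret = -1;
    }
    close(fd);
    return ret;
}
```

In this sketch a process would keep the fd from lockfile_acquire_shared() open while it uses the page, and whoever wants to shrink the segment file would call lockfile_is_unused() first. The sketch deliberately probes through a fresh fd instead of upgrading an existing shared lock, since the upgrade is not atomic, as noted above. It does not cover the hugetlbfs directory write lock that the commit message uses to serialize these operations across processes.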