On 28-Apr-18 10:38 AM, Andrew Rybchenko wrote:
On 04/25/2018 01:36 PM, Anatoly Burakov wrote:
The original implementation used flock() locks, but was later
switched to using fcntl() locks for page locking, because
fcntl() locks allow locking parts of a file, which is useful
for single-file segments mode, where locking the entire file
isn't as useful because we still need to grow and shrink it.

However, according to fcntl()'s Ubuntu manpage [1], semantics of
fcntl() locks have a giant oversight:

   This interface follows the completely stupid semantics of System
   V and IEEE Std 1003.1-1988 (“POSIX.1”) that require that all
   locks associated with a file for a given process are removed
   when any file descriptor for that file is closed by that process.
   This semantic means that applications must be aware of any files
   that a subroutine library may access.

Basically, closing *any* fd with an fcntl() lock (which we do because
we don't want to leak fd's) will drop the lock completely.

So, in this commit, we will be reverting back to using flock() locks
everywhere. However, that still leaves the problem of locking parts
of a memseg list file in single file segments mode, and we will be
solving it with creating separate lock files per each page, and
tracking those with flock().

We will also be removing all of this tailq business and replacing it
with a simple array - saving a few bytes is not worth the extra
hassle of dealing with pointers and potential memory allocation
failures. Also, remove the tailq lock since it is not needed - these
fd lists are per-process, and within a given process, it is always
only one thread handling access to hugetlbfs.

So, first one to allocate a segment will create a lockfile, and put
a shared lock on it. When we're shrinking the page file, we will be
trying to take out a write lock on that lockfile, which would fail if
any other process is holding onto the lockfile as well. This way, we
can know if we can shrink the segment file. Also, if no other locks
are found in the lock list for a given memseg list, the memseg list
fd is automatically closed.

One other thing to note is, according to flock() Ubuntu manpage [2],
upgrading the lock from shared to exclusive is implemented by dropping
and reacquiring the lock, which is not atomic and thus would have
created race conditions. So, on attempting to perform operations in
hugetlbfs, we will take out a writelock on hugetlbfs directory, so
that only one process could perform hugetlbfs operations concurrently.

[1] http://manpages.ubuntu.com/manpages/artful/en/man2/fcntl.2freebsd.html
[2] http://manpages.ubuntu.com/manpages/bionic/en/man2/flock.2.html

Fixes: 66cc45e293ed ("mem: replace memseg with memseg lists")
Fixes: 582bed1e1d1d ("mem: support mapping hugepages at runtime")
Fixes: a5ff05d60fc5 ("mem: support unmapping pages at runtime")
Fixes: 2a04139f66b4 ("eal: add single file segments option")
Cc: anatoly.bura...@intel.com

Signed-off-by: Anatoly Burakov <anatoly.bura...@intel.com>
Acked-by: Bruce Richardson <bruce.richard...@intel.com>

We have a problem with the changeset if EAL option -m or --socket-mem is used.
EAL initialization hangs just after EAL: Probing VFIO support...
strace points to flock(7, LOCK_EX
List of file descriptors:
# ls /proc/25452/fd -l
total 0
lrwx------ 1 root root 64 Apr 28 10:34 0 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:34 1 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:32 2 -> /dev/pts/0
lrwx------ 1 root root 64 Apr 28 10:34 3 -> /run/.rte_config
lrwx------ 1 root root 64 Apr 28 10:34 4 -> socket:[154166]
lrwx------ 1 root root 64 Apr 28 10:34 5 -> socket:[154158]
lr-x------ 1 root root 64 Apr 28 10:34 6 -> /dev/hugepages
lr-x------ 1 root root 64 Apr 28 10:34 7 -> /dev/hugepages

I guess the problem is that there are two /dev/hugepages and
it hangs on the second.

Ideas how to solve it?

Andrew.


Hi Andrew,

Please try the following patch:

http://dpdk.org/dev/patchwork/patch/39166/

This should fix the issue.

--
Thanks,
Anatoly

Reply via email to