This fixes a critical mmap_sem lock leak that leads to deadlock when a
page fault with FAULT_FLAG_RETRY_NOWAIT encounters a fatal signal.
The Bug Scenario:
=================
When get_user_pages() is called with FOLL_NOWAIT flag (e.g., from
direct I/O or certain network operations), it sets both
FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT.
__do_page_fault()
handle_mm_fault()
__handle_mm_fault()
handle_pte_fault()
do_swap_page()
lock_page_or_retry()
__lock_page_or_retry()
The page fault path calls __lock_page_or_retry() which has special
handling for FAULT_FLAG_RETRY_NOWAIT (mm/filemap.c:1025-1026):
if (flags & FAULT_FLAG_RETRY_NOWAIT)
return 0; /* Lock NOT released! */
This returns 0 (page not locked) but mmap_sem is STILL HELD, as
documented by the explicit "CAUTION" comment at line 1022.
The fault handler then sets VM_FAULT_RETRY and returns. When
__do_page_fault() sees VM_FAULT_RETRY, there are 3 cases possible:
0. FAULT_FLAG_ALLOW_RETRY flag set, no fatal signal => goto retry and
try to acquire mmap_sem => self deadlock in case someone tries to
grab the mmap_sem for write
1/2. FAULT_FLAG_ALLOW_RETRY flag set with a fatal signal pending, it
incorrectly assumes the lock was already released (line 1318
comment: "the mmap_sem has already been released") and returns to
userspace or calls no_context() without releasing the lock.
Result: Process continues with mmap_sem held, leading to deadlock
on the next page fault or mmap_sem operation.
The Correct Pattern:
====================
The proper handling is documented in fs/userfaultfd.c:331-333:
"If FAULT_FLAG_ALLOW_RETRY is set, the mmap_sem must be released
before returning VM_FAULT_RETRY only if FAULT_FLAG_RETRY_NOWAIT
is not set."
And implemented at lines 444-483:
ret = VM_FAULT_RETRY;
if (vmf->flags & FAULT_FLAG_RETRY_NOWAIT)
goto out; /* Return WITHOUT releasing lock */
/* ... */
up_read(&mm->mmap_sem); /* Release only in non-NOWAIT case */
The Fix:
========
Before going to retry, returning to userspace or calling no_context(),
explicitly check if FAULT_FLAG_RETRY_NOWAIT was set. If yes, release
mmap_sem that was left held by __lock_page_or_retry().
The release is needed in all 3 paths:
- retry if no fatal signals: Re-acquire mmap_sem right after goto
- Userspace: Cannot return with kernel lock held
- no_context(): May return via fixup_exception() for kernel faults
(e.g., copy_from_user), and cannot continue with lock held
This matches the semantic contract documented in userfaultfd.c.
Note:
=====
If we went to the retry, cleared FAULT_FLAG_ALLOW_RETRY but still have
FAULT_FLAG_RETRY_NOWAIT, VM_FAULT_RETRY could be returned by
__lock_page_or_retry again, but this time mmap_sem is always dropped.
And userfaultfd even BUGs in case
!FAULT_FLAG_ALLOW_RETRY && FAULT_FLAG_RETRY_NOWAIT.
Observed Symptoms:
==================
Process deadlocks with stack trace showing:
- Owns mmap_sem
- Page fault from userspace (no syscall context)
- Blocked in rwsem_down_read_failed trying to re-acquire mmap_sem
https://virtuozzo.atlassian.net/browse/PSBM-161478
Signed-off-by: Konstantin Khorenko <[email protected]>
---
arch/x86/mm/fault.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index ebbedfd975730..565b11c6b3e0b 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1322,6 +1322,18 @@ __do_page_fault(struct pt_regs *regs, unsigned long
error_code,
if (unlikely(fault & VM_FAULT_RETRY)) {
/* Retry at most once */
if (flags & FAULT_FLAG_ALLOW_RETRY) {
+ /*
+ * FAULT_FLAG_RETRY_NOWAIT means the lock was NOT
+ * released by __lock_page_or_retry().
+ * We must release it here before in all 3 cases:
+ * 0. before going to retry if no fatal signal -
+ * it acquires the mmap_sem
+ * 1. before returning to userspace
+ * 2. before calling no_context()
+ */
+ if (flags & FAULT_FLAG_RETRY_NOWAIT)
+ up_read(&mm->mmap_sem);
+
flags &= ~FAULT_FLAG_ALLOW_RETRY;
flags |= FAULT_FLAG_TRIED;
if (!fatal_signal_pending(tsk))
--
2.43.0
_______________________________________________
Devel mailing list
[email protected]
https://lists.openvz.org/mailman/listinfo/devel