> Date: Thu, 9 Jan 2025 15:48:29 +0100
> From: Marcus Glocker <[email protected]>
>
> Hello bugs@, Martin,
>
> For a while now I have been noticing processes hanging on my Samsung
> Galaxy Book4 Edge (arm64/snapdragon-x/12-cores/16gb ram) machine.
> These hangs appear very frequently, which makes it hard to work on
> the machine, since things like xterm, ssh, man, etc. just suddenly
> start to hang. When this happens, executing another process
> immediately releases the hanging/waiting process.
>
> I've discussed this behavior today on icb, which led to the
> following conversation:
>
> 11:39 < mglocker> 5344 hacki -18 0 1436K 392K idle flt_pmf 0:00 0.00% man
> 11:41 < mglocker> uvm_wait("flt_pmfail1");
> 11:42 < mglocker> uvm_wait("flt_pmfail2");
> 11:43 < mglocker> 49811 hacki -18 0 8144K 112K sleep/0 flt_pmf 0:00 0.00% xterm
> 11:54 < mglocker> ok, the process hang is always at uvm/uvm_fault.c:1879 -> uvm_wait("flt_pmfail2")
>
> 12:17 < kettenis> so that's pmap_enter() failing
> 12:19 < kettenis> which means a pool allocation failure
> 12:20 < kettenis> what does vmstat -m say about the "pted" and "vp" pools?
> 12:28 < mglocker> Name  Size Requests Fail InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> 12:29 < mglocker> pted    40   962117    0 42480  1582     0  1582  1582     1     8    0
> 12:29 < mglocker> vp    8192    47009  102  5676  7830  1100  6730  7830    20     8   20
> 12:30 < mglocker> vp 102 fails?
> 12:37 < mglocker> it keeps increasing on those hangs
> 12:46 < mglocker> so pmap_enter_vp() fails for
> 12:46 < mglocker> vp2 = pool_get()
> 12:46 < mglocker> and
> 12:47 < mglocker> vp3 = pool_get()
> 13:00 < mglocker> i booted again with a fresh single processor kernel. there are no vp fails.
> 13:09 < claudio> didn't we switch the vp pool to use per-cpu caches exactly
> because of this?
> 14:02 < kettenis> I believe so
> 14:03 < kettenis> the problem is that pmap_enter(9) isn't supposed to sleep
> 14:03 < kettenis> so the pool allocations are done with PR_NOWAIT
> 14:04 < kettenis> but that means that kd_trylock gets set
> 14:04 < kettenis> which means that the allocations fail if there is
> contention on the pool lock
> 14:04 < claudio> yes, I remember this strange behaviour.
> 14:06 < kettenis> uvm thinks this means we're out of physmem
> 14:06 < kettenis> so it'll sleep until something else pokes the pagedaemon
> 14:06 < kettenis> the per-cpu caches mitigated the issue somewhat
> 14:07 < kettenis> but didn't solve things completely
> 14:07 < kettenis> and now that mpi pushed back the locks in uvm again, the
> problem is back
> 14:09 < kettenis> so we need a real solution for this problem...
> 14:12 < kettenis> a potential solution would be to make pmap_enter(9) return
> a different error for this case
> 14:13 < kettenis> and then handle that case differently in
> uvm_fault_{upper|lower}
> 14:15 < kettenis> the problem there is that pool_get() doesn't actually tell
> us why it failed
> 14:37 < kettenis> s/contention on the pool lock/contention on the kernel map/
>
> Any proposal on how we could proceed to find a solution for this issue?
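
As background for claudio's question above: the "vp" pool did get the
per-CPU cache treatment.  A minimal sketch of how a pool opts in,
assuming the setup in the arm64 pmap (pool_cache_init(9) is the real
interface; the surrounding details are from memory):

	/*
	 * Once the pool has been set up with pool_init(9), turning on
	 * the per-CPU magazines is a single call.  Most subsequent
	 * pool_get(9)/pool_put(9) pairs are then served from a
	 * lockless per-CPU list.
	 */
	pool_cache_init(&pmap_vp_pool);

A cache miss still falls through to the shared pool though, and an
empty pool still has to grab a fresh page with km_alloc(9), which with
PR_NOWAIT only try-locks the kernel map.  That is why the caches
mitigated the contention but couldn't eliminate it.
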
The following hack fixes the issue for me.  I don't think this is a
proper solution, but it may be a starting point, or a temporary fix.

The real issue is that we can't tell whether pmap_enter(9) failed
because we're out of physical memory, or whether it failed for some
other reason.  In the case at hand it fails because of contention on
the kernel map lock.  But we could also be failing because we have
completely run out of KVA.
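
To make the failure mode concrete, the chain looks roughly like this
(a sketch of the relevant calls, not the literal arm64 pmap source):

	/*
	 * pmap_enter(9) may not sleep, so the page-table descriptor
	 * allocations are done with PR_NOWAIT:
	 */
	vp3 = pool_get(&pmap_vp_pool, PR_NOWAIT | PR_ZERO);
	if (vp3 == NULL)
		return ENOMEM;	/* out of memory, or just contention? */

	/*
	 * With PR_NOWAIT, an empty pool tries to allocate a fresh
	 * page via km_alloc(9) with kd_trylock set, so the pool_get()
	 * above also fails if the kernel map lock merely happens to
	 * be held by someone else.  uvm_fault_{upper,lower}() can
	 * only interpret the resulting ENOMEM as memory pressure and
	 * go to sleep in uvm_wait("flt_pmfail1"/"flt_pmfail2") --
	 * the hangs from the report above.
	 */
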
We can't sleep while holding all those uvm locks.  I'm not sure the
free memory check this does is right, or whether we want such a check
at all.  The vm_map_lock()/vm_map_unlock() dance is necessary to make
sure we don't spin too quickly if the kernel map lock is contended:
taking the lock and immediately dropping it again makes the faulting
process sleep until whoever holds the lock has released it, instead
of retrying the fault in a tight loop.

A better fix would perhaps be to have a new pmap function that we
could call at this spot and that would sleep until the necessary
resources are available.  On arm64 this would populate the page
tables using pool allocations with PR_WAITOK, but would not actually
enter a valid mapping.  I'm going to explore that idea a bit.
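
Something along these lines, where the name pmap_populate() and the
pmap_vp_populate() helper are made up to illustrate the shape of the
interface, not existing code:

	/*
	 * Hypothetical: called from uvm_fault_{upper,lower}() after
	 * all uvm locks have been dropped, so it is allowed to sleep.
	 */
	int
	pmap_populate(pmap_t pm, vaddr_t va)
	{
		/*
		 * Walk the VP levels for va and allocate any missing
		 * intermediate tables with PR_WAITOK, sleeping until
		 * the pools can satisfy the request.  No valid mapping
		 * is entered; the retried pmap_enter(9) then cannot
		 * fail for lack of page-table pages.
		 */
		return pmap_vp_populate(pm, va, PR_WAITOK);
	}
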
Index: uvm/uvm_fault.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
diff -u -p -r1.159 uvm_fault.c
--- uvm/uvm_fault.c 3 Jan 2025 15:31:48 -0000 1.159
+++ uvm/uvm_fault.c 9 Jan 2025 15:39:53 -0000
@@ -1101,6 +1101,11 @@ uvm_fault_upper(struct uvm_faultinfo *uf
* as the map may change while we're asleep.
*/
uvmfault_unlockall(ufi, amap, NULL);
+ if (uvmexp.free > uvmexp.reserve_kernel) {
+ vm_map_lock(kernel_map);
+ vm_map_unlock(kernel_map);
+ return ERESTART;
+ }
if (uvm_swapisfull()) {
/* XXX instrumentation */
return ENOMEM;
@@ -1453,6 +1458,11 @@ uvm_fault_lower(struct uvm_faultinfo *uf
atomic_clearbits_int(&pg->pg_flags, PG_BUSY|PG_FAKE|PG_WANTED);
UVM_PAGE_OWN(pg, NULL);
uvmfault_unlockall(ufi, amap, uobj);
+ if (uvmexp.free > uvmexp.reserve_kernel) {
+ vm_map_lock(kernel_map);
+ vm_map_unlock(kernel_map);
+ return ERESTART;
+ }
if (uvm_swapisfull()) {
/* XXX instrumentation */
return (ENOMEM);