On Thu, Jan 09, 2025 at 05:05:14PM GMT, Mark Kettenis wrote:

> > Date: Thu, 9 Jan 2025 15:48:29 +0100
> > From: Marcus Glocker <[email protected]>
> > 
> > Hello bugs@, Martin,
> > 
> > For a while now I have been noticing processes hanging on my Samsung
> > Galaxy Book4 Edge (arm64/snapdragon-x/12-cores/16gb ram) machine.  Those
> > hangs appear very frequently, which makes it hard to work on the machine
> > since things like xterm, ssh, man, etc. just suddenly start to hang.  When
> > this happens, executing another process immediately releases the
> > hanging/waiting process.
> > 
> > I discussed this behavior today on icb, which led to the following
> > conversation:
> > 
> > 11:39 < mglocker> 5344 hacki    -18    0 1436K  392K idle      flt_pmf   0:00  0.00% man
> > 11:41 < mglocker> uvm_wait("flt_pmfail1");
> > 11:42 < mglocker> uvm_wait("flt_pmfail2");
> > 11:43 < mglocker> 49811 hacki    -18    0 8144K  112K sleep/0   flt_pmf   0:00  0.00% xterm
> > 11:54 < mglocker> ok, the process hang is always at uvm/uvm_fault.c:1879 -> uvm_wait("flt_pmfail2")
> > 
> > 12:17 < kettenis> so that's pmap_enter() failing
> > 12:19 < kettenis> which means a pool allocation failure
> > 12:20 < kettenis> what does vmstat -m say about the "pted" and "vp" pools?
> > 12:28 < mglocker> Name        Size Requests Fail    InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> > 12:29 < mglocker> pted          40   962117    0    42480  1582     0  1582  1582     1     8    0
> > 12:29 < mglocker> vp          8192    47009  102     5676  7830  1100  6730  7830    20     8   20
> > 12:30 < mglocker> vp 102 fails?
> > 12:37 < mglocker> it keeps increasing on those hangs
> > 12:46 < mglocker> so pmap_enter_vp() fails for
> > 12:46 < mglocker> vp2 = pool_get()
> > 12:46 < mglocker> and
> > 12:47 < mglocker> vp3 = pool_get()
> > 13:00 < mglocker> i booted again with a fresh single processor kernel.  there are no vp fails.
> > 13:09 < claudio> didn't we switch the vp pool to use per-cpu caches exactly because of this?
> > 14:02 < kettenis> I believe so
> > 14:03 < kettenis> the problem is that pmap_enter(9) isn't supposed to sleep
> > 14:03 < kettenis> so the pool allocations are done with PR_NOWAIT
> > 14:04 < kettenis> but that means that kd_trylock gets set
> > 14:04 < kettenis> which means that the allocations fail if there is contention on the pool lock
> > 14:04 < claudio> yes, I remember this strange behaviour.
> > 14:06 < kettenis> uvm thinks this means we're out of physmem
> > 14:06 < kettenis> so it'll sleep until something else pokes the pagedaemon
> > 14:06 < kettenis> the per-cpu caches mitigated the issue somewhat
> > 14:07 < kettenis> but didn't solve things completely
> > 14:07 < kettenis> and now that mpi pushed back the locks in uvm again, the problem is back
> > 14:09 < kettenis> so we need a real solution for this problem...
> > 14:12 < kettenis> a potential solution would be to make pmap_enter(9) return a different error for this case
> > 14:13 < kettenis> and then handle that case differently in uvm_fault_{upper|lower}
> > 14:15 < kettenis> the problem there is that pool_get() doesn't actually tell us why it failed
> > 14:37 < kettenis> s/contention on the pool lock/contention on the kernel map/
> > 
> > Any proposal on how we could proceed to find a solution for this issue?
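
To spell out the failure path from the chat in code form, here is a
simplified sketch pieced together from the discussion above (arguments
elided, identifiers like pmap_vp_pool approximate; this is not the
literal arm64 pmap or uvm_fault code):

	/*
	 * pmap_enter(9) is not allowed to sleep, so the arm64 pmap
	 * allocates its "vp" page table nodes with PR_NOWAIT.
	 */
	vp = pool_get(&pmap_vp_pool, PR_NOWAIT | PR_ZERO);
	if (vp == NULL) {
		/*
		 * With PR_NOWAIT the pool's backend page allocation is
		 * done with kd_trylock, so it can fail when the kernel
		 * map lock is contended, even with plenty of free pages.
		 */
		if (flags & PMAP_CANFAIL)
			return ENOMEM;	/* all uvm gets to see is "no memory" */
		panic("pmap_enter: failed to allocate vp");
	}

	/*
	 * uvm_fault_upper()/uvm_fault_lower() then interpret that ENOMEM
	 * as "out of physical memory" and go to sleep:
	 */
	if (pmap_enter(...) != 0) {
		uvmfault_unlockall(ufi, amap, NULL);
		if (uvm_swapisfull())
			return ENOMEM;
		uvm_wait("flt_pmfail2");	/* <- where the processes hang */
		return ERESTART;
	}
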
> 
> The following hack fixes the issue for me.  I don't think this is a
> proper solution, but it may be a starting point.  Or a temporary fix.
> 
> The issue really is that we can't tell whether pmap_enter(9) failed
> because we're out of physical memory, or if it failed for some other
> reason.  In the case at hand we fail because of contention on the
> kernel map lock.  But we could also be failing because we have
> completely run out of KVA.

Works for me as well!
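
As an aside, the "return a different error" idea from 14:12 in the chat
could look roughly like the following.  EAGAIN as the distinct error
and the handling shown here are purely a sketch, nothing like this
exists in the tree:

	/* In pmap_enter(), when a PR_NOWAIT pool_get() fails transiently: */
	if (vp == NULL && (flags & PMAP_CANFAIL))
		return EAGAIN;	/* "try again later", not "out of memory" */

	/* In uvm_fault_{upper,lower}(): */
	error = pmap_enter(..., flt->access_type | PMAP_CANFAIL | ...);
	if (error != 0) {
		uvmfault_unlockall(ufi, amap, NULL);
		if (error == EAGAIN)
			/* transient resource shortage, just retry the fault */
			return ERESTART;
		/* ENOMEM: genuinely out of physical memory */
		if (uvm_swapisfull())
			return ENOMEM;
		uvm_wait("flt_pmfail2");
		return ERESTART;
	}

That still leaves the problem kettenis mentions at 14:15, namely that
pool_get() itself doesn't tell the pmap why it failed.
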
 
> We can't sleep while holding all those uvm locks.  I'm not sure the
> free memory check this does is right.  Or whether we want such a check
> at all.  The vm_map_lock()/vm_map_unlock() dance is necessary to make
> sure we don't spin too quickly if the kernel map lock is contended.
> 
> A better fix would perhaps be to have a new pmap function that we
> could call at this spot that would sleep until the necessary resources
> are available.  On arm64 this would populate the page tables using
> pool allocations that use PR_WAITOK, but would not actually enter a
> valid mapping.  I'm going to explore that idea a bit.
> 
> 
> Index: uvm/uvm_fault.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> diff -u -p -r1.159 uvm_fault.c
> --- uvm/uvm_fault.c   3 Jan 2025 15:31:48 -0000       1.159
> +++ uvm/uvm_fault.c   9 Jan 2025 15:39:53 -0000
> @@ -1101,6 +1101,11 @@ uvm_fault_upper(struct uvm_faultinfo *uf
>                * as the map may change while we're asleep.
>                */
>               uvmfault_unlockall(ufi, amap, NULL);
> +             if (uvmexp.free > uvmexp.reserve_kernel) {
> +                     vm_map_lock(kernel_map);
> +                     vm_map_unlock(kernel_map);
> +                     return ERESTART;
> +             }
>               if (uvm_swapisfull()) {
>                       /* XXX instrumentation */
>                       return ENOMEM;
> @@ -1453,6 +1458,11 @@ uvm_fault_lower(struct uvm_faultinfo *uf
>               atomic_clearbits_int(&pg->pg_flags, PG_BUSY|PG_FAKE|PG_WANTED);
>               UVM_PAGE_OWN(pg, NULL);
>               uvmfault_unlockall(ufi, amap, uobj);
> +             if (uvmexp.free > uvmexp.reserve_kernel) {
> +                     vm_map_lock(kernel_map);
> +                     vm_map_unlock(kernel_map);
> +                     return ERESTART;
> +             }
>               if (uvm_swapisfull()) {
>                       /* XXX instrumentation */
>                       return (ENOMEM);
> 
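
For the "better fix" described above, a pmap function that may sleep and
only populates the page tables, a rough sketch of the shape it could
take.  The name pmap_populate() and the helpers pmap_vp_present() and
pmap_vp_insert() are made up for illustration, the locking via
pm->pm_mtx is only indicative, and the real arm64 data structures are
more involved:

/*
 * Pre-allocate the intermediate page table ("vp") levels covering va.
 * Called from the fault path with no uvm locks held, so PR_WAITOK is
 * fine.  No valid mapping is entered here; the point is only that a
 * later pmap_enter() with PR_NOWAIT can no longer fail for lack of a
 * vp node.
 */
void
pmap_populate(pmap_t pm, vaddr_t va)
{
	void *vp;

	while (!pmap_vp_present(pm, va)) {	/* hypothetical lookup */
		vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
		mtx_enter(&pm->pm_mtx);
		if (pmap_vp_insert(pm, va, vp) != 0) {	/* hypothetical insert */
			/* lost a race: another CPU installed this level */
			mtx_leave(&pm->pm_mtx);
			pool_put(&pmap_vp_pool, vp);
			continue;
		}
		mtx_leave(&pm->pm_mtx);
	}
}

uvm_fault_{upper,lower}() would then call this after uvmfault_unlockall()
and return ERESTART, much like the diff above, but without the
free-memory heuristic and the kernel_map lock/unlock dance.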
