> Date: Thu, 9 Jan 2025 17:29:38 +0100
> From: Marcus Glocker <[email protected]>
> 
> On Thu, Jan 09, 2025 at 05:05:14PM GMT, Mark Kettenis wrote:
> 
> > > Date: Thu, 9 Jan 2025 15:48:29 +0100
> > > From: Marcus Glocker <[email protected]>
> > > 
> > > Hello bugs@, Martin,
> > > 
> > > For a while now I have been noticing processes hanging on my Samsung
> > > Galaxy Book4 Edge (arm64/snapdragon-x/12-cores/16gb ram) machine.
> > > These hangs happen very frequently, which makes it hard to work on the
> > > machine since things like xterm, ssh, man, etc. just suddenly start to
> > > hang.  When this happens, executing another process immediately
> > > releases the hanging/waiting process.
> > > 
> > > I've discussed this behavior today on icb, which has led to the
> > > following conversation:
> > > 
> > > 11:39 < mglocker> 5344 hacki    -18    0 1436K  392K idle      flt_pmf   0:00  0.00% man
> > > 11:41 < mglocker> uvm_wait("flt_pmfail1");
> > > 11:42 < mglocker> uvm_wait("flt_pmfail2");
> > > 11:43 < mglocker> 49811 hacki    -18    0 8144K  112K sleep/0   flt_pmf   0:00  0.00% xterm
> > > 11:54 < mglocker> ok, the process hang is always at uvm/uvm_fault.c:1879 -> uvm_wait("flt_pmfail2")
> > > 
> > > 12:17 < kettenis> so that's pmap_enter() failing
> > > 12:19 < kettenis> which means a pool allocation failure
> > > 12:20 < kettenis> what does vmstat -m say about the "pted" and "vp" pools?
> > > 12:28 < mglocker> Name        Size Requests Fail    InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
> > > 12:29 < mglocker> pted          40   962117    0    42480  1582     0  1582  1582     1     8    0
> > > 12:29 < mglocker> vp          8192    47009  102     5676  7830  1100  6730  7830    20     8   20
> > > 12:30 < mglocker> vp 102 fails?
> > > 12:37 < mglocker> it keeps increasing on those hangs
> > > 12:46 < mglocker> so pmap_enter_vp() fails for
> > > 12:46 < mglocker> vp2 = pool_get()
> > > 12:46 < mglocker> and
> > > 12:47 < mglocker> vp3 = pool_get()
> > > 13:00 < mglocker> i booted again with a fresh single processor kernel.  there are no vp fails.
> > > 13:09 < claudio> didn't we switch the vp pool to use per-cpu caches exactly because of this?
> > > 14:02 < kettenis> I believe so
> > > 14:03 < kettenis> the problem is that pmap_enter(9) isn't supposed to sleep
> > > 14:03 < kettenis> so the pool allocations are done with PR_NOWAIT
> > > 14:04 < kettenis> but that means that kd_trylock gets set
> > > 14:04 < kettenis> which means that the allocations fail if there is contention on the pool lock
> > > 14:04 < claudio> yes, I remember this strange behaviour.
> > > 14:06 < kettenis> uvm thinks this means we're out of physmem
> > > 14:06 < kettenis> so it'll sleep until something else pokes the pagedaemon
> > > 14:06 < kettenis> the per-cpu caches mitigated the issue somewhat
> > > 14:07 < kettenis> but didn't solve things completely
> > > 14:07 < kettenis> and now that mpi pushed back the locks in uvm again, the problem is back
> > > 14:09 < kettenis> so we need a real solution for this problem...
> > > 14:12 < kettenis> a potential solution would be to make pmap_enter(9) return a different error for this case
> > > 14:13 < kettenis> and then handle that case differently in uvm_fault_{upper|lower}
> > > 14:15 < kettenis> the problem there is that pool_get() doesn't actually tell us why it failed
> > > 14:37 < kettenis> s/contention on the pool lock/contention on the kernel map/
> > > 
> > > Any proposal on how we could proceed to find a solution for this issue?
> > 
> > The following hack fixes the issue for me.  I don't think this is a
> > proper solution, but it may be a starting point.  Or a temporary fix.
> > 
> > The issue really is that we can't tell whether pmap_enter(9) failed
> > because we're out of physical memory, or if it failed for some other
> > reason.  In the case at hand we fail because of contention on the
> > kernel map lock.  But we could also be failing because we have completely 
> > run out of KVA.
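
To make that concrete, here is a much simplified sketch of the failure
path (illustration only, not the actual tree code: the *_sketch names
are made up, the vp-level allocations and most of the surrounding
checks are elided, and the usual kernel headers are assumed):

/*
 * pmap_enter(9) may not sleep, so the arm64 pmap allocates its
 * descriptors with PR_NOWAIT.  Under contention that allocation can
 * fail even though plenty of pages are free, and the only thing the
 * caller gets back is ENOMEM.
 */
int
pmap_enter_sketch(pmap_t pm, vaddr_t va, paddr_t pa, vm_prot_t prot,
    int flags)
{
	struct pte_desc *pted;

	pted = pool_get(&pmap_pted_pool, PR_NOWAIT | PR_ZERO);
	if (pted == NULL) {
		/* "lock contention" and "really out of memory" look the same */
		if (flags & PMAP_CANFAIL)
			return ENOMEM;
		panic("pmap_enter: failed to allocate pted");
	}
	/* ... allocate the vp levels the same way, then enter the mapping ... */
	return 0;
}

/*
 * uvm_fault_{upper,lower}() interpret that ENOMEM as "out of physical
 * memory" and park the process in uvm_wait() until the page daemon
 * pokes it; that is the flt_pmfail1/flt_pmfail2 hang from the chat log.
 */
int
uvm_fault_enter_sketch(struct uvm_faultinfo *ufi, struct vm_page *pg,
    vm_prot_t access_type)
{
	if (pmap_enter(ufi->orig_map->pmap, ufi->orig_rvaddr,
	    VM_PAGE_TO_PHYS(pg), access_type,
	    access_type | PMAP_CANFAIL) != 0) {
		/* the uvm locks have already been dropped at this point */
		uvm_wait("flt_pmfail2");
		return ERESTART;
	}
	return 0;
}
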
> 
> Works for me as well!
>  
> > We can't sleep while holding all those uvm locks.  I'm not sure the
> > free memory check this does is right, or whether we want such a check
> > at all.  The vm_map_lock()/vm_map_unlock() dance is necessary to make
> > sure we don't spin too quickly if the kernel map lock is contended.
> > 
> > A better fix would perhaps be to have a new pmap function that we
> > could call at this spot that would sleep until the necessary resources
> > are available.  On arm64 this would populate the page tables using
> > pool allocations that use PR_WAITOK, but would not actually enter a
> > valid mapping.  I'm going to explore that idea a bit.

This seems to work.  It even works a little better, I think, as the
number of vp pool fails is somewhat smaller with this diff.

Thoughts?
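
To spell out how the fault path is meant to behave with the diff below,
here is a rough outline (illustration only: the function name is made
up; in the tree there is no explicit loop, since uvm_fault_{upper,lower}
return ERESTART after pmap_populate() and uvm_fault() restarts the whole
fault):

/*
 * Outline of the intended retry behaviour.  Instead of guessing that
 * we are out of physical memory and sleeping in uvm_wait(), sleep in
 * pmap_populate() (PR_WAITOK) until the page-table pages and the pte
 * descriptor for this address exist, then retry.  The retried
 * pmap_enter() finds them in place and should no longer need to
 * allocate.
 */
int
fault_enter_retry_outline(struct uvm_faultinfo *ufi, struct vm_page *pg,
    vm_prot_t access_type)
{
	while (pmap_enter(ufi->orig_map->pmap, ufi->orig_rvaddr,
	    VM_PAGE_TO_PHYS(pg), access_type,
	    access_type | PMAP_CANFAIL) != 0)
		pmap_populate(ufi->orig_map->pmap, ufi->orig_rvaddr);

	return 0;
}
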


Index: arch/arm64/arm64/pmap.c
===================================================================
RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
diff -u -p -r1.105 pmap.c
--- arch/arm64/arm64/pmap.c     9 Nov 2024 12:58:29 -0000       1.105
+++ arch/arm64/arm64/pmap.c     9 Jan 2025 21:44:41 -0000
@@ -443,6 +443,67 @@ pmap_vp_enter(pmap_t pm, vaddr_t va, str
        return 0;
 }
 
+void
+pmap_vp_populate(pmap_t pm, vaddr_t va)
+{
+       struct pte_desc *pted;
+       struct pmapvp1 *vp1;
+       struct pmapvp2 *vp2;
+       struct pmapvp3 *vp3;
+       void *vp;
+
+       pted = pool_get(&pmap_pted_pool, PR_WAITOK | PR_ZERO);
+       vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
+
+       pmap_lock(pm);
+
+       if (pm->have_4_level_pt) {
+               vp1 = pm->pm_vp.l0->vp[VP_IDX0(va)];
+               if (vp1 == NULL) {
+                       vp1 = vp; vp = NULL;
+                       pmap_set_l1(pm, va, vp1);
+               }
+       } else {
+               vp1 = pm->pm_vp.l1;
+       }
+
+       if (vp == NULL) {
+               pmap_unlock(pm);
+               vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
+               pmap_lock(pm);
+       }
+
+       vp2 = vp1->vp[VP_IDX1(va)];
+       if (vp2 == NULL) {
+               vp2 = vp; vp = NULL;
+               pmap_set_l2(pm, va, vp1, vp2);
+       }
+       
+       if (vp == NULL) {
+               pmap_unlock(pm);
+               vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
+               pmap_lock(pm);
+       }
+
+       vp3 = vp2->vp[VP_IDX2(va)];
+       if (vp3 == NULL) {
+               vp3 = vp; vp = NULL;
+               pmap_set_l3(pm, va, vp2, vp3);
+       }
+
+       if (vp3->vp[VP_IDX3(va)] == NULL) {
+               vp3->vp[VP_IDX3(va)] = pted;
+               pted = NULL;
+       }
+
+       pmap_unlock(pm);
+
+       if (vp)
+               pool_put(&pmap_vp_pool, vp);
+       if (pted)
+               pool_put(&pmap_pted_pool, pted);
+}
+
 void *
 pmap_vp_page_alloc(struct pool *pp, int flags, int *slowdown)
 {
@@ -616,6 +677,11 @@ out:
        return error;
 }
 
+void
+pmap_populate(pmap_t pm, vaddr_t va)
+{
+       pmap_vp_populate(pm, va);
+}
 
 /*
  * Remove the given range of mapping entries.
Index: arch/arm64/include/pmap.h
===================================================================
RCS file: /cvs/src/sys/arch/arm64/include/pmap.h,v
diff -u -p -r1.25 pmap.h
--- arch/arm64/include/pmap.h   11 Dec 2023 22:12:53 -0000      1.25
+++ arch/arm64/include/pmap.h   9 Jan 2025 21:44:41 -0000
@@ -123,6 +123,7 @@ struct pv_entry;
 int    pmap_fault_fixup(pmap_t, vaddr_t, vm_prot_t);
 
 #define __HAVE_PMAP_MPSAFE_ENTER_COW
+#define __HAVE_PMAP_POPULATE
 
 #endif /* _KERNEL && !_LOCORE */
 
Index: uvm/uvm_fault.c
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
diff -u -p -r1.159 uvm_fault.c
--- uvm/uvm_fault.c     3 Jan 2025 15:31:48 -0000       1.159
+++ uvm/uvm_fault.c     9 Jan 2025 21:44:41 -0000
@@ -1105,8 +1105,12 @@ uvm_fault_upper(struct uvm_faultinfo *uf
                        /* XXX instrumentation */
                        return ENOMEM;
                }
+#ifdef __HAVE_PMAP_POPULATE
+               pmap_populate(ufi->orig_map->pmap, ufi->orig_rvaddr);
+#else
                /* XXX instrumentation */
                uvm_wait("flt_pmfail1");
+#endif
                return ERESTART;
        }
 
@@ -1457,8 +1461,12 @@ uvm_fault_lower(struct uvm_faultinfo *uf
                        /* XXX instrumentation */
                        return (ENOMEM);
                }
+#ifdef __HAVE_PMAP_POPULATE
+               pmap_populate(ufi->orig_map->pmap, ufi->orig_rvaddr);
+#else
                /* XXX instrumentation */
                uvm_wait("flt_pmfail2");
+#endif
                return ERESTART;
        }
 
Index: uvm/uvm_pmap.h
===================================================================
RCS file: /cvs/src/sys/uvm/uvm_pmap.h,v
diff -u -p -r1.34 uvm_pmap.h
--- uvm/uvm_pmap.h      3 Apr 2024 18:43:32 -0000       1.34
+++ uvm/uvm_pmap.h      9 Jan 2025 21:44:41 -0000
@@ -179,6 +179,10 @@ vaddr_t             pmap_steal_memory(vsize_t, vad
 void            pmap_virtual_space(vaddr_t *, vaddr_t *);
 #endif
 
+#if defined(__HAVE_PMAP_POPULATE)
+void           pmap_populate(pmap_t, vaddr_t);
+#endif
+
 /* nested pmaps are used in i386/amd64 vmm */
 #ifndef pmap_nested
 #define pmap_nested(pm) 0
