Hello,

On 09/01/25(Thu) 22:49, Mark Kettenis wrote:
> > > > [...] 
> > > > Any proposal on how we could proceed to find a solution for this issue?
> > > 
> > > The following hack fixes the issue for me.  I don't think this is a
> > > proper solution, but it may be a starting point.  Or a temporary fix.
> > > 
> > > The issue really is that we can't tell whether pmap_enter(9) failed
> > > because we're out of physical memory, or if it failed for some other
> > > > reason.  In the case at hand we fail because of contention on the
> > > kernel map lock.  But we could also be failing because we have completely 
> > > run out of KVA.
> > 
> > Works for me as well!
> >  
> > > We can't sleep while holding all those uvm locks.  I'm not sure the
> > > free memory check this does is right.  Or whether we want such a check
> > > at all.  The vm_map_lock()/vm_map_unlock() dance is necessary to make
> > > sure we don't spin too quickly if the kernel map lock is contended.
> > > 
> > > A better fix would perhaps be to have a new pmap function that we
> > > could call at this spot that would sleep until the necessary resources
> > > are available.  On arm64 this would populate the page tables using
> > > pool allocations that use PR_WAITOK, but would not actually enter a
> > > valid mapping.  I'm going to explore that idea a bit.
> 
> This seems to work.  Even a little bit better I think as the number of
> vp pool fails seems to be a bit smaller with this diff.
> 
> Thoughts?

Thanks a lot for the diagnosis and the diff.

The fact that pmaps call pool_get(9) with PR_NOWAIT while those pools can
only be refilled from a sleeping context is very concerning.
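
To make the problem concrete, this is roughly the pattern on the fault path
today (simplified from memory, the real code in pmap_vp_enter() differs in
detail):

        /* Simplified sketch; we hold locks, so we cannot sleep here. */
        vp3 = pool_get(&pmap_vp_pool, PR_NOWAIT | PR_ZERO);
        if (vp3 == NULL)
                return ENOMEM;  /* "pool empty" is indistinguishable from "out of RAM" */

The fault handler then has no way to tell a transient pool shortage apart
from real memory pressure.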

There are multiple ways to fix the issue.  I'd prefer if we could work on a
fix that avoids re-faulting:

dlg@: is it possible to make the pools used by pmaps special, so that they
act as thin wrappers around uvm_pagealloc(9)/uvm_pglistalloc(9)?  (A rough
sketch of what I mean follows the list below.)  This would help in two ways:
- ensure pmap_enter(9) only returns an error if OOM
- benefit of the per-CPU page cache for all pmaps
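
Something along these lines; the pmap_pv_page_alloc/free names and the exact
flags are only illustrative, and it assumes the direct map
(__HAVE_PMAP_DIRECT) can be used so no KVA or kernel map lock is needed:

void *
pmap_pv_page_alloc(struct pool *pp, int flags, int *slowdown)
{
        struct vm_page *pg;

        *slowdown = 0;
        for (;;) {
                /* Take a zeroed page straight from the page allocator. */
                pg = uvm_pagealloc(NULL, 0, NULL, UVM_PGA_ZERO);
                if (pg != NULL)
                        break;
                if ((flags & PR_WAITOK) == 0)
                        return NULL;    /* really out of physical memory */
                uvm_wait("pmappg");
        }

        /* Hand the page back through the direct map. */
        return ((void *)pmap_map_direct(pg));
}

void
pmap_pv_page_free(struct pool *pp, void *v)
{
        uvm_pagefree(pmap_unmap_direct((vaddr_t)v));
}

With such a backend a PR_NOWAIT pool_get() can only fail because
uvm_pagealloc(9) failed, which is the error pmap_enter(9) is supposed to
report.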

Alternatively, pool_get(9) shortages could be integrated into the page
daemon.  Instead of adding pmap_populate() we could simply have the page
daemon check for pools needing a refill.  The same approach already applies
to uvm_anwait().
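
For completeness, a very rough sketch of that variant; pool_refill_all()
does not exist and only stands for whatever interface pool(9) would have to
grow:

        /*
         * Hypothetical hook in the uvm_pageout() main loop, where we may
         * sleep: top up any pool that recorded a PR_NOWAIT failure before
         * the page daemon wakes the threads sleeping in uvm_wait(), so the
         * retried fault finds the pool items it needs.
         */
        pool_refill_all();      /* placeholder, no such function today */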

But IMHO relying on fewer pools in UVM and the pmaps is the best way forward.

> Index: arch/arm64/arm64/pmap.c
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/arm64/pmap.c,v
> diff -u -p -r1.105 pmap.c
> --- arch/arm64/arm64/pmap.c   9 Nov 2024 12:58:29 -0000       1.105
> +++ arch/arm64/arm64/pmap.c   9 Jan 2025 21:44:41 -0000
> @@ -443,6 +443,67 @@ pmap_vp_enter(pmap_t pm, vaddr_t va, str
>       return 0;
>  }
>  
> +void
> +pmap_vp_populate(pmap_t pm, vaddr_t va)
> +{
> +     struct pte_desc *pted;
> +     struct pmapvp1 *vp1;
> +     struct pmapvp2 *vp2;
> +     struct pmapvp3 *vp3;
> +     void *vp;
> +
> +     pted = pool_get(&pmap_pted_pool, PR_WAITOK | PR_ZERO);
> +     vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
> +
> +     pmap_lock(pm);
> +
> +     if (pm->have_4_level_pt) {
> +             vp1 = pm->pm_vp.l0->vp[VP_IDX0(va)];
> +             if (vp1 == NULL) {
> +                     vp1 = vp; vp = NULL;
> +                     pmap_set_l1(pm, va, vp1);
> +             }
> +     } else {
> +             vp1 = pm->pm_vp.l1;
> +     }
> +
> +     if (vp == NULL) {
> +             pmap_unlock(pm);
> +             vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
> +             pmap_lock(pm);
> +     }
> +
> +     vp2 = vp1->vp[VP_IDX1(va)];
> +     if (vp2 == NULL) {
> +             vp2 = vp; vp = NULL;
> +             pmap_set_l2(pm, va, vp1, vp2);
> +     }
> +     
> +     if (vp == NULL) {
> +             pmap_unlock(pm);
> +             vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
> +             pmap_lock(pm);
> +     }
> +
> +     vp3 = vp2->vp[VP_IDX2(va)];
> +     if (vp3 == NULL) {
> +             vp3 = vp; vp = NULL;
> +             pmap_set_l3(pm, va, vp2, vp3);
> +     }
> +
> +     if (vp3->vp[VP_IDX3(va)] == NULL) {
> +             vp3->vp[VP_IDX3(va)] = pted;
> +             pted = NULL;
> +     }
> +
> +     pmap_unlock(pm);
> +
> +     if (vp)
> +             pool_put(&pmap_vp_pool, vp);
> +     if (pted)
> +             pool_put(&pmap_pted_pool, pted);
> +}
> +
>  void *
>  pmap_vp_page_alloc(struct pool *pp, int flags, int *slowdown)
>  {
> @@ -616,6 +677,11 @@ out:
>       return error;
>  }
>  
> +void
> +pmap_populate(pmap_t pm, vaddr_t va)
> +{
> +     pmap_vp_populate(pm, va);
> +}
>  
>  /*
>   * Remove the given range of mapping entries.
> Index: arch/arm64/include/pmap.h
> ===================================================================
> RCS file: /cvs/src/sys/arch/arm64/include/pmap.h,v
> diff -u -p -r1.25 pmap.h
> --- arch/arm64/include/pmap.h 11 Dec 2023 22:12:53 -0000      1.25
> +++ arch/arm64/include/pmap.h 9 Jan 2025 21:44:41 -0000
> @@ -123,6 +123,7 @@ struct pv_entry;
>  int  pmap_fault_fixup(pmap_t, vaddr_t, vm_prot_t);
>  
>  #define __HAVE_PMAP_MPSAFE_ENTER_COW
> +#define __HAVE_PMAP_POPULATE
>  
>  #endif /* _KERNEL && !_LOCORE */
>  
> Index: uvm/uvm_fault.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> diff -u -p -r1.159 uvm_fault.c
> --- uvm/uvm_fault.c   3 Jan 2025 15:31:48 -0000       1.159
> +++ uvm/uvm_fault.c   9 Jan 2025 21:44:41 -0000
> @@ -1105,8 +1105,12 @@ uvm_fault_upper(struct uvm_faultinfo *uf
>                       /* XXX instrumentation */
>                       return ENOMEM;
>               }
> +#ifdef __HAVE_PMAP_POPULATE
> +             pmap_populate(ufi->orig_map->pmap, ufi->orig_rvaddr);
> +#else
>               /* XXX instrumentation */
>               uvm_wait("flt_pmfail1");
> +#endif
>               return ERESTART;
>       }
>  
> @@ -1457,8 +1461,12 @@ uvm_fault_lower(struct uvm_faultinfo *uf
>                       /* XXX instrumentation */
>                       return (ENOMEM);
>               }
> +#ifdef __HAVE_PMAP_POPULATE
> +             pmap_populate(ufi->orig_map->pmap, ufi->orig_rvaddr);
> +#else
>               /* XXX instrumentation */
>               uvm_wait("flt_pmfail2");
> +#endif
>               return ERESTART;
>       }
>  
> Index: uvm/uvm_pmap.h
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_pmap.h,v
> diff -u -p -r1.34 uvm_pmap.h
> --- uvm/uvm_pmap.h    3 Apr 2024 18:43:32 -0000       1.34
> +++ uvm/uvm_pmap.h    9 Jan 2025 21:44:41 -0000
> @@ -179,6 +179,10 @@ vaddr_t           pmap_steal_memory(vsize_t, vad
>  void          pmap_virtual_space(vaddr_t *, vaddr_t *);
>  #endif
>  
> +#if defined(__HAVE_PMAP_POPULATE)
> +void         pmap_populate(pmap_t, vaddr_t);
> +#endif
> +
>  /* nested pmaps are used in i386/amd64 vmm */
>  #ifndef pmap_nested
>  #define pmap_nested(pm) 0

