Re: arm64: assertwaitok panic on page fault

Mark Kettenis Thu, 15 Jan 2026 05:36:37 -0800

> Date: Thu, 15 Jan 2026 11:17:11 +0100
> From: Martin Pieuchot <[email protected]>
> 
> Hello Christian,
> 
> Thanks for this interesting report.  You seem to have found a case where
> pool_get(9) with PR_NOWAIT might sleep...  See below.


Yes.  And this is a clear violation of contract.  Anyway, see below...

> On 13/01/26(Tue) 15:00, Christian Ludwig wrote:
> > Hi,
> > 
> > I ran into the following panic on my Raspberry Pi Zero2W when compiling
> > the kernel on a 2026-01-11 snapshot. Unfortunately, I do not know how to
> > fix this.
> > 
> > 
> >  - Christian
> > 
> > panic: assertwaitok: non-zero mutex count: 1
> > Stopped at      db_enter+0x18:  brk     #0xf000
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> >  433851  76208   1000         0x3          0    0  ccache
> > *156464  96163   1000         0x3          0    2  cc
> >  147282  40014   1000         0x3          0    1  cc
> >  121706  68538      0     0x14000      0x200    3  sdmmc0
> > db_enter() at panic+0x138
> > panic() at assertwaitok+0xb8
> > assertwaitok() at pool_get+0x34
> > pool_get() at uvm_mapent_alloc+0x20c
> > uvm_mapent_alloc() at uvm_map_clip_start+0x80
> > uvm_map_clip_start() at uvm_unmap_remove+0x248
> > uvm_unmap_remove() at uvm_unmap+0x64
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports.  Insufficient info makes it difficult to find and fix bugs.
> > ddb{2}> trace
> > db_enter() at panic+0x138
> > panic() at assertwaitok+0xb8
> > assertwaitok() at pool_get+0x34
> > pool_get() at uvm_mapent_alloc+0x20c
> 
> This seems to be the pool_get(9) for the kernel_map case.
> 
> > uvm_mapent_alloc() at uvm_map_clip_start+0x80
> > uvm_map_clip_start() at uvm_unmap_remove+0x248
> 
> The issue here is that uvm_map_clip_start() expects uvm_mapent_alloc() to
> return a new entry and possibly sleep.
> 
> > uvm_unmap_remove() at uvm_unmap+0x64
> > uvm_unmap() at km_free+0x50
> > km_free() at pool_p_alloc+0x1f4
> 
> I believe this km_free() corresponds to pool_allocator_free() line 935 of
> kern/subr_pool.c.

Yes, and I believe it is the pool_allocator_free() call in
pool_p_alloc() that is the problem.  That means this is the
!POOL_INPGHDR(pp) branch where we use a separate pool for the pool
headers.  Apparently pool_get() on that separate pool fails, so we call
pool_allocator_free() to free the pool page that we just allocated.

Perhaps what we could do is change the order and do the pool_get() before
we call pool_allocator_alloc().  And if that fails, just return NULL.
This does mean of course that if pool_allocator_alloc() fails we need
to do a pool_put(), and that might trigger the same issue.  I thought
there was a mechanism to defer freeing pool pages when we can't sleep,
but I don't see that in the current code.
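
To make the reordering idea concrete, here is a minimal user-space
sketch.  It is not the subr_pool.c code: hdr_get(), hdr_put(),
page_alloc(), page_free() and both alloc_*() functions are hypothetical
stand-ins for pool_get(), pool_put(), pool_allocator_alloc() and
pool_allocator_free(), and the failure flags only simulate exhaustion.
It shows that with the header allocated first, a failed header
allocation never reaches the page free path (the one that ends up in
km_free() in the trace), while a failed page allocation now needs a
header put instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pool page header kept in a separate pool. */
struct page_header {
	void *page;
};

static bool hdr_get_fails;	/* simulate header pool exhaustion */
static bool page_alloc_fails;	/* simulate page allocator failure */
static int page_frees;		/* calls to the page free path */
static int hdr_puts;		/* calls to the header put path */

/* Stand-in for pool_get() on the header pool with PR_NOWAIT. */
static struct page_header *
hdr_get(void)
{
	if (hdr_get_fails)
		return NULL;
	return calloc(1, sizeof(struct page_header));
}

/* Stand-in for pool_put() on the header pool. */
static void
hdr_put(struct page_header *ph)
{
	hdr_puts++;
	free(ph);
}

/* Stand-in for pool_allocator_alloc(). */
static void *
page_alloc(void)
{
	return page_alloc_fails ? NULL : malloc(4096);
}

/* Stand-in for pool_allocator_free(), the path that may sleep. */
static void
page_free(void *pg)
{
	page_frees++;
	free(pg);
}

/* Order described in the mail: page first, then header. */
static struct page_header *
alloc_page_first(void)
{
	void *pg = page_alloc();
	if (pg == NULL)
		return NULL;
	struct page_header *ph = hdr_get();
	if (ph == NULL) {
		page_free(pg);	/* the problematic free */
		return NULL;
	}
	ph->page = pg;
	return ph;
}

/* Suggested order: header first, so a header failure frees nothing. */
static struct page_header *
alloc_header_first(void)
{
	struct page_header *ph = hdr_get();
	if (ph == NULL)
		return NULL;
	void *pg = page_alloc();
	if (pg == NULL) {
		hdr_put(ph);	/* may hit the same issue, as noted above */
		return NULL;
	}
	ph->page = pg;
	return ph;
}

/* Release a successfully allocated header and its page. */
static void
release(struct page_header *ph)
{
	assert(ph != NULL);
	free(ph->page);
	free(ph);
}
```

The sketch only moves the problem: the hdr_put() in the failure branch
is exactly the pool_put() that might trigger the same issue, so some
way to defer that free would still be needed.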

> It's not clear to me how we should address this issue.  Having a km_free(9)
> that is less likely to sleep seems a good thing to me.  However it's not
> clear how to do that while uvm_unmap_remove() might need to clip entries
> and clipping needs to allocate.
>
> Maybe this could be worked around at the pool level?

> 
> Mark, David, what do you think?
> 
> > pool_p_alloc() at pool_do_get+0x20c
> > pool_do_get() at pool_get+0x8c
> > pool_get() at pmap_vp_enter+0x17c
> > pmap_vp_enter() at pmap_enter+0x1ac
> > pmap_enter() at uvm_fault_lower+0x220
> > uvm_fault_lower() at uvm_fault+0x158
> > uvm_fault() at udata_abort+0x128
> > udata_abort() at do_el0_sync+0x100
> > do_el0_sync() at handle_el0_sync+0x70
> > handle_el0_sync() at __ALIGN_SIZE+0x4b689f8
> > --- trap ---
> > end of kernel
> > ddb{2}> show uvm
> > Current UVM status:
> >   pagesize=4096 (0x1000), pagemask=0xfff, pageshift=12
> >   101363 VM pages: 42267 active, 18301 inactive, 1 wired, 7 free (1 zero)
> >   freemin=3378, free-target=4504, inactive-target=20000, wired-max=33787
> >   faults=10832044, traps=74264382, intrs=2185293, ctxswitch=1898507 fpuswitch=0
> >   softint=881358, syscalls=54181390, kmapent=15
> >   fault counts:
> >     noram=46834246, noanon=0, noamap=0, pgwait=0, pgrele=0
> >     relocks=135632(521), upgrades=454828(1009) anget(retries)=5338133(0), amapcopy=1590537
> >     neighbor anon/obj pg=2294476/7526313, gets(lock/unlock)=2493095/137450
> >     cases: anon=4664089, anoncow=674044, obj=2156623, prcopy=333643, przero=3003868
> >   daemon and swap counts:
> >     woke=24230, revs=241, scans=14585, obscans=5500, anscans=9078
> >     busy=0, freed=6930, reactivate=0, deactivate=22337
> >     pageouts=376, pending=375, nswget=0
> >     nswapdev=1
> >     swpages=2097152, swpginuse=5995, swpgonly=3083 paging=16
> >   kernel pointers:
> >     objs(kern)=0xffffff80012aae08
> > 
> 
> 
>
