On 2025 Jan 09 (Thu) at 22:49:32 +0100 (+0100), Mark Kettenis wrote:
:> Date: Thu, 9 Jan 2025 17:29:38 +0100
:> From: Marcus Glocker <[email protected]>
:>
:> On Thu, Jan 09, 2025 at 05:05:14PM GMT, Mark Kettenis wrote:
:>
:> > > Date: Thu, 9 Jan 2025 15:48:29 +0100
:> > > From: Marcus Glocker <[email protected]>
:> > >
:> > > Hello bugs@, Martin,
:> > >
:> > > For a while now I have been noticing processes hanging on my Samsung
:> > > Galaxy Book4 Edge (arm64/snapdragon-x/12-cores/16gb ram) machine.
:> > > These hangs occur very frequently, which makes it hard to work on the
:> > > machine since things like xterm, ssh, man, etc. just suddenly start to
:> > > hang.  When this happens, executing another process immediately
:> > > releases the hanging/waiting process.
:> > >
:> > > I've discussed this behavior today on icb, which led to the following
:> > > conversation:
:> > >
:> > > 11:39 < mglocker> 5344 hacki -18 0 1436K 392K idle flt_pmf 0:00 0.00% man
:> > > 11:41 < mglocker> uvm_wait("flt_pmfail1");
:> > > 11:42 < mglocker> uvm_wait("flt_pmfail2");
:> > > 11:43 < mglocker> 49811 hacki -18 0 8144K 112K sleep/0 flt_pmf 0:00 0.00% xterm
:> > > 11:54 < mglocker> ok, the process hang is always at uvm/uvm_fault.c:1879 -> uvm_wait("flt_pmfail2")
:> > >
:> > > 12:17 < kettenis> so that's pmap_enter() failing
:> > > 12:19 < kettenis> which means a pool allocation failure
:> > > 12:20 < kettenis> what does vmstat -m say about the "pted" and "vp" pools?
:> > > 12:28 < mglocker> Name Size Requests Fail InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
:> > > 12:29 < mglocker> pted 40 962117 0 42480 1582 0 1582 1582 1 8 0
:> > > 12:29 < mglocker> vp 8192 47009 102 5676 7830 1100 6730 7830 20 8 20
:> > > 12:30 < mglocker> vp 102 fails?
:> > > 12:37 < mglocker> it keeps increasing on those hangs
:> > > 12:46 < mglocker> so pmap_enter_vp() fails for
:> > > 12:46 < mglocker> vp2 = pool_get()
:> > > 12:46 < mglocker> and
:> > > 12:47 < mglocker> vp3 = pool_get()
:> > > 13:00 < mglocker> i booted again with a fresh single processor kernel. there no vp fails.
:> > > 13:09 < claudio> didn't we switch the vp pool to use per-cpu caches exactly because of this?
:> > > 14:02 < kettenis> I believe so
:> > > 14:03 < kettenis> the problem is that pmap_enter(9) isn't supposed to sleep
:> > > 14:03 < kettenis> so the pool allocations are done with PR_NOWAIT
:> > > 14:04 < kettenis> but that means that kd_trylock gets set
:> > > 14:04 < kettenis> which means that the allocations fail if there is contention on the pool lock
:> > > 14:04 < claudio> yes, I remember this strange behaviour.
:> > > 14:06 < kettenis> uvm thinks this means we're out of physmem
:> > > 14:06 < kettenis> so it'll sleep until something else pokes the pagedaemon
:> > > 14:06 < kettenis> the per-cpu caches mitigated the issue somewhat
:> > > 14:07 < kettenis> but didn't solve things completely
:> > > 14:07 < kettenis> and now that mpi pushed back the locks in uvm again, the problem is back
:> > > 14:09 < kettenis> so we need a real solution for this problem...
:> > > 14:12 < kettenis> a potential solution would be to make pmap_enter(9) return a different error for this case
:> > > 14:13 < kettenis> and then handle that case differently in uvm_fault_{upper|lower}
:> > > 14:15 < kettenis> the problem there is that pool_get() doesn't actually tell us why it failed
:> > > 14:37 < kettenis> s/contention on the pool lock/contention on the kernel map/
:> > >
:> > > Any proposal on how we could proceed to find a solution for this issue?
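
For anyone following along, the spot the log pins down is the
pmap_enter() failure path at the end of the fault handlers.  Roughly
(my paraphrase, variable names approximated, not the literal
uvm_fault.c code) it does:

        error = pmap_enter(ufi->orig_map->pmap, ufi->orig_rvaddr,
            VM_PAGE_TO_PHYS(pg), flt->enter_prot,
            flt->access_type | PMAP_CANFAIL);
        if (error != 0) {
                /*
                 * uvm interprets any pmap_enter() failure as "out of
                 * physical memory": it drops its locks and sleeps
                 * until the page daemon frees pages.  That is the
                 * flt_pmf wait channel seen in the top(1) output
                 * above.
                 */
                uvmfault_unlockall(ufi, amap, uobj);
                uvm_wait("flt_pmfail2");
                return ERESTART;
        }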
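
On the pmap side, the failing allocations are the intermediate page
table ("vp") levels.  Schematically (again not the literal arm64
pmap.c code), pmap_enter() ends up doing:

        /*
         * pmap_enter(9) must not sleep, so the vp levels are
         * allocated with PR_NOWAIT.  Under contention this can fail
         * even though plenty of memory is free, and the failure is
         * indistinguishable from a real out-of-memory condition.
         */
        vp2 = pool_get(&pmap_vp_pool, PR_NOWAIT | PR_ZERO);
        if (vp2 == NULL)
                return ENOMEM;          /* -> uvm_wait("flt_pmfail2") */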
:> >
:> > The following hack fixes the issue for me. I don't think this is a
:> > proper solution, but it may be a starting point. Or a temporary fix.
:> >
:> > The issue really is that we can't tell whether pmap_enter(9) failed
:> > because we're out of physical memory, or if it failed for some other
:> > reason.  In the case at hand we fail because of contention on the
:> > kernel map lock.  But we could also be failing because we have
:> > completely run out of KVA.
:>
:> Works for me as well!
:>
:> > We can't sleep while holding all those uvm locks. I'm not sure the
:> > free memory check this does is right. Or whether we want such a check
:> > at all. The vm_map_lock()/vm_map_unlock() dance is necessary to make
:> > sure we don't spin too quickly if the kernel map lock is contended.
:> >
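
The diff itself isn't quoted here, but as I read the description, the
error path now distinguishes the two cases roughly like this (my
paraphrase, not the actual diff):

        if (error != 0) {
                uvmfault_unlockall(ufi, amap, uobj);
                if (uvmexp.free > uvmexp.freemin) {
                        /*
                         * Plenty of free pages, so the failure was
                         * most likely kernel map contention rather
                         * than memory shortage.  Block on the kernel
                         * map lock once so we don't spin through
                         * ReFault, then retry the fault.
                         */
                        vm_map_lock(kernel_map);
                        vm_map_unlock(kernel_map);
                } else
                        uvm_wait("flt_pmfail2");
                return ERESTART;
        }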
:> > A better fix would perhaps be to have a new pmap function that we
:> > could call at this spot that would sleep until the necessary resources
:> > are available. On arm64 this would populate the page tables using
:> > pool allocations that use PR_WAITOK, but would not actually enter a
:> > valid mapping. I'm going to explore that idea a bit.
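
Just to make that idea concrete, I imagine such a function would look
roughly like this on arm64 (hypothetical name and interface, nothing
like this exists yet):

        /*
         * Hypothetical pmap_populate(): make sure all intermediate
         * page table pages needed to map va exist, sleeping for
         * memory if necessary, but do not enter a valid mapping.
         */
        void
        pmap_populate(pmap_t pm, vaddr_t va)
        {
                void *vp;

                /*
                 * Descend the page table levels like pmap_enter()
                 * does; for every level that is missing, allocate it
                 * with PR_WAITOK so we block instead of fail, then
                 * link it in under the pmap lock (or free it again
                 * if another CPU raced us).
                 */
                vp = pool_get(&pmap_vp_pool, PR_WAITOK | PR_ZERO);
                /* ... */
        }

uvm_fault_{upper,lower} could then drop their locks after a failed
pmap_enter(), call this, and retry the fault instead of sleeping in
uvm_wait().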
:
:This seems to work.  It even seems to work a little better, as the
:number of vp pool fails appears to be somewhat smaller with this diff.
:
:Thoughts?
:
This survived a bulk build on the arm64.p bulk build cluster.  With
this diff and mpi's uvm_lower_fault_handler diff applied, the bulk
build time stayed around the same, at 3d8h.
--
Broad-mindedness, n.:
The result of flattening high-mindedness out.