> Date: Mon, 7 Jul 2025 08:17:37 +0200
> From: Martin Pieuchot <[email protected]>
>
> On 06/07/25(Sun) 21:15, Jeremie Courreges-Anglas wrote:
> > On Tue, Jul 01, 2025 at 06:18:37PM +0200, Jeremie Courreges-Anglas wrote:
> > > On Tue, Jun 24, 2025 at 05:21:56PM +0200, Jeremie Courreges-Anglas wrote:
> > > >
> > > > I think it's uvm_purge(); as far as I can see it happens when building
> > > > rust with cvs up -D2025/06/04 in /sys, not with -D2025/06/03. Maybe I
> > > > missed lang/rust when testing the diff.
> > > >
> > > > This is with additional MP_LOCKDEBUG support for mutexes, and
> > > > __mp_lock_spinout = 50L * INT_MAX.
> > > >
> > > > Suggested by claudio: tr /t 0t269515 fails.
> > >
> > > Should be fixed, at least for CPUs that ddb actually managed to
> > > stop...
> > >
> > > > WITNESS doesn't flag an obvious lock ordering issue. I'm not even
> > > > sure there is one. It also happens with CPU_MAX_BUSY_CYCLES == 64.
> > > >
> > > > Maybe we're still hammering the locks too much? Input and ideas to
> > > > test welcome. Right now I'm running with just uvm_purge() reverted.
> > >
> > > Reverting uvm_purge() did not help. I've been able to reproduce the
> > > hangs up to cvs up -D2025/05/18, backporting the arm64 ddb trace fix,
> > > mpi's mutex backoff diff and the mtx_enter MP_LOCKDEBUG diff.
> > > Currently trying to reproduce with cvs up -D2025/05/07 as suggested by
> > > dlg.
> >
> > cvs up -D2025/05/07 also did not help. -current still locks up, but
> > the db_mutex fix helps get proper stacktraces:
>
> Thanks for the report.
>
> > --8<--
> > login: mtf_fnxe0:00nf2ffmrte8 0m21xt2 8xlef_ cf2nktft esl8r_u0f k00t
> > sxor2f0
> >
> > 0tf
> > 8oeff
> >
> > ftff
> > efStopped at mtx_enter+0x13c: ldr x26, [x25,#2376]
>
> What is your value of __mp_lock_spinout?
>
> From the ddb traces I understand that the `sched_lock' mutex is contended.
> It's not clear to me why, but I believe that's because rustc uses
> sched_yield(2) in a loop. Could you figure out where this syscall is
> coming from?
>
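
If the suspicion above is right, the userland side probably looks
something like this hypothetical sketch (not actual rustc code, just an
illustration of why it hurts: every failed attempt is a full trip
through the scheduler and another acquisition of the `sched_lock'
mutex):

	/* Hypothetical yield-based spinlock, not actual rustc code. */
	#include <sched.h>
	#include <stdatomic.h>

	static atomic_int locked;

	static void
	yield_lock(void)
	{
		int expected;

		for (;;) {
			expected = 0;
			if (atomic_compare_exchange_strong(&locked,
			    &expected, 1))
				return;
			sched_yield();	/* a syscall on every failed attempt */
		}
	}
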
> I'm upset that sched_yield() is still used and causing trouble. Now
> that `sched_lock' is a mutex without a guarantee of progress, it is easy
> to hang the machine by calling it in a loop. A proper solution would be
> to stop using sched_yield(2). This will bite us as long as it is here.
Hmm, well, mutexes might not be fair, but they should guarantee
forward progress. And the LL/SC primitives on arm64 do guarantee
forward progress, but only under certain conditions. And I think our
current MI mutex implementation violates those conditions. And
WITNESS and MP_LOCKDEBUG might actually be making things worse.

Now on the M2 we should be using CAS instead of LL/SC. The
architecture reference manual isn't explicit about forward progress
guarantees for that instruction, so I need to dig a bit deeper into
this. I doubt there is no forward progress guarantee for those
instructions, but maybe there are similar conditions on how they have
to be used.

Will dig deeper and see if we can fix the mutex implementation on
arm64.
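
To make those conditions a bit more concrete, here is a minimal sketch
(the name and shape are made up for illustration; this is not what our
mtx_enter actually does) of a CAS-based try-acquire and what it turns
into on arm64:

	/*
	 * Illustrative sketch only.  On armv8.0 the builtin below is
	 * lowered to a load-exclusive/store-exclusive (LL/SC) loop;
	 * the architecture only promises forward progress for such a
	 * loop under a list of conditions (roughly: same address and
	 * size, nothing else between the exclusive load and store).
	 * Whatever the surrounding spin code does between retries is
	 * outside that per-pair guarantee.  With LSE atomics (e.g. on
	 * the M2) the same builtin becomes a single CAS instruction.
	 */
	static inline int
	mtx_try_acquire(volatile unsigned long *owner, unsigned long self)
	{
		unsigned long expected = 0;

		return (__atomic_compare_exchange_n(owner, &expected, self,
		    0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED));
	}
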
> A workaround would be to use a backoff mechanism inside the loop.
>
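
A minimal sketch of the kind of backoff meant here (hypothetical; it
borrows the existing CPU_BUSY_CYCLE()/CPU_MAX_BUSY_CYCLES spin knobs
plus the try-acquire sketched above):

	/*
	 * Hypothetical backoff spin: waiters double the pause between
	 * attempts, up to a cap, so that many CPUs contending for the
	 * same mutex stop hammering the cache line and the lock holder
	 * gets a chance to make progress.
	 */
	static void
	mtx_spin_backoff(volatile unsigned long *owner, unsigned long self)
	{
		int i, spins = 1;

		while (!mtx_try_acquire(owner, self)) {
			for (i = 0; i < spins; i++)
				CPU_BUSY_CYCLE();
			if (spins < CPU_MAX_BUSY_CYCLES)
				spins <<= 1;
		}
	}
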
> Another workaround could be to never spin on the `sched_lock' in
> sys_sched_yield() and instead sleep.
>
> Because of such contention, all the wakeup(9)s done inside the pmemrange
> allocator while `fpageqlock' is held add contention on that lock, which is
> what is "hanging" your machine.
>
> Diff below is another workaround that might help: it moves the wakeup(9)
> calls out from under the fpageq lock, so sched_lock contention no longer
> extends the time `fpageqlock' is held.
>
> Index: uvm/uvm_pmemrange.c
> ===================================================================
> RCS file: /cvs/src/sys/uvm/uvm_pmemrange.c,v
> diff -u -p -r1.77 uvm_pmemrange.c
> --- uvm/uvm_pmemrange.c 19 Feb 2025 11:10:54 -0000 1.77
> +++ uvm/uvm_pmemrange.c 7 Jul 2025 05:53:41 -0000
> @@ -1321,7 +1321,6 @@ uvm_pmr_freepages(struct vm_page *pg, ps
> }
>
> uvm_lock_fpageq();
> -
> for (i = count; i > 0; i -= pmr_count) {
> pmr = uvm_pmemrange_find(atop(VM_PAGE_TO_PHYS(pg)));
> KASSERT(pmr != NULL);
> @@ -1333,13 +1332,13 @@ uvm_pmr_freepages(struct vm_page *pg, ps
> uvmexp.free += pmr_count;
> pg += pmr_count;
> }
> + uvm_wakeup_pla(VM_PAGE_TO_PHYS(firstpg), ptoa(count));
> + uvm_unlock_fpageq();
> +
> wakeup(&uvmexp.free);
> if (uvmexp.zeropages < UVM_PAGEZERO_TARGET)
> wakeup(&uvmexp.zeropages);
>
> - uvm_wakeup_pla(VM_PAGE_TO_PHYS(firstpg), ptoa(count));
> -
> - uvm_unlock_fpageq();
> }
>
> /*
> @@ -1385,12 +1384,11 @@ uvm_pmr_freepageq(struct pglist *pgl)
>
> uvm_wakeup_pla(pstart, ptoa(plen));
> }
> + uvm_unlock_fpageq();
> +
> wakeup(&uvmexp.free);
> if (uvmexp.zeropages < UVM_PAGEZERO_TARGET)
> wakeup(&uvmexp.zeropages);
> - uvm_unlock_fpageq();
> -
> - return;
> }
>
> /*
>
>
>