Thanks folks for your replies,

On Mon, Jul 07, 2025 at 12:26:21PM +0200, Mark Kettenis wrote:
> > Date: Mon, 7 Jul 2025 08:17:37 +0200
> > From: Martin Pieuchot <[email protected]>
> > 
> > On 06/07/25(Sun) 21:15, Jeremie Courreges-Anglas wrote:
> > > On Tue, Jul 01, 2025 at 06:18:37PM +0200, Jeremie Courreges-Anglas wrote:
> > > > On Tue, Jun 24, 2025 at 05:21:56PM +0200, Jeremie Courreges-Anglas 
> > > > wrote:
> > > > > 
> > > > > I think it's uvm_purge(), as far as I can see it happens when building
> > > > > rust with cvs up -D2025/06/04 in /sys, not with -D2025/06/03.  Maybe I
> > > > > missed lang/rust when testing the diff.
> > > > > 
> > > > > This is with additional MP_LOCKDEBUG support for mutexes, and
> > > > > __mp_lock_spinout = 50L * INT_MAX.
> > > > > 
> > > > > Suggested by claudio: tr /t 0t269515 fails.
> > > > 
> > > > Should be fixed, at least for CPUs that ddb actually managed to
> > > > stop...
> > > > 
> > > > > WITNESS doesn't flag an obvious lock ordering issue.  I'm not even
> > > > > sure there is one.  It also happen with CPU_MAX_BUSY_CYCLES == 64.
> > > > > 
> > > > > Maybe we're still hammering too much the locks?  Input and ideas to
> > > > > test welcome.  Right now I'm running with just uvm_purge() reverted.
> > > > 
> > > > Reverting uvm_purge() did not help. I've been able to reproduce the
> > > > hangs up to cvs up -D2025/05/18, backporting the arm64 ddb trace fix,
> > > > mpi's mutex backoff diff and the mtx_enter MP_LOCKDEBUG diff.
> > > > Currently trying to reproduce with cvs up -D2025/05/07 as suggested by
> > > > dlg.
> > > 
> > > cvs up -D2025/05/07 also did not help.  -current still locks up, but
> > > the db_mutex fix helps get proper stacktraces:
> > 
> > Thanks for the report.
> > 
> > > --8<--
> > > login: mtf_fnxe0:00nf2ffmrte8  0m21xt2  8xlef_ cf2nktft esl8r_u0f  k00t  
> > > sxor2f0
> > > 
> > >  0tf
> > >      8oeff
> > > 
> > >  ftff
> > > efStopped at    mtx_enter+0x13c:        ldr     x26, [x25,#2376]
> > 
> > What is your value of __mp_lock_spinout?

50L * INT_MAX

> > >From the ddb traces I understand that the `sched_lock' mutex is contended.
> > It's not clear to me why but I believe that's because rustc use
> > sched_yield(2) in a loop.  Could you figure out where this syscall is
> > coming from?

Frankly, I don't know if I want to try and figure it out. ^^"

> > I'm upset that sched_yield() is still used and causing trouble.  Now
> > that `sched_lock' is a mutex without guarantee of progress it is easy
> > to hang the machine by calling it in a loop.  A proper solution would be
> > to stop using sched_yield(2).  This will bite us as long as it is here.

Regarding its uses, there's one in librthread/rthread_sched.c which
looks legitimate, and also another in libc that caught my eye recently:

/*
 * internal support functions
 */
void
_spinlock(volatile _atomic_lock_t *lock)
{
        while (_atomic_lock(lock))
                sched_yield();
        membar_enter_after_atomic();
}
DEF_STRONG(_spinlock);

That one looks a bit heavy-handed...

Now, if the problem really is that our mutexes aren't fair, then isn't
any mutex (incl the sched lock) usable for such hammering from
userland?

> Hmm, well, mutexes might not be fair, but they should guarantee
> forward progress.  And the LL/SC primitives on arm64 do guarantee
> forward progress, but there are certain conditions.  And I think our
> current MI mutex implementation violates those conditions.  And
> WITNESS and MP_LOCKDEBUG might actually be making things worse.

I thought LL/SC guarantees depended on the length of the section
between the load and store, but... I usually fail to find what I'm
looking for when looking at ARM docs.

> Now on the M2 we should be using CAS instead of LL/SC.

Just to rule out an obvious issue:

ddb{0}> x arm64_has_lse
arm64_has_lse:             1

> The
> architecture reference manual isn't explicit about forward guarantees
> for that instruction.  So I need to dig a bit deeper into this.  I
> doubt there is no forward progress guarantee for those instructions,
> but maybe there are similar conditions on how these get used.
> 
> Will dig deeper and see if we can fix the mutex implementation on
> arm64.

ack, thanks

> > A workaround would be to use a backoff mechanism inside the loop.
> > 
> > Another workaround could be to never spin on the `sched_lock' in
> > sys_sched_yield() and instead sleep. 
> > 
> > Because of such contention all wakeup(9)s inside the pmemrange allocator
> > add contention on the `fpageqlock' which is what is "hanging" your machine.
> > 
> > Diff below is another workaround that might help.

Alas it doesn't seem to help enough to avoid hangs.  Here's another
fresh report using your diff.

--8<--
login: __mp_lock_spin: 0xffffff8001326508 lock spun out
Stopped at      db_enter+0x18:  brk     #0x0
ddb{4}> mach cpuinfo
    0: stopped
    1: stopped
    2: stopped
    3: stopped
*   4: ddb
    5: stopped
    6: stopped
    7: stopped
    8: stopped
    9: stopped
ddb{4}> mach ddb 0
Stopped at      db_enter+0x18:  brk     #0x0
ddb{0}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at __aarch64_cas8_acq_rel+0x1c
__aarch64_cas8_acq_rel() at wakeup_n+0x44
wakeup_n() at task_add+0x6c
task_add() at ifiq_input+0x1f4
ifiq_input() at bge_rxeof+0x32c
bge_rxeof() at bge_intr+0x174
bge_intr() at aplintc_irq_handler+0x170
aplintc_irq_handler() at arm_cpu_irq+0x44
arm_cpu_irq() at handle_el1h_irq+0x68
handle_el1h_irq() at mtx_enter+0x104
mtx_enter() at endtsleep+0x30
endtsleep() at timeout_run+0xb4
timeout_run() at softclock_process_tick_timeout+0xc4
softclock_process_tick_timeout() at softclock+0xe8
softclock() at softintr_dispatch+0xc8
softintr_dispatch() at arm_do_pending_intr+0xf4
arm_do_pending_intr() at mi_switch+0x1b0
mi_switch() at ast+0x98
ast() at handle_el0_sync+0x9c
handle_el0_sync() at 0x19ae4be318
--- trap ---
end of kernel
ddb{0}> mach ddb 1
Stopped at      db_enter+0x18:  brk     #0x0
ddb{1}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at __mp_lock+0x104
__mp_lock() at __mp_acquire_count+0x34
__mp_acquire_count() at uiomove+0x12c
uiomove() at ffs_read+0x198
ffs_read() at VOP_READ+0x3c
VOP_READ() at vn_rdwr+0xb0
vn_rdwr() at vmcmd_map_readvn+0x8c
vmcmd_map_readvn() at exec_process_vmcmds+0x78
exec_process_vmcmds() at sys_execve+0x620
sys_execve() at svc_handler+0x430
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x13812aa6d8
--- trap ---
end of kernel
ddb{1}> mach ddb 2
Stopped at      db_enter+0x18:  brk     #0x0
ddb{2}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at __mp_lock+0x104
__mp_lock() at doopenat+0x160
doopenat() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x4521566bc
--- trap ---
end of kernel
ddb{2}> mach ddb 3
Stopped at      db_enter+0x18:  brk     #0x0
ddb{3}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at __aarch64_cas8_acq_rel+0x1c
__aarch64_cas8_acq_rel() at sys_sched_yield+0x68
sys_sched_yield() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x19ae4b2fdc
--- trap ---
end of kernel
ddb{3}> mach ddb 4
Stopped at      db_enter+0x18:  brk     #0x0
ddb{4}> tr
db_enter() at __mp_lock+0x128
__mp_lock() at doopenat+0x160
doopenat() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x43e0b76bc
--- trap ---
end of kernel
ddb{4}> mach ddb 5
Stopped at      db_enter+0x18:  brk     #0x0
ddb{5}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at __mp_lock+0xf8
__mp_lock() at vn_statfile+0x2c
vn_statfile() at sys_fstat+0x84
sys_fstat() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x43dcf999c
--- trap ---
end of kernel
ddb{5}> mach ddb 6
Stopped at      db_enter+0x18:  brk     #0x0
ddb{6}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at agtimer_get_timecount_default+0x18
agtimer_get_timecount_default() at nanouptime+0x44
nanouptime() at tuagg_add_runtime+0x40
tuagg_add_runtime() at mi_switch+0x90
mi_switch() at sys_sched_yield+0x88
sys_sched_yield() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x19ae4b2fdc
--- trap ---
end of kernel
ddb{6}> mach ddb 7
Stopped at      db_enter+0x18:  brk     #0x0
ddb{7}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at __mp_lock+0xf8
__mp_lock() at sys___realpath+0x154
sys___realpath() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x4af9eacc8
--- trap ---
end of kernel
ddb{7}> mach ddb 8
Stopped at      db_enter+0x18:  brk     #0x0
ddb{8}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at aplintc_splx+0x1f0
aplintc_splx() at mtx_enter+0xe8
mtx_enter() at sys_sched_yield+0x68
sys_sched_yield() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x19ae4b2fdc
--- trap ---
end of kernel
ddb{8}> mach ddb 9
Stopped at      db_enter+0x18:  brk     #0x0
ddb{9}> tr
db_enter() at aplintc_fiq_handler+0x6c
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x10c
mtx_enter() at sys_sched_yield+0x68
sys_sched_yield() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x19ae4b2fdc
--- trap ---
end of kernel
ddb{9}> show all locks
CPU 0:
shared mutex timeout r = 0 (0xffffff8001202d88)
CPU 6:
exclusive mutex &sched_lock r = 0 (0xffffff8001272148)
Process 16821 (ninja) thread 0xffffff8113e80068 (92806)
exclusive rrwlock inode r = 0 (0xffffff8055608760)
Process 40072 (rustc) thread 0xffffff8113e814a8 (446746)
exclusive rrwlock inode r = 0 (0xffffff81049e8538)
Process 23063 (rustc) thread 0xffffff8113e819b8 (12581)
exclusive kernel_lock &kernel_lock r = 0 (0xffffff8001326d10)
ddb{9}> show witness /b
Number of known direct relationships is 443

Lock order reversal between "&mp->mnt_lock"(rwlock) and "&ip->i_lock"(rrwlock)!

witness: incomplete path, depth 8
ddb{9}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
  92806  16821     55      0x1005          0    1  ninja
 285104    167     55         0x3          0    2  c++
 202504  45511     55         0x3          0    7  c++
 288505  96331     55         0x3          0    4  c++
 134651  73257     55         0x3          0    5  c++
 230543  23063     55   0x2000003  0x4000000    3  rustc
  12581  23063     55   0x2000003  0x4000000    0K rustc
   4934  23063     55   0x2000003  0x4000000    8  rustc
* 89729  23063     55   0x2000003  0x4000000    9  rustc
ddb{9}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 16821   92806  23175     55  7      0x1005                ninja
   167  285104  63255     55  7         0x3                c++
 63255  492627  23175     55  3    0x10008b  sigsusp       sh
 40072   77074  38114     55  3   0x2000083  fsleep        rustc
 40072  100553  38114     55  3   0x6000083  piperd        rustc
 40072  446746  38114     55  2   0x6000003                rustc
 45511  202504   4053     55  7         0x3                c++
  4053   80003  23175     55  3    0x10008b  sigsusp       sh
 96331  288505  78782     55  7         0x3                c++
 78782   47058  23175     55  3    0x10008b  sigsusp       sh
 73257  134651  99268     55  7         0x3                c++
 99268  333490  23175     55  3    0x10008b  sigsusp       sh
 23063  335091  38114     55  3   0x2000083  fsleep        rustc
 23063  363979  38114     55  3   0x6000083  piperd        rustc
 23063  428745  38114     55  3   0x6000083  fsleep        rustc
 23063  279995  38114     55  3   0x6000083  fsleep        rustc
 23063  181877  38114     55  3   0x6000083  fsleep        rustc
 23063  230543  38114     55  7   0x6000003                rustc
 23063  216080  38114     55  2   0x6000003                rustc
 23063   12581  38114     55  7   0x6000003                rustc
 23063    4934  38114     55  7   0x6000003                rustc
*23063   89729  38114     55  7   0x6000003                rustc
 38114  473886  68745     55  3   0x2000083  fsleep        cargo
 38114  327225  68745     55  3   0x6000083  fsleep        cargo
 38114  207803  68745     55  3   0x6000083  kqread        cargo
 38114  442514  68745     55  3   0x6000083  kqread        cargo
 68745   91160  66756     55  3   0x2200083  wait          python3.12
 66756  145557  20253     55  3    0x10008b  sigsusp       make
 23175  305437  95548     55  3        0x8b  kqread        ninja
 95548  326376  95211     55  3        0x83  kqread        cmake
 95211  452507  60695     55  3    0x10008b  sigsusp       make
 20253  453028  74977   1000  3    0x10008b  sigsusp       make
 74977  118496  85512   1000  3    0x10008b  sigsusp       sh
 60695  512137  28928   1000  3    0x10008b  sigsusp       make
 28928  132313  46768   1000  3    0x10008b  sigsusp       sh
 46768  510404  31892   1000  3    0x10008b  sigsusp       make
 85512  168761   5112   1000  3    0x10008b  sigsusp       make
 31892   41853  33691   1000  3    0x10008b  sigsusp       ksh
  5112   45762  33691   1000  3    0x10008b  sigsusp       ksh
 33691  339742      1   1000  3    0x100080  kqread        tmux
 50447  258378  72048   1000  3    0x100083  kqread        tmux
 52948  303715      1   1000  3    0x1000b8  kqread        ssh-agent
 72048  282038  80004   1000  3    0x10008b  sigsusp       ksh
 80004  478023  75098   1000  3        0x98  kqread        sshd-session
 75098   88401  26790      0  3        0x92  kqread        sshd-session
 16285  372990      1      0  3    0x100083  ttyin         getty
 88950  327084      1      0  3    0x100083  ttyin         getty
 32299  412662      1      0  3    0x100083  ttyin         getty
  9624  388470      1      0  3    0x100083  ttyin         getty
 95942  199073      1      0  3    0x100083  ttyin         getty
 13386  235018      1      0  3    0x100083  ttyin         getty
 71288  139796      1      0  3    0x100098  kqread        cron
 58489  449220      1     99  3   0x1100090  kqread        sndiod
 88897  458345      1    110  3    0x100090  kqread        sndiod
 21307  145191  57732     95  3   0x1100092  kqread        smtpd
 85854  126275  57732    103  3   0x1100092  kqread        smtpd
 62845  395981  57732     95  3   0x1100092  kqread        smtpd
 68484  261396  57732     95  3    0x100092  kqread        smtpd
 96401  408083  57732     95  3   0x1100092  kqread        smtpd
 83314   38871  57732     95  3   0x1100092  kqread        smtpd
 57732  508617      1      0  3    0x100080  kqread        smtpd
 26790  121973      1      0  3        0x88  kqread        sshd
 29643  367483  12647      0  3        0x80  nfsd          nfsd
 94049  228378  12647      0  3        0x80  nfsd          nfsd
  6946   26461  12647      0  3        0x80  nfsd          nfsd
 27528  512446  12647      0  3        0x80  nfsd          nfsd
 12647  261198      1      0  3        0x80  netacc        nfsd
 40776  365439  11876      0  3        0x80  kqread        mountd
 11876  241363      1      0  3    0x100080  kqread        mountd
 56481   14460      1     28  3   0x1100090  kqread        portmap
 59772  105103  17556     83  3    0x100092  kqread        ntpd
 17556   51510  78827     83  3   0x1100092  kqread        ntpd
 78827  213143      1      0  3    0x100080  kqread        ntpd
 58198   68980  84345     74  3   0x1100092  bpf           pflogd
 84345  315647      1      0  3        0x80  sbwait        pflogd
  4462  325006  88024     73  3   0x1100090  kqread        syslogd
 88024  393773      1      0  3    0x100082  sbwait        syslogd
 20177  177437      1      0  3    0x100080  kqread        resolvd
  1916  512556  22600     77  3    0x100092  kqread        dhcpleased
 89301  356030  22600     77  3    0x100092  kqread        dhcpleased
 22600  388065      1      0  3        0x80  kqread        dhcpleased
  9674   57681  39128    115  3    0x100092  kqread        slaacd
 99969  364740  39128    115  3    0x100092  kqread        slaacd
 39128  377692      1      0  3    0x100080  kqread        slaacd
  7488  127254      0      0  3     0x14200  bored         wsdisplay0
 80941  354199      0      0  3     0x14200  bored         apple-dcp-afkep%02x
 83845  251298      0      0  3     0x14200  bored         apple-dcp-afkep%02x
 48577  323255      0      0  3     0x14200  bored         apple-dcp-afkep%02x
 26744  433388      0      0  3     0x14200  bored         drmrtk
 23659  215894      0      0  3     0x14200  bored         smr
 62145  444779      0      0  3     0x14200  pgzero        zerothread
 60948   51173      0      0  3     0x14200  aiodoned      aiodoned
 61934  416352      0      0  3     0x14200  syncer        update
 60595  230019      0      0  3     0x14200  cleaner       cleaner
 59553  366424      0      0  3     0x14200  reaper        reaper
 57148   36768      0      0  3     0x14200  pgdaemon      pagedaemon
  4700  330347      0      0  3     0x14200  usbtsk        usbtask
 37673  384731      0      0  3     0x14200  usbatsk       usbatsk
 71799  233642      0      0  3     0x14200  bored         drmtskl
 72815  226502      0      0  3     0x14200  bored         drmlwq
 22325  377620      0      0  3     0x14200  bored         drmlwq
 58569  438671      0      0  3     0x14200  bored         drmlwq
 35034  511019      0      0  3     0x14200  bored         drmlwq
 51245  343392      0      0  3     0x14200  bored         drmubwq
  5539  210307      0      0  3     0x14200  bored         drmubwq
 96204  103587      0      0  3     0x14200  bored         drmubwq
 99133  343617      0      0  3     0x14200  bored         drmubwq
 62111  109225      0      0  3     0x14200  bored         drmhpwq
 54844  103964      0      0  3     0x14200  bored         drmhpwq
 98273    6361      0      0  3     0x14200  bored         drmhpwq
 69125  196587      0      0  3     0x14200  bored         drmhpwq
 36820  327811      0      0  3     0x14200  bored         drmwq
   685  197752      0      0  3     0x14200  bored         drmwq
 12733  475339      0      0  3     0x14200  bored         drmwq
 50628   50813      0      0  3     0x14200  bored         drmwq
 38742  421411      0      0  3     0x14200  bored         sensors
 15252  489393      0      0  3     0x14200  bored         sleep
 45280  351306      0      0  3  0x40014200                idle9
 18476   19261      0      0  3  0x40014200                idle8
 73955  129108      0      0  3  0x40014200                idle7
 40102  188571      0      0  3  0x40014200                idle6
 68869   24066      0      0  3  0x40014200                idle5
 16358  379475      0      0  3  0x40014200                idle4
 28248  237855      0      0  3  0x40014200                idle3
 14101  490124      0      0  3  0x40014200                idle2
 48008  167892      0      0  3  0x40014200                idle1
 44662  441379      0      0  3     0x14200  bored         softnet3
 18453  428842      0      0  3     0x14200  bored         softnet2
 49989   83580      0      0  3     0x14200  bored         softnet1
 93898  345769      0      0  3     0x14200  bored         softnet0
 64190  206376      0      0  3     0x14200  bored         systqmp
 95969  416641      0      0  3     0x14200  bored         systq
 38042  457408      0      0  3     0x14200  tmoslp        softclockmp
 57671  339967      0      0  3  0x40014200  tmoslp        softclock
 32509  359030      0      0  3  0x40014200                idle0
 99051   67429      0      0  3     0x14200  kmalloc       kmthread
     1   12407      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{9}>
-->8--

-- 
jca

Reply via email to