On Tue, Jul 01, 2025 at 06:18:37PM +0200, Jeremie Courreges-Anglas wrote:
> On Tue, Jun 24, 2025 at 05:21:56PM +0200, Jeremie Courreges-Anglas wrote:
> > 
> > I think it's uvm_purge(), as far as I can see it happens when building
> > rust with cvs up -D2025/06/04 in /sys, not with -D2025/06/03.  Maybe I
> > missed lang/rust when testing the diff.
> > 
> > This is with additional MP_LOCKDEBUG support for mutexes, and
> > __mp_lock_spinout = 50L * INT_MAX.
> > 
> > Suggested by claudio: tr /t 0t269515 fails.
> 
> Should be fixed, at least for CPUs that ddb actually managed to
> stop...
> 
> > WITNESS doesn't flag an obvious lock ordering issue.  I'm not even
> > sure there is one.  It also happen with CPU_MAX_BUSY_CYCLES == 64.
> > 
> > Maybe we're still hammering too much the locks?  Input and ideas to
> > test welcome.  Right now I'm running with just uvm_purge() reverted.
> 
> Reverting uvm_purge() did not help. I've been able to reproduce the
> hangs up to cvs up -D2025/05/18, backporting the arm64 ddb trace fix,
> mpi's mutex backoff diff and the mtx_enter MP_LOCKDEBUG diff.
> Currently trying to reproduce with cvs up -D2025/05/07 as suggested by
> dlg.

cvs up -D2025/05/07 also did not help.  -current still locks up, but
the db_mutex fix helps get proper stacktraces:

--8<--
login: mtf_fnxe0:00nf2ffmrte8  0m21xt2  8xlef_ cf2nktft esl8r_u0f  k00t  sxor2f0

 0tf
     8oeff

 ftff
efStopped at    mtx_enter+0x13c:        ldr     x26, [x25,#2376]
ddb{8}> mach cpuinfo
    0: stopped
    1: stopped
    2: stopped
    3: stopped
    4: stopped
    5: stopped
    6: stopped
    7: stopped
*   8: ddb
    9: stopped
ddb{8}> mach ddb 0
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{0}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x10c
mtx_enter() at uvm_pmr_getpages+0x16c
uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
uvm_pmr_cache_get() at uvm_pagealloc+0xc4
uvm_pagealloc() at uvmfault_promote+0xac
uvmfault_promote() at uvm_fault_lower+0x2e4
uvm_fault_lower() at uvm_fault+0x158
uvm_fault() at udata_abort+0x128
udata_abort() at do_el0_sync+0x100
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x4c7b735f4
--- trap ---
end of kernel
ddb{0}> mach ddb 1
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{1}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at sys_sched_yield+0x3c
sys_sched_yield() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0xb05ca7fdc
--- trap ---
end of kernel
ddb{1}> mach ddb 2
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{2}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at remrunqueue+0xfc
remrunqueue() at sched_chooseproc+0x110
sched_chooseproc() at mi_switch+0xf4
mi_switch() at sys_sched_yield+0x88
sys_sched_yield() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0xb05ca7fdc
--- trap ---
end of kernel
ddb{2}> mach ddb 3
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{3}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x10c
mtx_enter() at sys_sched_yield+0x68
sys_sched_yield() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0xb05ca7fdc
--- trap ---
end of kernel
ddb{3}> mach ddb 4
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{4}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x10c
mtx_enter() at wakeup_n+0x44
wakeup_n() at uvm_pmr_freepages+0x1f0
uvm_pmr_freepages() at uvm_anfree_list+0xec
uvm_anfree_list() at amap_wipeout+0xfc
amap_wipeout() at uvm_unmap_detach+0x94
uvm_unmap_detach() at sys_munmap+0x15c
sys_munmap() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x4277ecffc
--- trap ---
end of kernel
ddb{4}> mach ddb 5
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{5}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x10c
mtx_enter() at preempt+0x20
preempt() at ast+0x98
ast() at handle_el0_sync+0x9c
handle_el0_sync() at 0x16e30386bc
--- trap ---
end of kernel
ddb{5}> mach ddb 6
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{6}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at mtx_enter+0x10c
mtx_enter() at preempt+0x20
preempt() at ast+0x98
ast() at handle_el0_sync+0x9c
handle_el0_sync() at 0xb05cb3094
--- trap ---
end of kernel
ddb{6}> mach ddb 7
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{7}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at exuartcnputc+0x44
exuartcnputc() at cnputc+0x40
cnputc() at db_putchar+0x2bc
db_putchar() at kprintf+0xdd0
kprintf() at db_printf+0x78
db_printf() at mtx_enter+0x134
mtx_enter() at uvm_pmr_freepages+0xc8
uvm_pmr_freepages() at uvm_anfree_list+0xec
uvm_anfree_list() at amap_wiperange_chunk+0x104
amap_wiperange_chunk() at amap_wiperange+0x180
amap_wiperange() at amap_pp_adjref+0x25c
amap_pp_adjref() at amap_adjref_anons+0xf0
amap_adjref_anons() at uvm_unmap_detach+0x94
uvm_unmap_detach() at sys_munmap+0x15c
sys_munmap() at svc_handler+0x450
svc_handler() at do_el0_sync+0x1cc
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at 0x4ebf95ffc
--- trap ---
end of kernel
ddb{7}> mach ddb 8
Stopped at      mtx_enter+0x13c:        ldr     x26, [x25,#2376]
ddb{8}> tr
db_enter() at mtx_enter+0x138
mtx_enter() at uvm_pmr_getpages+0x16c
uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
uvm_pmr_cache_get() at uvm_pagealloc+0xc4
uvm_pagealloc() at uvmfault_promote+0xac
uvmfault_promote() at uvm_fault_lower+0x2e4
uvm_fault_lower() at uvm_fault+0x158
uvm_fault() at udata_abort+0x128
udata_abort() at do_el0_sync+0x100
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at __ALIGN_SIZE+0x48d8044
--- trap ---
end of kernel
ddb{8}> mach ddb 9
Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c <aplin
tc_fiq_handler+0x7c>
ddb{9}> tr
db_enter() at aplintc_fiq_handler+0x68
aplintc_fiq_handler() at arm_cpu_fiq+0x44
arm_cpu_fiq() at handle_el1h_fiq+0x68
handle_el1h_fiq() at exuartcnputc+0x44
exuartcnputc() at cnputc+0x40
cnputc() at db_putchar+0x2bc
db_putchar() at kprintf+0xdd0
kprintf() at db_printf+0x78
db_printf() at mtx_enter+0x134
mtx_enter() at uvm_pmr_getpages+0x16c
uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
uvm_pmr_cache_get() at uvm_pagealloc+0xc4
uvm_pagealloc() at uvmfault_promote+0xac
uvmfault_promote() at uvm_fault_lower+0x2e4
uvm_fault_lower() at uvm_fault+0x158
uvm_fault() at udata_abort+0x128
udata_abort() at do_el0_sync+0x100
do_el0_sync() at handle_el0_sync+0x70
handle_el0_sync() at __ALIGN_SIZE+0x37af29c
--- trap ---
end of kernel
ddb{9}> show all locks
CPU 4:
exclusive mutex &uvm.fpageqlock r = 0 (0xffffff80012f8e38)
CPU 2:
exclusive mutex &sched_lock r = 0 (0xffffff80012615f8)
CPU 1:
exclusive mutex &pr->ps_mtx r = 0 (0xffffff8119fede98)
Process 63925 (c++) thread 0xffffff81118dc088 (139279)
exclusive rwlock amaplk r = 0 (0xffffff81186ea920)
shared rwlock vmmaplk r = 0 (0xffffff8055613c40)
Process 45702 (c++) thread 0xffffff80ffcaef78 (100747)
exclusive rwlock amaplk r = 0 (0xffffff81137b5410)
Process 56051 (c++) thread 0xffffff81118dc820 (416882)
exclusive rwlock amaplk r = 0 (0xffffff8113c8ead0)
shared rwlock vmmaplk r = 0 (0xffffff810f232e88)
Process 1137 (c++) thread 0xffffff81029762d8 (371891)
exclusive rwlock amaplk r = 0 (0xffffff80540e6cc0)
shared rwlock vmmaplk r = 0 (0xffffff80556132e0)
Process 95468 (c++) thread 0xffffff81118dd240 (227740)
exclusive rwlock amaplk r = 0 (0xffffff811ab40078)
ddb{9}> show witness /b
Number of known direct relationships is 442

Lock order reversal between "&mp->mnt_lock"(rwlock) and "&ip->i_lock"(rrwlock)!

witness: incomplete path, depth 4
ddb{9}> ps /o
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 139279  63925     55         0x3          0    0  c++
  57313  41142     55   0x2000003  0x4000000    5  rustc
 100747  45702     55         0x3          0    4  c++
 416882  56051     55         0x3          0    8  c++
*371891   1137     55         0x3          0    9  c++
 227740  95468     55         0x3          0    7  c++
 344959  21226     55   0x2000003  0x4000000    3  rustc
 418483  21226     55   0x2000003  0x4000000    1  rustc
  69626  21226     55   0x2000003  0x4000000    6  rustc
ddb{9}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 63925  139279  79490     55  7         0x3                c++
 79490  171954  36022     55  3    0x10008b  sigsusp       sh
 41142  134365  40336     55  3   0x2000083  fsleep        rustc
 41142  302975  40336     55  3   0x6000083  piperd        rustc
 41142   57313  40336     55  7   0x6000003                rustc
 40336  347395  15169     55  3   0x2000083  wait          rustc
 45702  100747  64449     55  7         0x3                c++
 64449  250642  36022     55  3    0x10008b  sigsusp       sh
 56051  416882  47805     55  7         0x3                c++
 47805    1643  36022     55  3    0x10008b  sigsusp       sh
* 1137  371891  29510     55  7         0x3                c++
 29510  131639  36022     55  3    0x10008b  sigsusp       sh
 95468  227740  67080     55  7         0x3                c++
 67080  286525  36022     55  3    0x10008b  sigsusp       sh
 21226  108617   3494     55  3   0x2000083  fsleep        rustc
 21226  510541   3494     55  3   0x6000083  piperd        rustc
 21226  489915   3494     55  3   0x6000083  fsleep        rustc
 21226   67149   3494     55  3   0x6000083  piperd        rustc
 21226   48866   3494     55  3   0x6000083  fsleep        rustc
 21226  344959   3494     55  7   0x6000003                rustc
 21226  418483   3494     55  7   0x6000003                rustc
 21226  380395   3494     55  2   0x6000003                rustc
 21226   69626   3494     55  7   0x6000003                rustc
  3494  361736  15169     55  3   0x2000083  wait          rustc
 15169   47593  56577     55  3   0x2000083  fsleep        cargo
 15169   55128  56577     55  3   0x6000083  piperd        cargo
 15169  456809  56577     55  3   0x6000083  kqread        cargo
 15169   45734  56577     55  3   0x6000083  kqread        cargo
 56577  460218  94706     55  3   0x2000083  piperd        bootstrap
 94706  144930  19294     55  3   0x2200083  wait          python3.12
 19294  227744  46195     55  3    0x10008b  sigsusp       make
 36022  189017  46616     55  3        0x8b  kqread        ninja
 46616  494547   5179     55  3        0x83  kqread        cmake
  5179   41695  49137     55  3    0x10008b  sigsusp       make
 46195  139722  78463   1000  3    0x10008b  sigsusp       make
 78463  339800  44112   1000  3    0x10008b  sigsusp       sh
 49137  166591  32884   1000  3    0x10008b  sigsusp       make
 32884  130499  19324   1000  3    0x10008b  sigsusp       sh
 19324    2171  13719   1000  3    0x10008b  sigsusp       make
 44112   82636  13750   1000  3    0x10008b  sigsusp       make
 13719  280625  37164   1000  3    0x10008b  sigsusp       ksh
 13750  311794  37164   1000  3    0x10008b  sigsusp       ksh
 37164  271851      1   1000  3    0x100080  kqread        tmux
 70208   65461  38741   1000  3    0x100083  kqread        tmux
 72473  190992      1   1000  3    0x1000b8  kqread        ssh-agent
 38741  398367   4112   1000  3    0x10008b  sigsusp       ksh
  4112   49829  24648   1000  3        0x98  kqread        sshd-session
 24648  276484  70750      0  3        0x92  kqread        sshd-session
 64792   38488      1      0  3    0x100083  ttyin         getty
 13658  311859      1      0  3    0x100083  ttyin         getty
 93601  440231      1      0  3    0x100083  ttyin         getty
 63470  459539      1      0  3    0x100083  ttyin         getty
 96609  469760      1      0  3    0x100083  ttyin         getty
 30529  126857      1      0  3    0x100083  ttyin         getty
 25133  315686      1      0  3    0x100098  kqread        cron
 63695  355503      1     99  3   0x1100090  kqread        sndiod
 21489  342535      1    110  3    0x100090  kqread        sndiod
  2773   26068  87190     95  3   0x1100092  kqread        smtpd
 50414   65728  87190    103  3   0x1100092  kqread        smtpd
 26666  492754  87190     95  3   0x1100092  kqread        smtpd
 15809  384184  87190     95  3    0x100092  kqread        smtpd
  6296   50132  87190     95  3   0x1100092  kqread        smtpd
 62965  453772  87190     95  3   0x1100092  kqread        smtpd
 87190   36758      1      0  3    0x100080  kqread        smtpd
 70750  520713      1      0  3        0x88  kqread        sshd
 88177  217785  64329      0  3        0x80  nfsd          nfsd
 19091  209499  64329      0  3        0x80  nfsd          nfsd
 16547  436525  64329      0  3        0x80  nfsd          nfsd
 78468  310707  64329      0  3        0x80  nfsd          nfsd
 64329   34190      1      0  3        0x80  netacc        nfsd
 44169  309817   7798      0  3        0x80  kqread        mountd
  7798    3741      1      0  3    0x100080  kqread        mountd
 41132  247300      1     28  3   0x1100090  kqread        portmap
 41102   15370  59220     83  3    0x100092  kqread        ntpd
 59220   80027  79319     83  3   0x1100092  kqread        ntpd
 79319  186570      1      0  3    0x100080  kqread        ntpd
 72869  395225  46771     74  3   0x1100092  bpf           pflogd
 46771  417638      1      0  3        0x80  sbwait        pflogd
 94369  233003  28011     73  3   0x1100090  kqread        syslogd
 28011  381388      1      0  3    0x100082  sbwait        syslogd
 81963  219622      1      0  3    0x100080  kqread        resolvd
 88462  439074  21502     77  3    0x100092  kqread        dhcpleased
 59766   25595  21502     77  3    0x100092  kqread        dhcpleased
 21502  369741      1      0  3        0x80  kqread        dhcpleased
   556  135670  52238    115  3    0x100092  kqread        slaacd
 73789  109490  52238    115  3    0x100092  kqread        slaacd
 52238  190284      1      0  3    0x100080  kqread        slaacd
 41668   73399      0      0  3     0x14200  bored         wsdisplay0
 40970  505250      0      0  3     0x14200  bored         apple-dcp-afkep%02x
 97767  350383      0      0  3     0x14200  bored         apple-dcp-afkep%02x
 47220  373849      0      0  3     0x14200  bored         apple-dcp-afkep%02x
 88568  200274      0      0  3     0x14200  bored         drmrtk
  6787  522175      0      0  3     0x14200  bored         smr
 40522    9485      0      0  3     0x14200  pgzero        zerothread
 16498  109981      0      0  3     0x14200  aiodoned      aiodoned
 59611  103710      0      0  3     0x14200  syncer        update
 56156  385464      0      0  3     0x14200  cleaner       cleaner
 70501  345835      0      0  3     0x14200  reaper        reaper
 46109   83168      0      0  3     0x14200  pgdaemon      pagedaemon
 22925  161998      0      0  3     0x14200  usbtsk        usbtask
 63513   64083      0      0  3     0x14200  usbatsk       usbatsk
 18728  349413      0      0  3     0x14200  bored         drmtskl
 98853  157550      0      0  3     0x14200  bored         drmlwq
 39410  237152      0      0  3     0x14200  bored         drmlwq
 92541   76851      0      0  3     0x14200  bored         drmlwq
 47164  197379      0      0  3     0x14200  bored         drmlwq
  9853  127153      0      0  3     0x14200  bored         drmubwq
 18545  398258      0      0  3     0x14200  bored         drmubwq
 38466  331523      0      0  3     0x14200  bored         drmubwq
 16049  277430      0      0  3     0x14200  bored         drmubwq
 50444  316366      0      0  3     0x14200  bored         drmhpwq
 94598   67921      0      0  3     0x14200  bored         drmhpwq
 77503  379929      0      0  3     0x14200  bored         drmhpwq
  8860  448340      0      0  3     0x14200  bored         drmhpwq
 52476  388513      0      0  3     0x14200  bored         drmwq
   185   26590      0      0  3     0x14200  bored         drmwq
 45860  442452      0      0  3     0x14200  bored         drmwq
 64786  112562      0      0  3     0x14200  bored         drmwq
 58525  179335      0      0  3     0x14200  bored         sensors
 68906  400714      0      0  3     0x14200  bored         sleep
 92770  116939      0      0  3  0x40014200                idle9
 54583  422906      0      0  3  0x40014200                idle8
 93235  264602      0      0  3  0x40014200                idle7
 23847  130999      0      0  3  0x40014200                idle6
 87396  374260      0      0  3  0x40014200                idle5
 67773  146451      0      0  3  0x40014200                idle4
   664  465524      0      0  3  0x40014200                idle3
  8634  372189      0      0  3  0x40014200                idle2
 85417  214509      0      0  3  0x40014200                idle1
  8301  414624      0      0  3     0x14200  bored         softnet3
 25868  484464      0      0  3     0x14200  bored         softnet2
 75194  231167      0      0  3     0x14200  bored         softnet1
 90163  153430      0      0  3     0x14200  bored         softnet0
 13057  294652      0      0  3     0x14200  bored         systqmp
 61843  256983      0      0  3     0x14200  bored         systq
 44957  446138      0      0  3     0x14200  tmoslp        softclockmp
 79423  156384      0      0  3  0x40014200  tmoslp        softclock
 40932  356300      0      0  3  0x40014200                idle0
 80938  484135      0      0  3     0x14200  kmalloc       kmthread
     1  506573      0      0  3        0x82  wait          init
     0       0     -1      0  3     0x10200  scheduler     swapper
ddb{9}>
-->8--

-- 
jca

Reply via email to