> Date: Sun, 6 Jul 2025 21:15:34 +0200
> From: Jeremie Courreges-Anglas <[email protected]>
>
> On Tue, Jul 01, 2025 at 06:18:37PM +0200, Jeremie Courreges-Anglas wrote:
> > On Tue, Jun 24, 2025 at 05:21:56PM +0200, Jeremie Courreges-Anglas wrote:
> > >
> > > I think it's uvm_purge(), as far as I can see it happens when building
> > > rust with cvs up -D2025/06/04 in /sys, not with -D2025/06/03. Maybe I
> > > missed lang/rust when testing the diff.
> > >
> > > This is with additional MP_LOCKDEBUG support for mutexes, and
> > > __mp_lock_spinout = 50L * INT_MAX.
> > >
> > > Suggested by claudio: tr /t 0t269515 fails.
> >
> > Should be fixed, at least for CPUs that ddb actually managed to
> > stop...
> >
> > > WITNESS doesn't flag an obvious lock ordering issue. I'm not even
> > > sure there is one. It also happen with CPU_MAX_BUSY_CYCLES == 64.
> > >
> > > Maybe we're still hammering too much the locks? Input and ideas to
> > > test welcome. Right now I'm running with just uvm_purge() reverted.
> >
> > Reverting uvm_purge() did not help. I've been able to reproduce the
> > hangs up to cvs up -D2025/05/18, backporting the arm64 ddb trace fix,
> > mpi's mutex backoff diff and the mtx_enter MP_LOCKDEBUG diff.
> > Currently trying to reproduce with cvs up -D2025/05/07 as suggested by
> > dlg.
>
> cvs up -D2025/05/07 also did not help. -current still locks up, but
> the db_mutex fix helps get proper stacktraces:
Well, there are still issues with the stack traces on arm64.
Someone please review the diff I sent out on July 1st ("arm64 ddb trace fix").
>
> --8<--
> login: mtf_fnxe0:00nf2ffmrte8 0m21xt2 8xlef_ cf2nktft esl8r_u0f k00t
> sxor2f0
>
> 0tf
> 8oeff
>
> ftff
> efStopped at mtx_enter+0x13c: ldr x26, [x25,#2376]
> ddb{8}> mach cpuinfo
> 0: stopped
> 1: stopped
> 2: stopped
> 3: stopped
> 4: stopped
> 5: stopped
> 6: stopped
> 7: stopped
> * 8: ddb
> 9: stopped
> ddb{8}> mach ddb 0
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{0}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at uvm_pmr_getpages+0x16c
> uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
> uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
> uvm_pmr_cache_get() at uvm_pagealloc+0xc4
> uvm_pagealloc() at uvmfault_promote+0xac
> uvmfault_promote() at uvm_fault_lower+0x2e4
> uvm_fault_lower() at uvm_fault+0x158
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0x100
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x4c7b735f4
> --- trap ---
> end of kernel
> ddb{0}> mach ddb 1
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{1}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at sys_sched_yield+0x3c
> sys_sched_yield() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0xb05ca7fdc
> --- trap ---
> end of kernel
> ddb{1}> mach ddb 2
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{2}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at remrunqueue+0xfc
> remrunqueue() at sched_chooseproc+0x110
> sched_chooseproc() at mi_switch+0xf4
> mi_switch() at sys_sched_yield+0x88
> sys_sched_yield() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0xb05ca7fdc
> --- trap ---
> end of kernel
> ddb{2}> mach ddb 3
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{3}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at sys_sched_yield+0x68
> sys_sched_yield() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0xb05ca7fdc
> --- trap ---
> end of kernel
> ddb{3}> mach ddb 4
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{4}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at wakeup_n+0x44
> wakeup_n() at uvm_pmr_freepages+0x1f0
> uvm_pmr_freepages() at uvm_anfree_list+0xec
> uvm_anfree_list() at amap_wipeout+0xfc
> amap_wipeout() at uvm_unmap_detach+0x94
> uvm_unmap_detach() at sys_munmap+0x15c
> sys_munmap() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x4277ecffc
> --- trap ---
> end of kernel
> ddb{4}> mach ddb 5
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{5}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at preempt+0x20
> preempt() at ast+0x98
> ast() at handle_el0_sync+0x9c
> handle_el0_sync() at 0x16e30386bc
> --- trap ---
> end of kernel
> ddb{5}> mach ddb 6
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{6}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at preempt+0x20
> preempt() at ast+0x98
> ast() at handle_el0_sync+0x9c
> handle_el0_sync() at 0xb05cb3094
> --- trap ---
> end of kernel
> ddb{6}> mach ddb 7
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{7}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at exuartcnputc+0x44
> exuartcnputc() at cnputc+0x40
> cnputc() at db_putchar+0x2bc
> db_putchar() at kprintf+0xdd0
> kprintf() at db_printf+0x78
> db_printf() at mtx_enter+0x134
> mtx_enter() at uvm_pmr_freepages+0xc8
> uvm_pmr_freepages() at uvm_anfree_list+0xec
> uvm_anfree_list() at amap_wiperange_chunk+0x104
> amap_wiperange_chunk() at amap_wiperange+0x180
> amap_wiperange() at amap_pp_adjref+0x25c
> amap_pp_adjref() at amap_adjref_anons+0xf0
> amap_adjref_anons() at uvm_unmap_detach+0x94
> uvm_unmap_detach() at sys_munmap+0x15c
> sys_munmap() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x4ebf95ffc
> --- trap ---
> end of kernel
> ddb{7}> mach ddb 8
> Stopped at mtx_enter+0x13c: ldr x26, [x25,#2376]
> ddb{8}> tr
> db_enter() at mtx_enter+0x138
> mtx_enter() at uvm_pmr_getpages+0x16c
> uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
> uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
> uvm_pmr_cache_get() at uvm_pagealloc+0xc4
> uvm_pagealloc() at uvmfault_promote+0xac
> uvmfault_promote() at uvm_fault_lower+0x2e4
> uvm_fault_lower() at uvm_fault+0x158
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0x100
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at __ALIGN_SIZE+0x48d8044
> --- trap ---
> end of kernel
> ddb{8}> mach ddb 9
> Stopped at aplintc_fiq_handler+0x6c: b ffffff8000be553c
> <aplin
> tc_fiq_handler+0x7c>
> ddb{9}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at exuartcnputc+0x44
> exuartcnputc() at cnputc+0x40
> cnputc() at db_putchar+0x2bc
> db_putchar() at kprintf+0xdd0
> kprintf() at db_printf+0x78
> db_printf() at mtx_enter+0x134
> mtx_enter() at uvm_pmr_getpages+0x16c
> uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
> uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
> uvm_pmr_cache_get() at uvm_pagealloc+0xc4
> uvm_pagealloc() at uvmfault_promote+0xac
> uvmfault_promote() at uvm_fault_lower+0x2e4
> uvm_fault_lower() at uvm_fault+0x158
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0x100
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at __ALIGN_SIZE+0x37af29c
> --- trap ---
> end of kernel
> ddb{9}> show all locks
> CPU 4:
> exclusive mutex &uvm.fpageqlock r = 0 (0xffffff80012f8e38)
> CPU 2:
> exclusive mutex &sched_lock r = 0 (0xffffff80012615f8)
> CPU 1:
> exclusive mutex &pr->ps_mtx r = 0 (0xffffff8119fede98)
> Process 63925 (c++) thread 0xffffff81118dc088 (139279)
> exclusive rwlock amaplk r = 0 (0xffffff81186ea920)
> shared rwlock vmmaplk r = 0 (0xffffff8055613c40)
> Process 45702 (c++) thread 0xffffff80ffcaef78 (100747)
> exclusive rwlock amaplk r = 0 (0xffffff81137b5410)
> Process 56051 (c++) thread 0xffffff81118dc820 (416882)
> exclusive rwlock amaplk r = 0 (0xffffff8113c8ead0)
> shared rwlock vmmaplk r = 0 (0xffffff810f232e88)
> Process 1137 (c++) thread 0xffffff81029762d8 (371891)
> exclusive rwlock amaplk r = 0 (0xffffff80540e6cc0)
> shared rwlock vmmaplk r = 0 (0xffffff80556132e0)
> Process 95468 (c++) thread 0xffffff81118dd240 (227740)
> exclusive rwlock amaplk r = 0 (0xffffff811ab40078)
> ddb{9}> show witness /b
> Number of known direct relationships is 442
>
> Lock order reversal between "&mp->mnt_lock"(rwlock) and
> "&ip->i_lock"(rrwlock)!
>
> witness: incomplete path, depth 4
> ddb{9}> ps /o
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> 139279 63925 55 0x3 0 0 c++
> 57313 41142 55 0x2000003 0x4000000 5 rustc
> 100747 45702 55 0x3 0 4 c++
> 416882 56051 55 0x3 0 8 c++
> *371891 1137 55 0x3 0 9 c++
> 227740 95468 55 0x3 0 7 c++
> 344959 21226 55 0x2000003 0x4000000 3 rustc
> 418483 21226 55 0x2000003 0x4000000 1 rustc
> 69626 21226 55 0x2000003 0x4000000 6 rustc
> ddb{9}> ps
> PID TID PPID UID S FLAGS WAIT COMMAND
> 63925 139279 79490 55 7 0x3 c++
> 79490 171954 36022 55 3 0x10008b sigsusp sh
> 41142 134365 40336 55 3 0x2000083 fsleep rustc
> 41142 302975 40336 55 3 0x6000083 piperd rustc
> 41142 57313 40336 55 7 0x6000003 rustc
> 40336 347395 15169 55 3 0x2000083 wait rustc
> 45702 100747 64449 55 7 0x3 c++
> 64449 250642 36022 55 3 0x10008b sigsusp sh
> 56051 416882 47805 55 7 0x3 c++
> 47805 1643 36022 55 3 0x10008b sigsusp sh
> * 1137 371891 29510 55 7 0x3 c++
> 29510 131639 36022 55 3 0x10008b sigsusp sh
> 95468 227740 67080 55 7 0x3 c++
> 67080 286525 36022 55 3 0x10008b sigsusp sh
> 21226 108617 3494 55 3 0x2000083 fsleep rustc
> 21226 510541 3494 55 3 0x6000083 piperd rustc
> 21226 489915 3494 55 3 0x6000083 fsleep rustc
> 21226 67149 3494 55 3 0x6000083 piperd rustc
> 21226 48866 3494 55 3 0x6000083 fsleep rustc
> 21226 344959 3494 55 7 0x6000003 rustc
> 21226 418483 3494 55 7 0x6000003 rustc
> 21226 380395 3494 55 2 0x6000003 rustc
> 21226 69626 3494 55 7 0x6000003 rustc
> 3494 361736 15169 55 3 0x2000083 wait rustc
> 15169 47593 56577 55 3 0x2000083 fsleep cargo
> 15169 55128 56577 55 3 0x6000083 piperd cargo
> 15169 456809 56577 55 3 0x6000083 kqread cargo
> 15169 45734 56577 55 3 0x6000083 kqread cargo
> 56577 460218 94706 55 3 0x2000083 piperd bootstrap
> 94706 144930 19294 55 3 0x2200083 wait python3.12
> 19294 227744 46195 55 3 0x10008b sigsusp make
> 36022 189017 46616 55 3 0x8b kqread ninja
> 46616 494547 5179 55 3 0x83 kqread cmake
> 5179 41695 49137 55 3 0x10008b sigsusp make
> 46195 139722 78463 1000 3 0x10008b sigsusp make
> 78463 339800 44112 1000 3 0x10008b sigsusp sh
> 49137 166591 32884 1000 3 0x10008b sigsusp make
> 32884 130499 19324 1000 3 0x10008b sigsusp sh
> 19324 2171 13719 1000 3 0x10008b sigsusp make
> 44112 82636 13750 1000 3 0x10008b sigsusp make
> 13719 280625 37164 1000 3 0x10008b sigsusp ksh
> 13750 311794 37164 1000 3 0x10008b sigsusp ksh
> 37164 271851 1 1000 3 0x100080 kqread tmux
> 70208 65461 38741 1000 3 0x100083 kqread tmux
> 72473 190992 1 1000 3 0x1000b8 kqread ssh-agent
> 38741 398367 4112 1000 3 0x10008b sigsusp ksh
> 4112 49829 24648 1000 3 0x98 kqread sshd-session
> 24648 276484 70750 0 3 0x92 kqread sshd-session
> 64792 38488 1 0 3 0x100083 ttyin getty
> 13658 311859 1 0 3 0x100083 ttyin getty
> 93601 440231 1 0 3 0x100083 ttyin getty
> 63470 459539 1 0 3 0x100083 ttyin getty
> 96609 469760 1 0 3 0x100083 ttyin getty
> 30529 126857 1 0 3 0x100083 ttyin getty
> 25133 315686 1 0 3 0x100098 kqread cron
> 63695 355503 1 99 3 0x1100090 kqread sndiod
> 21489 342535 1 110 3 0x100090 kqread sndiod
> 2773 26068 87190 95 3 0x1100092 kqread smtpd
> 50414 65728 87190 103 3 0x1100092 kqread smtpd
> 26666 492754 87190 95 3 0x1100092 kqread smtpd
> 15809 384184 87190 95 3 0x100092 kqread smtpd
> 6296 50132 87190 95 3 0x1100092 kqread smtpd
> 62965 453772 87190 95 3 0x1100092 kqread smtpd
> 87190 36758 1 0 3 0x100080 kqread smtpd
> 70750 520713 1 0 3 0x88 kqread sshd
> 88177 217785 64329 0 3 0x80 nfsd nfsd
> 19091 209499 64329 0 3 0x80 nfsd nfsd
> 16547 436525 64329 0 3 0x80 nfsd nfsd
> 78468 310707 64329 0 3 0x80 nfsd nfsd
> 64329 34190 1 0 3 0x80 netacc nfsd
> 44169 309817 7798 0 3 0x80 kqread mountd
> 7798 3741 1 0 3 0x100080 kqread mountd
> 41132 247300 1 28 3 0x1100090 kqread portmap
> 41102 15370 59220 83 3 0x100092 kqread ntpd
> 59220 80027 79319 83 3 0x1100092 kqread ntpd
> 79319 186570 1 0 3 0x100080 kqread ntpd
> 72869 395225 46771 74 3 0x1100092 bpf pflogd
> 46771 417638 1 0 3 0x80 sbwait pflogd
> 94369 233003 28011 73 3 0x1100090 kqread syslogd
> 28011 381388 1 0 3 0x100082 sbwait syslogd
> 81963 219622 1 0 3 0x100080 kqread resolvd
> 88462 439074 21502 77 3 0x100092 kqread dhcpleased
> 59766 25595 21502 77 3 0x100092 kqread dhcpleased
> 21502 369741 1 0 3 0x80 kqread dhcpleased
> 556 135670 52238 115 3 0x100092 kqread slaacd
> 73789 109490 52238 115 3 0x100092 kqread slaacd
> 52238 190284 1 0 3 0x100080 kqread slaacd
> 41668 73399 0 0 3 0x14200 bored wsdisplay0
> 40970 505250 0 0 3 0x14200 bored apple-dcp-afkep%02x
> 97767 350383 0 0 3 0x14200 bored apple-dcp-afkep%02x
> 47220 373849 0 0 3 0x14200 bored apple-dcp-afkep%02x
> 88568 200274 0 0 3 0x14200 bored drmrtk
> 6787 522175 0 0 3 0x14200 bored smr
> 40522 9485 0 0 3 0x14200 pgzero zerothread
> 16498 109981 0 0 3 0x14200 aiodoned aiodoned
> 59611 103710 0 0 3 0x14200 syncer update
> 56156 385464 0 0 3 0x14200 cleaner cleaner
> 70501 345835 0 0 3 0x14200 reaper reaper
> 46109 83168 0 0 3 0x14200 pgdaemon pagedaemon
> 22925 161998 0 0 3 0x14200 usbtsk usbtask
> 63513 64083 0 0 3 0x14200 usbatsk usbatsk
> 18728 349413 0 0 3 0x14200 bored drmtskl
> 98853 157550 0 0 3 0x14200 bored drmlwq
> 39410 237152 0 0 3 0x14200 bored drmlwq
> 92541 76851 0 0 3 0x14200 bored drmlwq
> 47164 197379 0 0 3 0x14200 bored drmlwq
> 9853 127153 0 0 3 0x14200 bored drmubwq
> 18545 398258 0 0 3 0x14200 bored drmubwq
> 38466 331523 0 0 3 0x14200 bored drmubwq
> 16049 277430 0 0 3 0x14200 bored drmubwq
> 50444 316366 0 0 3 0x14200 bored drmhpwq
> 94598 67921 0 0 3 0x14200 bored drmhpwq
> 77503 379929 0 0 3 0x14200 bored drmhpwq
> 8860 448340 0 0 3 0x14200 bored drmhpwq
> 52476 388513 0 0 3 0x14200 bored drmwq
> 185 26590 0 0 3 0x14200 bored drmwq
> 45860 442452 0 0 3 0x14200 bored drmwq
> 64786 112562 0 0 3 0x14200 bored drmwq
> 58525 179335 0 0 3 0x14200 bored sensors
> 68906 400714 0 0 3 0x14200 bored sleep
> 92770 116939 0 0 3 0x40014200 idle9
> 54583 422906 0 0 3 0x40014200 idle8
> 93235 264602 0 0 3 0x40014200 idle7
> 23847 130999 0 0 3 0x40014200 idle6
> 87396 374260 0 0 3 0x40014200 idle5
> 67773 146451 0 0 3 0x40014200 idle4
> 664 465524 0 0 3 0x40014200 idle3
> 8634 372189 0 0 3 0x40014200 idle2
> 85417 214509 0 0 3 0x40014200 idle1
> 8301 414624 0 0 3 0x14200 bored softnet3
> 25868 484464 0 0 3 0x14200 bored softnet2
> 75194 231167 0 0 3 0x14200 bored softnet1
> 90163 153430 0 0 3 0x14200 bored softnet0
> 13057 294652 0 0 3 0x14200 bored systqmp
> 61843 256983 0 0 3 0x14200 bored systq
> 44957 446138 0 0 3 0x14200 tmoslp softclockmp
> 79423 156384 0 0 3 0x40014200 tmoslp softclock
> 40932 356300 0 0 3 0x40014200 idle0
> 80938 484135 0 0 3 0x14200 kmalloc kmthread
> 1 506573 0 0 3 0x82 wait init
> 0 0 -1 0 3 0x10200 scheduler swapper
> ddb{9}>
> -->8--
>
> --
> jca
>
>