> Date: Sun, 6 Jul 2025 21:15:34 +0200
> From: Jeremie Courreges-Anglas <[email protected]>
> 
> On Tue, Jul 01, 2025 at 06:18:37PM +0200, Jeremie Courreges-Anglas wrote:
> > On Tue, Jun 24, 2025 at 05:21:56PM +0200, Jeremie Courreges-Anglas wrote:
> > > 
> > > I think it's uvm_purge(), as far as I can see it happens when building
> > > rust with cvs up -D2025/06/04 in /sys, not with -D2025/06/03.  Maybe I
> > > missed lang/rust when testing the diff.
> > > 
> > > This is with additional MP_LOCKDEBUG support for mutexes, and
> > > __mp_lock_spinout = 50L * INT_MAX.
> > > 
> > > Suggested by claudio: tr /t 0t269515 fails.
> > 
> > Should be fixed, at least for CPUs that ddb actually managed to
> > stop...
> > 
> > > WITNESS doesn't flag an obvious lock ordering issue.  I'm not even
> > > sure there is one.  It also happen with CPU_MAX_BUSY_CYCLES == 64.
> > > 
> > > Maybe we're still hammering too much the locks?  Input and ideas to
> > > test welcome.  Right now I'm running with just uvm_purge() reverted.
> > 
> > Reverting uvm_purge() did not help. I've been able to reproduce the
> > hangs up to cvs up -D2025/05/18, backporting the arm64 ddb trace fix,
> > mpi's mutex backoff diff and the mtx_enter MP_LOCKDEBUG diff.
> > Currently trying to reproduce with cvs up -D2025/05/07 as suggested by
> > dlg.
> 
> cvs up -D2025/05/07 also did not help.  -current still locks up, but
> the db_mutex fix helps get proper stacktraces:

Well, there are still issues with the stack traces on arm64.

Someone please review the diff I sent out on July 1st ("arm64 ddb trace fix").


> 
> --8<--
> login: mtf_fnxe0:00nf2ffmrte8  0m21xt2  8xlef_ cf2nktft esl8r_u0f  k00t  
> sxor2f0
> 
>  0tf
>      8oeff
> 
>  ftff
> efStopped at    mtx_enter+0x13c:        ldr     x26, [x25,#2376]
> ddb{8}> mach cpuinfo
>     0: stopped
>     1: stopped
>     2: stopped
>     3: stopped
>     4: stopped
>     5: stopped
>     6: stopped
>     7: stopped
> *   8: ddb
>     9: stopped
> ddb{8}> mach ddb 0
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{0}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at uvm_pmr_getpages+0x16c
> uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
> uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
> uvm_pmr_cache_get() at uvm_pagealloc+0xc4
> uvm_pagealloc() at uvmfault_promote+0xac
> uvmfault_promote() at uvm_fault_lower+0x2e4
> uvm_fault_lower() at uvm_fault+0x158
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0x100
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x4c7b735f4
> --- trap ---
> end of kernel
> ddb{0}> mach ddb 1
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{1}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at sys_sched_yield+0x3c
> sys_sched_yield() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0xb05ca7fdc
> --- trap ---
> end of kernel
> ddb{1}> mach ddb 2
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{2}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at remrunqueue+0xfc
> remrunqueue() at sched_chooseproc+0x110
> sched_chooseproc() at mi_switch+0xf4
> mi_switch() at sys_sched_yield+0x88
> sys_sched_yield() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0xb05ca7fdc
> --- trap ---
> end of kernel
> ddb{2}> mach ddb 3
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{3}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at sys_sched_yield+0x68
> sys_sched_yield() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0xb05ca7fdc
> --- trap ---
> end of kernel
> ddb{3}> mach ddb 4
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{4}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at wakeup_n+0x44
> wakeup_n() at uvm_pmr_freepages+0x1f0
> uvm_pmr_freepages() at uvm_anfree_list+0xec
> uvm_anfree_list() at amap_wipeout+0xfc
> amap_wipeout() at uvm_unmap_detach+0x94
> uvm_unmap_detach() at sys_munmap+0x15c
> sys_munmap() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x4277ecffc
> --- trap ---
> end of kernel
> ddb{4}> mach ddb 5
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{5}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at preempt+0x20
> preempt() at ast+0x98
> ast() at handle_el0_sync+0x9c
> handle_el0_sync() at 0x16e30386bc
> --- trap ---
> end of kernel
> ddb{5}> mach ddb 6
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{6}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x10c
> mtx_enter() at preempt+0x20
> preempt() at ast+0x98
> ast() at handle_el0_sync+0x9c
> handle_el0_sync() at 0xb05cb3094
> --- trap ---
> end of kernel
> ddb{6}> mach ddb 7
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{7}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at exuartcnputc+0x44
> exuartcnputc() at cnputc+0x40
> cnputc() at db_putchar+0x2bc
> db_putchar() at kprintf+0xdd0
> kprintf() at db_printf+0x78
> db_printf() at mtx_enter+0x134
> mtx_enter() at uvm_pmr_freepages+0xc8
> uvm_pmr_freepages() at uvm_anfree_list+0xec
> uvm_anfree_list() at amap_wiperange_chunk+0x104
> amap_wiperange_chunk() at amap_wiperange+0x180
> amap_wiperange() at amap_pp_adjref+0x25c
> amap_pp_adjref() at amap_adjref_anons+0xf0
> amap_adjref_anons() at uvm_unmap_detach+0x94
> uvm_unmap_detach() at sys_munmap+0x15c
> sys_munmap() at svc_handler+0x450
> svc_handler() at do_el0_sync+0x1cc
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x4ebf95ffc
> --- trap ---
> end of kernel
> ddb{7}> mach ddb 8
> Stopped at      mtx_enter+0x13c:        ldr     x26, [x25,#2376]
> ddb{8}> tr
> db_enter() at mtx_enter+0x138
> mtx_enter() at uvm_pmr_getpages+0x16c
> uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
> uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
> uvm_pmr_cache_get() at uvm_pagealloc+0xc4
> uvm_pagealloc() at uvmfault_promote+0xac
> uvmfault_promote() at uvm_fault_lower+0x2e4
> uvm_fault_lower() at uvm_fault+0x158
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0x100
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at __ALIGN_SIZE+0x48d8044
> --- trap ---
> end of kernel
> ddb{8}> mach ddb 9
> Stopped at      aplintc_fiq_handler+0x6c:       b       ffffff8000be553c 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{9}> tr
> db_enter() at aplintc_fiq_handler+0x68
> aplintc_fiq_handler() at arm_cpu_fiq+0x44
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at exuartcnputc+0x44
> exuartcnputc() at cnputc+0x40
> cnputc() at db_putchar+0x2bc
> db_putchar() at kprintf+0xdd0
> kprintf() at db_printf+0x78
> db_printf() at mtx_enter+0x134
> mtx_enter() at uvm_pmr_getpages+0x16c
> uvm_pmr_getpages() at uvm_pmr_cache_alloc+0x74
> uvm_pmr_cache_alloc() at uvm_pmr_cache_get+0x114
> uvm_pmr_cache_get() at uvm_pagealloc+0xc4
> uvm_pagealloc() at uvmfault_promote+0xac
> uvmfault_promote() at uvm_fault_lower+0x2e4
> uvm_fault_lower() at uvm_fault+0x158
> uvm_fault() at udata_abort+0x128
> udata_abort() at do_el0_sync+0x100
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at __ALIGN_SIZE+0x37af29c
> --- trap ---
> end of kernel
> ddb{9}> show all locks
> CPU 4:
> exclusive mutex &uvm.fpageqlock r = 0 (0xffffff80012f8e38)
> CPU 2:
> exclusive mutex &sched_lock r = 0 (0xffffff80012615f8)
> CPU 1:
> exclusive mutex &pr->ps_mtx r = 0 (0xffffff8119fede98)
> Process 63925 (c++) thread 0xffffff81118dc088 (139279)
> exclusive rwlock amaplk r = 0 (0xffffff81186ea920)
> shared rwlock vmmaplk r = 0 (0xffffff8055613c40)
> Process 45702 (c++) thread 0xffffff80ffcaef78 (100747)
> exclusive rwlock amaplk r = 0 (0xffffff81137b5410)
> Process 56051 (c++) thread 0xffffff81118dc820 (416882)
> exclusive rwlock amaplk r = 0 (0xffffff8113c8ead0)
> shared rwlock vmmaplk r = 0 (0xffffff810f232e88)
> Process 1137 (c++) thread 0xffffff81029762d8 (371891)
> exclusive rwlock amaplk r = 0 (0xffffff80540e6cc0)
> shared rwlock vmmaplk r = 0 (0xffffff80556132e0)
> Process 95468 (c++) thread 0xffffff81118dd240 (227740)
> exclusive rwlock amaplk r = 0 (0xffffff811ab40078)
> ddb{9}> show witness /b
> Number of known direct relationships is 442
> 
> Lock order reversal between "&mp->mnt_lock"(rwlock) and 
> "&ip->i_lock"(rrwlock)!
> 
> witness: incomplete path, depth 4
> ddb{9}> ps /o
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>  139279  63925     55         0x3          0    0  c++
>   57313  41142     55   0x2000003  0x4000000    5  rustc
>  100747  45702     55         0x3          0    4  c++
>  416882  56051     55         0x3          0    8  c++
> *371891   1137     55         0x3          0    9  c++
>  227740  95468     55         0x3          0    7  c++
>  344959  21226     55   0x2000003  0x4000000    3  rustc
>  418483  21226     55   0x2000003  0x4000000    1  rustc
>   69626  21226     55   0x2000003  0x4000000    6  rustc
> ddb{9}> ps
>    PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
>  63925  139279  79490     55  7         0x3                c++
>  79490  171954  36022     55  3    0x10008b  sigsusp       sh
>  41142  134365  40336     55  3   0x2000083  fsleep        rustc
>  41142  302975  40336     55  3   0x6000083  piperd        rustc
>  41142   57313  40336     55  7   0x6000003                rustc
>  40336  347395  15169     55  3   0x2000083  wait          rustc
>  45702  100747  64449     55  7         0x3                c++
>  64449  250642  36022     55  3    0x10008b  sigsusp       sh
>  56051  416882  47805     55  7         0x3                c++
>  47805    1643  36022     55  3    0x10008b  sigsusp       sh
> * 1137  371891  29510     55  7         0x3                c++
>  29510  131639  36022     55  3    0x10008b  sigsusp       sh
>  95468  227740  67080     55  7         0x3                c++
>  67080  286525  36022     55  3    0x10008b  sigsusp       sh
>  21226  108617   3494     55  3   0x2000083  fsleep        rustc
>  21226  510541   3494     55  3   0x6000083  piperd        rustc
>  21226  489915   3494     55  3   0x6000083  fsleep        rustc
>  21226   67149   3494     55  3   0x6000083  piperd        rustc
>  21226   48866   3494     55  3   0x6000083  fsleep        rustc
>  21226  344959   3494     55  7   0x6000003                rustc
>  21226  418483   3494     55  7   0x6000003                rustc
>  21226  380395   3494     55  2   0x6000003                rustc
>  21226   69626   3494     55  7   0x6000003                rustc
>   3494  361736  15169     55  3   0x2000083  wait          rustc
>  15169   47593  56577     55  3   0x2000083  fsleep        cargo
>  15169   55128  56577     55  3   0x6000083  piperd        cargo
>  15169  456809  56577     55  3   0x6000083  kqread        cargo
>  15169   45734  56577     55  3   0x6000083  kqread        cargo
>  56577  460218  94706     55  3   0x2000083  piperd        bootstrap
>  94706  144930  19294     55  3   0x2200083  wait          python3.12
>  19294  227744  46195     55  3    0x10008b  sigsusp       make
>  36022  189017  46616     55  3        0x8b  kqread        ninja
>  46616  494547   5179     55  3        0x83  kqread        cmake
>   5179   41695  49137     55  3    0x10008b  sigsusp       make
>  46195  139722  78463   1000  3    0x10008b  sigsusp       make
>  78463  339800  44112   1000  3    0x10008b  sigsusp       sh
>  49137  166591  32884   1000  3    0x10008b  sigsusp       make
>  32884  130499  19324   1000  3    0x10008b  sigsusp       sh
>  19324    2171  13719   1000  3    0x10008b  sigsusp       make
>  44112   82636  13750   1000  3    0x10008b  sigsusp       make
>  13719  280625  37164   1000  3    0x10008b  sigsusp       ksh
>  13750  311794  37164   1000  3    0x10008b  sigsusp       ksh
>  37164  271851      1   1000  3    0x100080  kqread        tmux
>  70208   65461  38741   1000  3    0x100083  kqread        tmux
>  72473  190992      1   1000  3    0x1000b8  kqread        ssh-agent
>  38741  398367   4112   1000  3    0x10008b  sigsusp       ksh
>   4112   49829  24648   1000  3        0x98  kqread        sshd-session
>  24648  276484  70750      0  3        0x92  kqread        sshd-session
>  64792   38488      1      0  3    0x100083  ttyin         getty
>  13658  311859      1      0  3    0x100083  ttyin         getty
>  93601  440231      1      0  3    0x100083  ttyin         getty
>  63470  459539      1      0  3    0x100083  ttyin         getty
>  96609  469760      1      0  3    0x100083  ttyin         getty
>  30529  126857      1      0  3    0x100083  ttyin         getty
>  25133  315686      1      0  3    0x100098  kqread        cron
>  63695  355503      1     99  3   0x1100090  kqread        sndiod
>  21489  342535      1    110  3    0x100090  kqread        sndiod
>   2773   26068  87190     95  3   0x1100092  kqread        smtpd
>  50414   65728  87190    103  3   0x1100092  kqread        smtpd
>  26666  492754  87190     95  3   0x1100092  kqread        smtpd
>  15809  384184  87190     95  3    0x100092  kqread        smtpd
>   6296   50132  87190     95  3   0x1100092  kqread        smtpd
>  62965  453772  87190     95  3   0x1100092  kqread        smtpd
>  87190   36758      1      0  3    0x100080  kqread        smtpd
>  70750  520713      1      0  3        0x88  kqread        sshd
>  88177  217785  64329      0  3        0x80  nfsd          nfsd
>  19091  209499  64329      0  3        0x80  nfsd          nfsd
>  16547  436525  64329      0  3        0x80  nfsd          nfsd
>  78468  310707  64329      0  3        0x80  nfsd          nfsd
>  64329   34190      1      0  3        0x80  netacc        nfsd
>  44169  309817   7798      0  3        0x80  kqread        mountd
>   7798    3741      1      0  3    0x100080  kqread        mountd
>  41132  247300      1     28  3   0x1100090  kqread        portmap
>  41102   15370  59220     83  3    0x100092  kqread        ntpd
>  59220   80027  79319     83  3   0x1100092  kqread        ntpd
>  79319  186570      1      0  3    0x100080  kqread        ntpd
>  72869  395225  46771     74  3   0x1100092  bpf           pflogd
>  46771  417638      1      0  3        0x80  sbwait        pflogd
>  94369  233003  28011     73  3   0x1100090  kqread        syslogd
>  28011  381388      1      0  3    0x100082  sbwait        syslogd
>  81963  219622      1      0  3    0x100080  kqread        resolvd
>  88462  439074  21502     77  3    0x100092  kqread        dhcpleased
>  59766   25595  21502     77  3    0x100092  kqread        dhcpleased
>  21502  369741      1      0  3        0x80  kqread        dhcpleased
>    556  135670  52238    115  3    0x100092  kqread        slaacd
>  73789  109490  52238    115  3    0x100092  kqread        slaacd
>  52238  190284      1      0  3    0x100080  kqread        slaacd
>  41668   73399      0      0  3     0x14200  bored         wsdisplay0
>  40970  505250      0      0  3     0x14200  bored         apple-dcp-afkep%02x
>  97767  350383      0      0  3     0x14200  bored         apple-dcp-afkep%02x
>  47220  373849      0      0  3     0x14200  bored         apple-dcp-afkep%02x
>  88568  200274      0      0  3     0x14200  bored         drmrtk
>   6787  522175      0      0  3     0x14200  bored         smr
>  40522    9485      0      0  3     0x14200  pgzero        zerothread
>  16498  109981      0      0  3     0x14200  aiodoned      aiodoned
>  59611  103710      0      0  3     0x14200  syncer        update
>  56156  385464      0      0  3     0x14200  cleaner       cleaner
>  70501  345835      0      0  3     0x14200  reaper        reaper
>  46109   83168      0      0  3     0x14200  pgdaemon      pagedaemon
>  22925  161998      0      0  3     0x14200  usbtsk        usbtask
>  63513   64083      0      0  3     0x14200  usbatsk       usbatsk
>  18728  349413      0      0  3     0x14200  bored         drmtskl
>  98853  157550      0      0  3     0x14200  bored         drmlwq
>  39410  237152      0      0  3     0x14200  bored         drmlwq
>  92541   76851      0      0  3     0x14200  bored         drmlwq
>  47164  197379      0      0  3     0x14200  bored         drmlwq
>   9853  127153      0      0  3     0x14200  bored         drmubwq
>  18545  398258      0      0  3     0x14200  bored         drmubwq
>  38466  331523      0      0  3     0x14200  bored         drmubwq
>  16049  277430      0      0  3     0x14200  bored         drmubwq
>  50444  316366      0      0  3     0x14200  bored         drmhpwq
>  94598   67921      0      0  3     0x14200  bored         drmhpwq
>  77503  379929      0      0  3     0x14200  bored         drmhpwq
>   8860  448340      0      0  3     0x14200  bored         drmhpwq
>  52476  388513      0      0  3     0x14200  bored         drmwq
>    185   26590      0      0  3     0x14200  bored         drmwq
>  45860  442452      0      0  3     0x14200  bored         drmwq
>  64786  112562      0      0  3     0x14200  bored         drmwq
>  58525  179335      0      0  3     0x14200  bored         sensors
>  68906  400714      0      0  3     0x14200  bored         sleep
>  92770  116939      0      0  3  0x40014200                idle9
>  54583  422906      0      0  3  0x40014200                idle8
>  93235  264602      0      0  3  0x40014200                idle7
>  23847  130999      0      0  3  0x40014200                idle6
>  87396  374260      0      0  3  0x40014200                idle5
>  67773  146451      0      0  3  0x40014200                idle4
>    664  465524      0      0  3  0x40014200                idle3
>   8634  372189      0      0  3  0x40014200                idle2
>  85417  214509      0      0  3  0x40014200                idle1
>   8301  414624      0      0  3     0x14200  bored         softnet3
>  25868  484464      0      0  3     0x14200  bored         softnet2
>  75194  231167      0      0  3     0x14200  bored         softnet1
>  90163  153430      0      0  3     0x14200  bored         softnet0
>  13057  294652      0      0  3     0x14200  bored         systqmp
>  61843  256983      0      0  3     0x14200  bored         systq
>  44957  446138      0      0  3     0x14200  tmoslp        softclockmp
>  79423  156384      0      0  3  0x40014200  tmoslp        softclock
>  40932  356300      0      0  3  0x40014200                idle0
>  80938  484135      0      0  3     0x14200  kmalloc       kmthread
>      1  506573      0      0  3        0x82  wait          init
>      0       0     -1      0  3     0x10200  scheduler     swapper
> ddb{9}>
> -->8--
> 
> -- 
> jca
> 
> 

Reply via email to