Hello Dana, Thanks again for your report.
On 19/06/24(Wed) 09:37, Dana Koch wrote: > On Wed, Jun 19, 2024 at 6:58 AM Martin Pieuchot <m...@openbsd.org> wrote: > > This is a lock order reversal reported by WITNESS. Thankfully claudio@ > > already committed a fix for this on the 16th. So please, try with > > up-to-date sources > > Just to be paranoid, I built a kernel with recent sources and > MP_LOCKDEBUG and WITNESS. I experienced both the "lock spun out" error > after "starting network" -- but not on serial console, unfortunately > -- and from `make -j24` as mentioned which I did capture. The problem is exposed by the many threads of lld(1). While "starting network" the boot process relinks a kernel. More details below. Since when do you experience this issue? > Here is the ddb session -- properly capturing each `mach ddbcpu` and > `trace` with hex this time: > > __mp_lock_spin: 0xffffff8001292800 lock spun out > Stopped at __mp_lock+0x138: ldr w8, [x23,#712] > ddb{16}> __mp_lock_spin: 0xffffff8001292800 lock spun out The issue is related to the SCHED_LOCK(). Could you please next time use "ps /o" in ddb, this will help me figure out which CPU trace correspond to the process holding the KERNEL_LOCK(). > ddb{16}> show panic__mp_lock_spin: 0xffffff8001292800 lock spun out > > the kernel did not panic > ddb{16}> trace > db_enter() at __mp_lock+0x134 > __mp_lock() at svc_handler+0x42c > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a436a0 > --- trap --- > end of kernel Spinning on KERNEL_LOCK(); > ddb{16}> mach ddbcpu 0 > Stopped at __mp_lock+0x138: ldr w8, [x23,#712] > ddb{0}> trace > db_enter() at __mp_lock+0x134 > __mp_lock() at aplintc_irq_handler+0x158 > aplintc_irq_handler() at arm_cpu_irq+0x34 > arm_cpu_irq() at handle_el1h_irq+0x68 > handle_el1h_irq() at db_enter_ddb+0x268 > db_enter_ddb() at kdb_trap+0x64 > kdb_trap() at db_trapper+0x30 > db_trapper() at handle_el1h_sync+0x68 > handle_el1h_sync() at db_enter+0x14 > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at aq_lookup+0x80 > aq_intr() at arm_cpu_irq+0x34 > arm_cpu_irq() at handle_el1h_irq+0x68 > handle_el1h_irq() at do_el1h_sync+0x40 > do_el1h_sync() at handle_el1h_sync+0x68 > handle_el1h_sync() at db_enter+0x14 > db_enter() at __mp_lock+0x134 > __mp_lock() at softintr_biglock_wrap+0x14 > softintr_biglock_wrap() at softintr_dispatch+0x84 > softintr_dispatch() at arm_do_pending_intr+0xfc > arm_do_pending_intr() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xec > mtx_enter() at sched_idle+0x12c > sched_idle() at proc_trampoline+0xc Spinning on both SCHED_LOCK() (thread context) and KERNEL_LOCK() (interrupt context). > ddb{0}> mach ddbcpu 1 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{1}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xec > mtx_enter() at mi_switch+0x2f0 > mi_switch() at sched_idle+0x134 > sched_idle() at proc_trampoline+0xc Spinning on SCHED_LOCK(); > ddb{1}> mach ddbcpu 2 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{2}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xec > mtx_enter() at sched_idle+0x12c > sched_idle() at proc_trampoline+0xc Spinning on SCHED_LOCK(); > ddb{2}> mach ddbcpu 3 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{3}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xb8 > mtx_enter() at sched_idle+0x12c > sched_idle() at proc_trampoline+0xc Spinning on SCHED_LOCK(); > ddb{3}> mach ddbcpu 4 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{4}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0x114 > mtx_enter() at sys_sched_yield+0x6c > sys_sched_yield() at svc_handler+0x478 > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a67b88 > --- trap --- > end of kernel Spinning on SCHED_LOCK(); > ddb{4}> mach ddbcpu 5 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{5}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at cpu_idle_cycle+0x28 > cpu_idle_cycle() at sched_idle+0x218 > sched_idle() at proc_trampoline+0xc IDLE > ddb{5}> mach ddbcpu 6 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{6}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at __mp_lock+0x104 > __mp_lock() at svc_handler+0x42c > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a436a0 Spinning on KERNEL_LOCK(); > --- trap --- > end of kernel > ddb{6}> mach ddbcpu 7 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{7}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at __mp_lock+0x104 > __mp_lock() at reaper+0x108 > reaper() at proc_trampoline+0xc Spinning on KERNEL_LOCK(); > ddb{7}> mach ddbcpu 8 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{8}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xe4 > mtx_enter() at exit1+0x534 > exit1() at sys___threxit+0x4c > process_untrace() at svc_handler+0x44c > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a436a0 This lld(1) thread in sys___threxit(2) is probably holding the KERNEL_LOCK(). It is spinning on the SCHED_LOCK() before calling tuagg_locked(). > --- trap --- > end of kernel > ddb{8}> mach ddbcpu 9 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{9}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xe4 > mtx_enter() at sleep_setup+0x58 > sleep_setup() at rw_enter+0x154 > rw_enter() at sys_futex+0x50 > sys_futex() at svc_handler+0x478 > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a37d80 Spinning on SCHED_LOCK() before placing itself on the sleep queue; > --- trap --- > end of kernel > ddb{9}> mach ddbcpu a > Symbol not found > ddb{9}> mach ddbcpu 0xa > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{10}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0x114 > mtx_enter() at sleep_finish+0xb0 > sleep_finish() at rw_enter+0x184 > rw_enter() at sys_futex+0x50 > sys_futex() at svc_handler+0x478 > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a37d80 Spinning on SCHED_LOCK() after having placed itself on the sleep queue. > --- trap --- > end of kernel > ddb{10}> mach ddbcpu 0xb > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{11}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xe4 > mtx_enter() at wakeup_n+0x44 > wakeup_n() at sys_futex+0xf0 > sys_futex() at svc_handler+0x478 > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a37d80 Spinning on SCHED_LOCK(). > --- trap --- > end of kernel > ddb{11}> mach ddbcpu 0xc > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{12}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at cpu_idle_cycle+0x28 > cpu_idle_cycle() at sched_idle+0x218 > sched_idle() at proc_trampoline+0xc IDLE. > ddb{12}> mach ddbcpu 0xd > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{13}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at __mp_lock+0x110 > __mp_lock() at svc_handler+0x42c > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a436a0 Spinning on KERNEL_LOCK(); > --- trap --- > end of kernel > ddb{13}> mach ddbcpu 0xe > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{14}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xb0 > mtx_enter() at mi_switch+0x2f0 > mi_switch() at sys_sched_yield+0x8c > sys_sched_yield() at svc_handler+0x478 > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a67b88 Spinning on SCHED_LOCK(); > --- trap --- > end of kernel > ddb{14}> mach ddbcpu 0xf > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{15}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xe4 > mtx_enter() at mi_switch+0x2f0 > mi_switch() at sys_sched_yield+0x8c > sys_sched_yield() at svc_handler+0x478 > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a67b88 Spinning on SCHED_LOCK(); > --- trap --- > end of kernel > ddb{15}> mach ddbcpu 0x10 > Stopped at __mp_lock+0x138: ldr w8, [x23,#712] > ddb{16}> trace > db_enter() at __mp_lock+0x134 > __mp_lock() at svc_handler+0x42c > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a436a0 Spinning on KERNEL_LOCK(); > --- trap --- > end of kernel > ddb{16}> mach ddbcpu 0x11 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{17}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at cpu_idle_cycle+0x28 > cpu_idle_cycle() at sched_idle+0x218 > sched_idle() at proc_trampoline+0xc IDLE. > ddb{17}> mach ddbcpu 0x12 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{18}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at cpu_idle_cycle+0x28 > cpu_idle_cycle() at sched_idle+0x218 > sched_idle() at proc_trampoline+0xc IDLE. > ddb{18}> mach ddbcpu 0x13 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{19}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at cpu_idle_cycle+0x28 > cpu_idle_cycle() at sched_idle+0x218 > sched_idle() at proc_trampoline+0xc IDLE. > ddb{19}> mach ddbcpu 0x14 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{20}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at cpu_idle_cycle+0x28 > cpu_idle_cycle() at sched_idle+0x218 > sched_idle() at proc_trampoline+0xc IDLE. > ddb{20}> mach ddbcpu 0x15 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{21}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at cpu_idle_cycle+0x28 > cpu_idle_cycle() at sched_idle+0x218 > sched_idle() at proc_trampoline+0xc IDLE. > ddb{21}> mach ddbcpu 0x16 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{22}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at __mp_lock+0x104 > __mp_lock() at svc_handler+0x42c > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a436a0 > --- trap --- > end of kernel Spinning on KERNEL_LOCK(); > ddb{22}> mach ddbcpu 0x17 > Stopped at aplintc_fiq_handler+0x70: b ffffff8000265cb0 > <aplin > tc_fiq_handler+0x7c> > ddb{23}> trace > db_enter() at aplintc_fiq_handler+0x6c > aplintc_fiq_handler() at arm_cpu_fiq+0x34 > arm_cpu_fiq() at handle_el1h_fiq+0x68 > handle_el1h_fiq() at mtx_enter+0xe4 > mtx_enter() at sys_sched_yield+0x6c > sys_sched_yield() at svc_handler+0x478 > svc_handler() at do_el0_sync+0xc8 > do_el0_sync() at handle_el0_sync+0x70 > handle_el0_sync() at 0x479a67b88 > --- trap --- > end of kernel Spinning on SCHED_LOCK();