Hello Dana,

Thanks again for your report.  

On 19/06/24(Wed) 09:37, Dana Koch wrote:
> On Wed, Jun 19, 2024 at 6:58 AM Martin Pieuchot <m...@openbsd.org> wrote:
> > This is a lock order reversal reported by WITNESS.  Thankfully claudio@
> > already committed a fix for this on the 16th.  So please, try with
> > up-to-date sources
> 
> Just to be paranoid, I built a kernel with recent sources and
> MP_LOCKDEBUG and WITNESS. I experienced both the "lock spun out" error
> after "starting network" -- but not on serial console, unfortunately
> -- and from `make -j24` as mentioned which I did capture.

The problem is exposed by the many threads of lld(1).  While "starting
network" the boot process relinks a kernel.  More details below.

Since when do you experience this issue?
 
> Here is the ddb session -- properly capturing each `mach ddbcpu` and
> `trace` with hex this time:
> 
> __mp_lock_spin: 0xffffff8001292800 lock spun out
> Stopped at      __mp_lock+0x138:        ldr     w8, [x23,#712]
> ddb{16}> __mp_lock_spin: 0xffffff8001292800 lock spun out

The issue is related to the SCHED_LOCK(). Could you please next time use
"ps /o" in ddb, this will help me figure out which CPU trace correspond
to the process holding the KERNEL_LOCK().

> ddb{16}> show panic__mp_lock_spin: 0xffffff8001292800 lock spun out
> 
> the kernel did not panic
> ddb{16}> trace
> db_enter() at __mp_lock+0x134
> __mp_lock() at svc_handler+0x42c
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a436a0
> --- trap ---
> end of kernel

Spinning on KERNEL_LOCK();

> ddb{16}> mach ddbcpu 0
> Stopped at      __mp_lock+0x138:        ldr     w8, [x23,#712]
> ddb{0}> trace
> db_enter() at __mp_lock+0x134
> __mp_lock() at aplintc_irq_handler+0x158
> aplintc_irq_handler() at arm_cpu_irq+0x34
> arm_cpu_irq() at handle_el1h_irq+0x68
> handle_el1h_irq() at db_enter_ddb+0x268
> db_enter_ddb() at kdb_trap+0x64
> kdb_trap() at db_trapper+0x30
> db_trapper() at handle_el1h_sync+0x68
> handle_el1h_sync() at db_enter+0x14
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at aq_lookup+0x80
> aq_intr() at arm_cpu_irq+0x34
> arm_cpu_irq() at handle_el1h_irq+0x68
> handle_el1h_irq() at do_el1h_sync+0x40
> do_el1h_sync() at handle_el1h_sync+0x68
> handle_el1h_sync() at db_enter+0x14
> db_enter() at __mp_lock+0x134
> __mp_lock() at softintr_biglock_wrap+0x14
> softintr_biglock_wrap() at softintr_dispatch+0x84
> softintr_dispatch() at arm_do_pending_intr+0xfc
> arm_do_pending_intr() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xec
> mtx_enter() at sched_idle+0x12c
> sched_idle() at proc_trampoline+0xc

Spinning on both SCHED_LOCK() (thread context) and KERNEL_LOCK()
(interrupt context).

> ddb{0}> mach ddbcpu 1
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{1}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xec
> mtx_enter() at mi_switch+0x2f0
> mi_switch() at sched_idle+0x134
> sched_idle() at proc_trampoline+0xc

Spinning on SCHED_LOCK();

> ddb{1}> mach ddbcpu 2
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{2}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xec
> mtx_enter() at sched_idle+0x12c
> sched_idle() at proc_trampoline+0xc

Spinning on SCHED_LOCK();

> ddb{2}> mach ddbcpu 3
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{3}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xb8
> mtx_enter() at sched_idle+0x12c
> sched_idle() at proc_trampoline+0xc

Spinning on SCHED_LOCK();

> ddb{3}> mach ddbcpu 4
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{4}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x114
> mtx_enter() at sys_sched_yield+0x6c
> sys_sched_yield() at svc_handler+0x478
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a67b88
> --- trap ---
> end of kernel

Spinning on SCHED_LOCK();

> ddb{4}> mach ddbcpu 5
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{5}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at cpu_idle_cycle+0x28
> cpu_idle_cycle() at sched_idle+0x218
> sched_idle() at proc_trampoline+0xc

IDLE 

> ddb{5}> mach ddbcpu 6
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{6}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at __mp_lock+0x104
> __mp_lock() at svc_handler+0x42c
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a436a0

Spinning on KERNEL_LOCK();

> --- trap ---
> end of kernel
> ddb{6}> mach ddbcpu 7
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{7}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at __mp_lock+0x104
> __mp_lock() at reaper+0x108
> reaper() at proc_trampoline+0xc

Spinning on KERNEL_LOCK();

> ddb{7}> mach ddbcpu 8
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{8}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xe4
> mtx_enter() at exit1+0x534
> exit1() at sys___threxit+0x4c
> process_untrace() at svc_handler+0x44c
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a436a0

This lld(1) thread in sys___threxit(2) is probably holding the
KERNEL_LOCK().  It is spinning on the SCHED_LOCK() before calling
tuagg_locked().

> --- trap ---
> end of kernel
> ddb{8}> mach ddbcpu 9
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{9}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xe4
> mtx_enter() at sleep_setup+0x58
> sleep_setup() at rw_enter+0x154
> rw_enter() at sys_futex+0x50
> sys_futex() at svc_handler+0x478
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a37d80

Spinning on SCHED_LOCK() before placing itself on the sleep queue;

> --- trap ---
> end of kernel
> ddb{9}> mach ddbcpu a
> Symbol not found
> ddb{9}> mach ddbcpu 0xa
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{10}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0x114
> mtx_enter() at sleep_finish+0xb0
> sleep_finish() at rw_enter+0x184
> rw_enter() at sys_futex+0x50
> sys_futex() at svc_handler+0x478
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a37d80

Spinning on SCHED_LOCK() after having placed itself on the sleep queue. 

> --- trap ---
> end of kernel
> ddb{10}> mach ddbcpu 0xb
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{11}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xe4
> mtx_enter() at wakeup_n+0x44
> wakeup_n() at sys_futex+0xf0
> sys_futex() at svc_handler+0x478
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a37d80

Spinning on SCHED_LOCK().

> --- trap ---
> end of kernel
> ddb{11}> mach ddbcpu 0xc
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{12}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at cpu_idle_cycle+0x28
> cpu_idle_cycle() at sched_idle+0x218
> sched_idle() at proc_trampoline+0xc

IDLE.

> ddb{12}> mach ddbcpu 0xd
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{13}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at __mp_lock+0x110
> __mp_lock() at svc_handler+0x42c
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a436a0

Spinning on KERNEL_LOCK();

> --- trap ---
> end of kernel
> ddb{13}> mach ddbcpu 0xe
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{14}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xb0
> mtx_enter() at mi_switch+0x2f0
> mi_switch() at sys_sched_yield+0x8c
> sys_sched_yield() at svc_handler+0x478
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a67b88

Spinning on SCHED_LOCK();

> --- trap ---
> end of kernel
> ddb{14}> mach ddbcpu 0xf
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{15}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xe4
> mtx_enter() at mi_switch+0x2f0
> mi_switch() at sys_sched_yield+0x8c
> sys_sched_yield() at svc_handler+0x478
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a67b88

Spinning on SCHED_LOCK();

> --- trap ---
> end of kernel
> ddb{15}> mach ddbcpu 0x10
> Stopped at      __mp_lock+0x138:        ldr     w8, [x23,#712]
> ddb{16}> trace
> db_enter() at __mp_lock+0x134
> __mp_lock() at svc_handler+0x42c
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a436a0

Spinning on KERNEL_LOCK();

> --- trap ---
> end of kernel
> ddb{16}> mach ddbcpu 0x11
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{17}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at cpu_idle_cycle+0x28
> cpu_idle_cycle() at sched_idle+0x218
> sched_idle() at proc_trampoline+0xc

IDLE.

> ddb{17}> mach ddbcpu 0x12
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{18}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at cpu_idle_cycle+0x28
> cpu_idle_cycle() at sched_idle+0x218
> sched_idle() at proc_trampoline+0xc

IDLE.

> ddb{18}> mach ddbcpu 0x13
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{19}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at cpu_idle_cycle+0x28
> cpu_idle_cycle() at sched_idle+0x218
> sched_idle() at proc_trampoline+0xc

IDLE.

> ddb{19}> mach ddbcpu 0x14
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{20}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at cpu_idle_cycle+0x28
> cpu_idle_cycle() at sched_idle+0x218
> sched_idle() at proc_trampoline+0xc

IDLE.

> ddb{20}> mach ddbcpu 0x15
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{21}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at cpu_idle_cycle+0x28
> cpu_idle_cycle() at sched_idle+0x218
> sched_idle() at proc_trampoline+0xc

IDLE.

> ddb{21}> mach ddbcpu 0x16
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{22}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at __mp_lock+0x104
> __mp_lock() at svc_handler+0x42c
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a436a0
> --- trap ---
> end of kernel

Spinning on KERNEL_LOCK();

> ddb{22}> mach ddbcpu 0x17
> Stopped at      aplintc_fiq_handler+0x70:       b       ffffff8000265cb0 
> <aplin
> tc_fiq_handler+0x7c>
> ddb{23}> trace
> db_enter() at aplintc_fiq_handler+0x6c
> aplintc_fiq_handler() at arm_cpu_fiq+0x34
> arm_cpu_fiq() at handle_el1h_fiq+0x68
> handle_el1h_fiq() at mtx_enter+0xe4
> mtx_enter() at sys_sched_yield+0x6c
> sys_sched_yield() at svc_handler+0x478
> svc_handler() at do_el0_sync+0xc8
> do_el0_sync() at handle_el0_sync+0x70
> handle_el0_sync() at 0x479a67b88
> --- trap ---
> end of kernel

Spinning on SCHED_LOCK();

Reply via email to