On Mon, Jul 11, 2022 at 01:05:19PM +0200, Martin Pieuchot wrote:
> On 11/07/22(Mon) 07:50, Theo Buehler wrote:
> > On Fri, Jun 03, 2022 at 03:02:36PM +0200, Theo Buehler wrote:
> > > > Please do note that this change can introduce/expose other issues.
> > >
> > > It seems that this diff causes occasional hangs when building snapshots
> > > on my mac M1 mini. This happened twice in 10 builds, both times in
> > > xenocara. Unfortunately, both times the machine became entirely
> > > unresponsive and as I don't have serial console, that's all the info I
> > > have...
> > >
> > > This machine has been very reliable and built >50 snaps without any hang
> > > over the last 2.5 months. I'm now trying snap builds in a loop without
> > > the diff to make sure the machine doesn't hang due to another recent
> > > kernel change.
> > >
> >
> > A little bit of info on this. The first three lines were a bit garbled on
> > the screen:
> >
> > panic: kernel diagnostic assertion "uvn->_oppa jai c: ke r el
> > d iag no tic a s rt n " map == UL L | | rw wr
> > k
> > ite held(amap->amap_lock)" failed: file "/ss/uvm/uvm_fault.c", line 846.
> > ernel diagnostic assertion "!_kernel_lock_held
> > Stopped at panic+0160: cmp w21, #0x0 ailed: file "/sys/kern/
> >     TID    PID    UID     PRFLAGS    PFLAGS  CPU  COMMAND
> >  411910  44540     21    0x100001         0    3  make
> > *436444  84241     21    0x100003         0    6  sh
> >  227952  53498     21    0x100003         0    5  sh
> >  258925  15765     21    0x101005         0    0  make
> >  128459   9649     21    0x100003         0    1  tradcpp
> >  287213  64216     21    0x100003       0x8    7  make
> >  173587   4617   1000    0x100000         0    2  tmux
> >  126511  69919      0     0x14000     0x200    4  softnet
> > db_enter() at panic+0x15c
> > panic() at __assert+0x24
> > uvm_fault() at uvm_fault_upper_lookup+0x258
> > uvm_fault_upper() at uvm_fault+0xec
> > uvm_fault() at udata_abort+0x128
> > udata_abort() at do_el0_sync+0xdc
> > do_el0_sync() at handle_el0_sync+0x74
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
> > ddb{6}> show panic
> > *cpu0: kernel diagnostic assertion "uvn->u_obj.uo_refs == 0" failed: file
> > "/sys/kern/uvn_vnode.c", line 231.
> > cpu6: kernel diagnostic assertion "amap == NULL ||
> > rw_write_held(amap->am_lock)" failed: file "/sys/uvm/uvm_fault", line 846.
> > cpu3: kernel diagnostic assertion "!_kernel_lock_held()" failed: file
> > "/sys/kern/kern_fork.c", line 678
> > ddb{6}> mach ddbcpu 0
> >
> > After pressing enter here, the machine locked up completely.
>
> It's hard for me to tell what's going on. I believe the interesting
> trace is the one from cpu0 that we don't have. Can you easily reproduce
> this? I'm trying on amd64 without luck. I'd glad if you could gather
> more infos.
Sorry for the delay. I was only at home intermittently.

I hit this three times:

panic: kernel diagnostic assertion "uvn->u_obj.uo_refs == 0" failed: file "/sys/uvm/uvm_vnode.c", line 231
panic+0x160: cmp w21, #0x0
    TID    PID    UID     PRFLAGS    PFLAGS  CPU  COMMAND
  66455  64425     21    0x100001         0    7  make
*501659  83050     21    0x101005         0   2K  make
 226254  83437     21    0x100001         0    0  sh
 325842  29705     21    0x100003         0    5  gzip
 450503  79436     21    0x100003         0    1  bdftopcf
 223429  90969     21    0x100003         0    3  make
  25518  23526   1000    0x100003         0    6  tee
 482494  33196      0     0x14000     0x200    4  reaper
db_enter() at panic+0x15c
panic() at __assert+0x24
panic() at uvn_attach+0x2ac
uvm_vnp_terminate() at vmcmd_map_pagedvn+0x58
vmcmd_map_pagedvn() at exec_process_vmcmds+0x80
exec_process_vmcmds() at sys_execve+0x5ac
sys_execve() at svc_handler+0x2bc
ddb{2}> bt
db_enter() at panic+0x15c
panic() at __assert+0x24
panic() at uvn_attach+0x2ac
uvm_vnp_terminate() at vmcmd_map_pagedvn+0x58
vmcmd_map_pagedvn() at exec_process_vmcmds+0x80
exec_process_vmcmds() at sys_execve+0x5ac
sys_execve() at svc_handler+0x2bc
svc_handler() at do_el0_sync+0xa0
do_el0_sync() at handle_el0_sync+0x74
address 0x7ffffea988 is invalid
--- trap ---
ddb{2}>

Apart from uvm_vnp_terminate(), the trace seems straightforward:

sys_execve()
  exec_process_vmcmds()
    vmcmd_map_pagedvn()
      uvn_attach()
        KASSERT(uvn->u_obj.uo_refs == 0)

I do not see how uvm_vnp_terminate() comes in. It showed up in all the
traces I saw. vmcmd_map_pagedvn+0x58 is the call to uvn_attach() in
exec_subr.c:190.