The riscv64 build cluster was running a kernel with the very latest smte(4) commit. Sadly, today I found two hosts crashed at the same location. It looks like something that should be reported as a fatal page fault by panic(), but it doesn't even make it there. "show pool" doesn't work either.
Not rebooting these two hosts as they need manual intervention because of another issue, so their ddb is available as long as I don't ask deraadt to kick them.

first host:

t[0] == 0xffffffc023cb9098
t[1] == 0x0000000000000002
t[2] == 0x0000000000000018
t[3] == 0x0000000000000002
t[4] == 0x00000000778cf4a0
t[5] == 0x000000000000000c
t[6] == 0x00000000778ceed8
s[0] == 0xffffffc07ab31c90
s[1] == 0xffffffffffffffff
s[2] == 0xffffffc0236b4d18
s[3] == 0x00000000000005c8
s[4] == 0xffffffc024c4bed8
s[5] == 0x00000000000005c8
s[6] == 0xffffffc024c4bed8
s[7] == 0xffffffc023cb9068
s[8] == 0xffffffc000a35f08
s[9] == 0x0000000000000001
s[10] == 0x0000000000000001
s[11] == 0xffffffffffffffff
a[0] == 0xffffffc024c4c000
a[1] == 0xffffffc024c4c4c0
a[2] == 0xffffffc000a34cba
a[3] == 0x0000000000000040
a[4] == 0x0000000000000008
a[5] == 0xffffffc0006ee23c
a[6] == 0xffffffc023cb9068
a[7] == 0x0000000000000001
sepc == 0xffffffc0002b4cd4
sstatus == 0x0000000200000120
stval == 0xffffffc024c4c000
scause == 0x000000000000000f
Stopped at panic+0xfc: addi a0,zero,256
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 122455   5614     55   0x1000002          0    6  c++
 455499  39700     55   0x1000002          0    0  c++
 296634  22886     55   0x1000002          0    7  c++
  99173  60313     55   0x1000002          0    1  c++
 327995  89589     55   0x1000002          0    5  c++
 199455  66297     55   0x1000002          0    4  c++
 127821  19055     55   0x1000002          0    3  c++
*296440   9059      0     0x14000      0x200    2  softnet0
panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
_dmamap_sync() at _dmamap_sync+0x13a
smte_encap() at smte_encap+0x84
smte_start() at smte_start+0xbc
ifq_serialize() at ifq_serialize+0x98
taskq_thread() at taskq_thread+0x70
proc_trampoline() at proc_trampoline+0xc
end trace frame: 0x0, count: 6
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{2}> sh pool
POOLt[0] == 0xffffffc07ab31670
t[1] == 0xffffffc000410f3c
t[2] == 0xffffffc000a3897f
t[3] == 0xffffffc000a35136
t[4] == 0x00000000778cf4a0
t[5] == 0x000000000000000c
t[6] == 0x00000000778ceed8
s[0] == 0xffffffc07ab31510
s[1] == 0x0000000000000073
s[2] == 0xffffffffffffffff
s[3] == 0x0000000000000000
s[4] == 0x0000000000000000
s[5] == 0x0000000000000000
s[6] == 0x80e7000010974601
s[7] == 0x0000000000000000
s[8] == 0x000000000000000a
s[9] == 0xffffffc000871e42
s[10] == 0x0000000000000000
s[11] == 0x0000000000000005
a[0] == 0x80e7000010974601
a[1] == 0x0000000000000000
a[2] == 0x80e7000010974601
a[3] == 0x0000000000000000
a[4] == 0x0000000000000009
a[5] == 0x0000000000000001
a[6] == 0x0000000000000000
a[7] == 0x0000000000000000
sepc == 0xffffffc000640a3e
sstatus == 0x0000000200000120
stval == 0x0000000010974601
scause == 0x000000000000000d
panic: Fatal page fault at 0xffffffc000640a3e: 0x10974601
Stopped at panic+0xfc: addi a0,zero,256panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
kprintf() at kprintf+0x73a
db_printf() at db_printf+0x4e
pool_print1() at pool_print1+0x48
db_command() at db_command+0x298
db_command_loop() at db_command_loop+0xda
db_trap() at db_trap+0x122
kdb_trap() at kdb_trap+0xc6
db_trapper() at db_trapper+0x1e
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
end trace frame: 0xffffffc07ab31b60, count: 0
ddb{2}>

second host:

t[0] == 0xffffffc023c43098
t[1] == 0x0000000000000002
t[2] == 0x0000000000000018
t[3] == 0x0000000000000002
t[4] == 0x000000003c726348
t[5] == 0x000000000000000c
t[6] == 0x000000003c725d80
s[0] == 0xffffffc07ab31c90
s[1] == 0xffffffffffffffff
s[2] == 0xffffffc0236b4d18
s[3] == 0x00000000000005c8
s[4] == 0xffffffc025101d80
s[5] == 0x00000000000005c8
s[6] == 0xffffffc025101d80
s[7] == 0xffffffc023c43068
s[8] == 0xffffffc000a35f08
s[9] == 0x0000000000000001
s[10] == 0x0000000000000001
s[11] == 0xffffffffffffffff
a[0] == 0xffffffc025102000
a[1] == 0xffffffc025102380
a[2] == 0xffffffc000a34cba
a[3] == 0x0000000000000040
a[4] == 0x0000000000000008
a[5] == 0xffffffc0006ee23c
a[6] == 0xffffffc023c43068
a[7] == 0x0000000000000001
sepc == 0xffffffc0002b4cd4
sstatus == 0x0000000200000120
stval == 0xffffffc025102000
scause == 0x000000000000000f
Stopped at panic+0xfc: addi a0,zero,256
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 513334  41107     55   0x1000002          0    3  moc
 170039  40242     55   0x1000000          0    4  python3.13
 310516  34628     55   0x1000000          0    5  python3.13
  70844  58410     55   0x1000002          0    6  c++
  70810  25634     55   0x1000002          0    0  c++
 127486  15728     55   0x1000002          0    2  c++
 134565  79240     55   0x1000002          0    7  c++
*432997  30765      0     0x14000      0x200    1  softnet0
panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
_dmamap_sync() at _dmamap_sync+0x13a
smte_encap() at smte_encap+0x84
smte_start() at smte_start+0xbc
ifq_serialize() at ifq_serialize+0x98
taskq_thread() at taskq_thread+0x70
proc_trampoline() at proc_trampoline+0xc
end trace frame: 0x0, count: 6
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{1}>

--
jca
