The riscv64 build cluster was running a kernel with the very latest
smte(4) commit.  Sadly today I found two hosts crashed at the same
location.  Looks like something that should be reported as a fatal page
fault by panic() but it doesn't even make it there.  "show pool"
doesn't work either.

Not rebooting these two hosts as they need manual intervention because
of another issue, so their ddb is available as long as I don't ask
deraadt to kick them.


first host:
t[0] == 0xffffffc023cb9098
t[1] == 0x0000000000000002
t[2] == 0x0000000000000018
t[3] == 0x0000000000000002
t[4] == 0x00000000778cf4a0
t[5] == 0x000000000000000c
t[6] == 0x00000000778ceed8
s[0] == 0xffffffc07ab31c90
s[1] == 0xffffffffffffffff
s[2] == 0xffffffc0236b4d18
s[3] == 0x00000000000005c8
s[4] == 0xffffffc024c4bed8
s[5] == 0x00000000000005c8
s[6] == 0xffffffc024c4bed8
s[7] == 0xffffffc023cb9068
s[8] == 0xffffffc000a35f08
s[9] == 0x0000000000000001
s[10] == 0x0000000000000001
s[11] == 0xffffffffffffffff
a[0] == 0xffffffc024c4c000
a[1] == 0xffffffc024c4c4c0
a[2] == 0xffffffc000a34cba
a[3] == 0x0000000000000040
a[4] == 0x0000000000000008
a[5] == 0xffffffc0006ee23c
a[6] == 0xffffffc023cb9068
a[7] == 0x0000000000000001
sepc == 0xffffffc0002b4cd4
sstatus == 0x0000000200000120
stval == 0xffffffc024c4c000
scause == 0x000000000000000f
Stopped at      panic+0xfc:     addi    a0,zero,256
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 122455   5614     55   0x1000002          0    6  c++
 455499  39700     55   0x1000002          0    0  c++
 296634  22886     55   0x1000002          0    7  c++
  99173  60313     55   0x1000002          0    1  c++
 327995  89589     55   0x1000002          0    5  c++
 199455  66297     55   0x1000002          0    4  c++
 127821  19055     55   0x1000002          0    3  c++
*296440   9059      0     0x14000      0x200    2  softnet0
panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
_dmamap_sync() at _dmamap_sync+0x13a
smte_encap() at smte_encap+0x84
smte_start() at smte_start+0xbc
ifq_serialize() at ifq_serialize+0x98
taskq_thread() at taskq_thread+0x70
proc_trampoline() at proc_trampoline+0xc
end trace frame: 0x0, count: 6
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{2}> sh pool
POOLt[0] == 0xffffffc07ab31670
t[1] == 0xffffffc000410f3c
t[2] == 0xffffffc000a3897f
t[3] == 0xffffffc000a35136
t[4] == 0x00000000778cf4a0
t[5] == 0x000000000000000c
t[6] == 0x00000000778ceed8
s[0] == 0xffffffc07ab31510
s[1] == 0x0000000000000073
s[2] == 0xffffffffffffffff
s[3] == 0x0000000000000000
s[4] == 0x0000000000000000
s[5] == 0x0000000000000000
s[6] == 0x80e7000010974601
s[7] == 0x0000000000000000
s[8] == 0x000000000000000a
s[9] == 0xffffffc000871e42
s[10] == 0x0000000000000000
s[11] == 0x0000000000000005
a[0] == 0x80e7000010974601
a[1] == 0x0000000000000000
a[2] == 0x80e7000010974601
a[3] == 0x0000000000000000
a[4] == 0x0000000000000009
a[5] == 0x0000000000000001
a[6] == 0x0000000000000000
a[7] == 0x0000000000000000
sepc == 0xffffffc000640a3e
sstatus == 0x0000000200000120
stval == 0x0000000010974601
scause == 0x000000000000000d
 panic: Fatal page fault at 0xffffffc000640a3e: 0x10974601
Stopped at      panic+0xfc:     addi    a0,zero,256panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
kprintf() at kprintf+0x73a
db_printf() at db_printf+0x4e
pool_print1() at pool_print1+0x48
db_command() at db_command+0x298
db_command_loop() at db_command_loop+0xda
db_trap() at db_trap+0x122
kdb_trap() at kdb_trap+0xc6
db_trapper() at db_trapper+0x1e
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
end trace frame: 0xffffffc07ab31b60, count: 0
ddb{2}>


second host:

t[0] == 0xffffffc023c43098
t[1] == 0x0000000000000002
t[2] == 0x0000000000000018
t[3] == 0x0000000000000002
t[4] == 0x000000003c726348
t[5] == 0x000000000000000c
t[6] == 0x000000003c725d80
s[0] == 0xffffffc07ab31c90
s[1] == 0xffffffffffffffff
s[2] == 0xffffffc0236b4d18
s[3] == 0x00000000000005c8
s[4] == 0xffffffc025101d80
s[5] == 0x00000000000005c8
s[6] == 0xffffffc025101d80
s[7] == 0xffffffc023c43068
s[8] == 0xffffffc000a35f08
s[9] == 0x0000000000000001
s[10] == 0x0000000000000001
s[11] == 0xffffffffffffffff
a[0] == 0xffffffc025102000
a[1] == 0xffffffc025102380
a[2] == 0xffffffc000a34cba
a[3] == 0x0000000000000040
a[4] == 0x0000000000000008
a[5] == 0xffffffc0006ee23c
a[6] == 0xffffffc023c43068
a[7] == 0x0000000000000001
sepc == 0xffffffc0002b4cd4
sstatus == 0x0000000200000120
stval == 0xffffffc025102000
scause == 0x000000000000000f
Stopped at      panic+0xfc:     addi    a0,zero,256
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
 513334  41107     55   0x1000002          0    3  moc
 170039  40242     55   0x1000000          0    4  python3.13
 310516  34628     55   0x1000000          0    5  python3.13
  70844  58410     55   0x1000002          0    6  c++
  70810  25634     55   0x1000002          0    0  c++
 127486  15728     55   0x1000002          0    2  c++
 134565  79240     55   0x1000002          0    7  c++
*432997  30765      0     0x14000      0x200    1  softnet0
panic() at panic+0xfc
do_trap_supervisor() at do_trap_supervisor+0x1f4
cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7e
_dmamap_sync() at _dmamap_sync+0x13a
smte_encap() at smte_encap+0x84
smte_start() at smte_start+0xbc
ifq_serialize() at ifq_serialize+0x98
taskq_thread() at taskq_thread+0x70
proc_trampoline() at proc_trampoline+0xc
end trace frame: 0x0, count: 6
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.
ddb{1}>


-- 
jca
