George, thank you for the suggestion of changing membar_enter and
membar_consumer from isync to sync. I did that and the frequency of
crashes went way down, admittedly on a workload that is not solidly
reproducible. But last night there was finally another crash (see
below), so that's not the full solution. I'll keep reading the code;
so far nothing looks obviously wrong to my naive eye.

Martin, to respond to your question about pool corruption: yes, there
seems to be some corruption or exhaustion of the pmap or pted pools,
but I don't yet see evidence that it happens in the same place or way
each time.

panic: kernel diagnostic assertion "UVM_PSEG_INUSE(pseg, id)" failed: file "/sys/uvm/uvm_pager.c", line 227
Stopped at panic+0x134: ori r0,r0,0x0
TID PID UID PRFLAGS PFLAGS CPU COMMAND
210089 19233 8889 0x18000001 0 0 go.test
395921 69071 8889 0x1a000003 0 6 compile
118941 35800 8889 0x1a000003 0 3 compile
281308 32582 8889 0x1a000003 0 1 compile
*370626 8557 8889 0x1a000003 0x4000000 2 link
72 61744 8889 0x1a000003 0 4 compile
243588 29484 8889 0x1a000003 0 7 go.test
103299 39952 8889 0x1a000003 0 5 go
panic+0x134
__assert+0x30
uvm_pseg_release+0x380
uvn_io+0x2d4
uvn_get+0x1dc
uvm_fault_lower+0x22c
uvm_fault+0x200
trap+0x4a8
trapagain+0x4
--- trap (type 0x300) ---
End of kernel: 0xc0000273c8 lr 0x118168
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{2}> mach ddbcpu 0
Stopped at cpu_intr+0x50: ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
_kernel_lock+0xe0
xive_hvi+0x1a0
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
uvm_pmr_addr_RBT_COMPARE+0x28
uvm_pmr_pnaddr+0x70
uvm_pmr_insert_addr+0x78
uvm_pmr_remove_1strange+0x39c
ddb{0}> mach ddbcpu 1
Stopped at cpu_intr+0x50: ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_pmr_getpages+0x2a8
uvm_pglistalloc+0x11c
km_alloc+0x364
pool_page_alloc+0x64
pool_p_alloc+0x94
pool_do_get+0x298
pool_get+0xcc
pmap_enter+0x1ac
ddb{1}> mach ddbcpu 3
Stopped at cpu_intr+0x50: ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_pmr_freepageq+0xf0
uvm_pglistfree+0x28
km_alloc+0x3b8
pool_page_alloc+0x64
pool_p_alloc+0x94
pool_do_get+0x298
pool_get+0xcc
pmap_enter+0x1ac
ddb{3}> mach ddbcpu 4
Stopped at cpu_intr+0x50: ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_wait+0xbc
uvm_fault_lower+0x94c
uvm_fault+0x200
trap+0x270
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xc000037df8 lr 0x8c3420
ddb{4}> mach ddbcpu 5
Stopped at cpu_intr+0x50: ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
uvm_pmr_addr_RBT_COMPARE+0x28
uvm_pmr_pnaddr+0x70
uvm_pmr_insert_addr+0x78
uvm_pmr_remove_1strange+0x39c
uvm_pmr_freepageq+0x150
uvm_pglistfree+0x28
km_alloc+0x3b8
pool_page_alloc+0x64
pool_p_alloc+0x94
ddb{5}> mach ddbcpu 6
Stopped at cpu_intr+0x50: ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x54
uvm_wait+0xbc
uvm_fault_lower+0x94c
uvm_fault+0x200
trap+0x270
trapagain+0x4
--- trap (type 0x400) ---
End of kernel: 0xc000037df8 lr 0x363d90
ddb{6}> mach ddbcpu 7
Stopped at cpu_intr+0x50: ori r0,r0,0x0
cpu_intr+0x50
xive_hvi+0x1b8
hvi_intr+0x38
trap+0xd4
trapagain+0x4
--- trap (type 0xea0) ---
mtx_enter+0x5c
uvm_pmr_getpages+0x2a8
uvm_pglistalloc+0x11c
km_alloc+0x364
pool_page_alloc+0x64
pool_p_alloc+0x94
pool_do_get+0x298
pool_get+0xcc
pmap_enter+0x1ac
ddb{7}>