Re: milk-v jupiter pool (pmap?) corruption

Mark Kettenis Wed, 22 Apr 2026 14:49:55 -0700

> Date: Wed, 22 Apr 2026 22:21:12 +0200
> From: Jeremie Courreges-Anglas <[email protected]>
> 
> On Tue, Apr 21, 2026 at 02:05:47PM +0200, Jeremie Courreges-Anglas wrote:
> > On Sat, Apr 18, 2026 at 03:00:57PM +0200, Mark Kettenis wrote:
> > > > Date: Sat, 18 Apr 2026 14:35:49 +0200
> > > > From: Jeremie Courreges-Anglas <[email protected]>
> > > > 
> > > > Hey hey,
> > > > 
> > > > First report below was obtained while running cvs co/up over NFS over
> > > > smte0, disk is nvme.  This is after the pmap_growkernel() fix.
> > > 
> > > The first report is *exactly* what the pmap_growkernel() fix fixes.
> > > So I suspect your kernel didn't have the fix yet.
> > 
> > That doesn't match my memories but let's hope you're right. ;)
> > 
> > > The second one doesn't look familiar, but the "locking against
> > > myself" panic is the result of an unexpected kernel page fault that
> > > occurs shortly after allocating some new KVA.
> > 
> > Here's a series of lockups running one or two cvs up processes over
> > NFS/udp.  Usually the machine is hanging, but in the first case it
> > hits faults in a loop.
> > 
> > The only other diffs applied are your com diff, your dwpcie diff
> > and the diff below which pretty-prints code page faults.  (ok?)
> 
> Using the riscv64 bounce buffer + buffer flipper removal, I get
> crashes that seem to involve smte(4).


To be honest, I'm surprised that smte(4) works as well as it does.  We
may need some extra checks in smte_rx_proc() to make sure we did
receive a valid packet before passing it to the network stack.
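A minimal userland sketch of the kind of checks meant here. It assumes a hypothetical descriptor layout: the OWN/ERR/FIRST/LAST bits, the length field position, NRXDESC and RXBUF_SIZE are all invented for illustration and are not taken from if_smte.c. It rejects an out-of-range ring index (the condition behind the "idx < SMTE_NRXDESC" assertion in the report), descriptors the hardware has not handed back, error-flagged or multi-segment frames, and implausible lengths, before anything would reach the stack.

```c
#include <stdint.h>

/*
 * Hypothetical RX descriptor layout, invented for illustration;
 * it does NOT match the real smte(4) hardware format.
 */
#define NRXDESC		32u		/* stand-in for SMTE_NRXDESC */
#define RXBUF_SIZE	2048u		/* assumed per-descriptor buffer size */
#define MIN_FRAME_LEN	14u		/* at least an Ethernet header */

#define RXSTAT_OWN	(1u << 31)	/* assumed: DMA engine still owns it */
#define RXSTAT_LEN(s)	(((s) >> 16) & 0x3fffu)	/* assumed: frame length */
#define RXSTAT_ERR	(1u << 15)	/* assumed: error summary */
#define RXSTAT_FIRST	(1u << 9)	/* assumed: first segment */
#define RXSTAT_LAST	(1u << 8)	/* assumed: last segment */

struct rxdesc {
	uint32_t status;
};

enum rx_verdict { RX_PASS, RX_NOT_READY, RX_DROP };

/* Decide whether the descriptor at idx holds a frame worth passing up. */
static enum rx_verdict
rx_check(const struct rxdesc *ring, unsigned int idx)
{
	uint32_t status, len;

	/* Reject a corrupt ring index rather than dereferencing it. */
	if (idx >= NRXDESC)
		return RX_DROP;

	status = ring[idx].status;
	if (status & RXSTAT_OWN)
		return RX_NOT_READY;	/* hardware hasn't written it back yet */
	if (status & RXSTAT_ERR)
		return RX_DROP;
	/* Only accept frames that fit in a single descriptor. */
	if ((status & (RXSTAT_FIRST | RXSTAT_LAST)) !=
	    (RXSTAT_FIRST | RXSTAT_LAST))
		return RX_DROP;
	len = RXSTAT_LEN(status);
	if (len < MIN_FRAME_LEN || len > RXBUF_SIZE)
		return RX_DROP;
	return RX_PASS;
}
```

In a driver, RX_NOT_READY would presumably end the receive loop, while RX_DROP would recycle the buffer and count an input error instead of calling into the stack.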

> login: t[0] == 0x8000000000000009
> t[1] == 0xffffffc000730132
> t[2] == 0x000000000000002b
> t[3] == 0x0000000f4b5d4768
> t[4] == 0x0000000f22674000
> t[5] == 0x00000000051eb850
> t[6] == 0x00000000028f5c28
> s[0] == 0xffffffc1262fbb70
> s[1] == 0x3d0a6e6f69746174
> s[2] == 0xffffffc023483c00
> s[3] == 0xffffffc1262fbc00
> s[4] == 0x0000000000000001
> s[5] == 0x0000000000000000
> s[6] == 0x0000000000000042
> s[7] == 0xffffffc0233b6880
> s[8] == 0x0000000000000084
> s[9] == 0x0000000000000002
> s[10] == 0x0000000000000000
> s[11] == 0xffffffc0009d9b14
> a[0] == 0x0000000000000001
> a[1] == 0x0000000000000042
> a[2] == 0xffffffc023483c00
> a[3] == 0x0000000000000001
> a[4] == 0xf548629906554a1b
> a[5] == 0xffffffc000254ca8
> a[6] == 0xffffffffffffffff
> a[7] == 0x0000000000000001
> sepc == 0xffffffc00072fb4a
> sstatus == 0x0000000200000120
> stval == 0x0000006f69746184
> scause == 0x000000000000000d
> panic: Fatal page fault at 0xffffffc00072fb4a: 0x6f69746184
> Stopped at      panic+0xfc:     addi    a0,zero,256
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
>   85875  99057   1000    0x100003          0    4  cvs
>  466143  32996   1000    0x100003          0    5  cvs
> panic() at panic+0xfc
> do_trap_supervisor() at do_trap_supervisor+0x1f4
> cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7a
> _bpf_mtap() at _bpf_mtap+0x54
> smte_rx_proc() at smte_rx_proc+0x104
> smte_intr() at smte_intr+0x2a
> plic_irq_dispatch() at plic_irq_dispatch+0xea
> plic_irq_handler() at plic_irq_handler+0x4e
> riscv_cpu_intr() at riscv_cpu_intr+0x2a
> cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7a
> sched_idle() at sched_idle+0x18c
> proc_trampoline() at proc_trampoline+0xc
> end trace frame: 0x0, count: 3
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}> show pool
> POOL
> t[0] == 0xffffffc1262fb4b0
> t[1] == 0xffffffc0002460d0
> t[2] == 0xffffffc000a9fdf1
> t[3] == 0xffffffc000a34f0e
> t[4] == 0x0000000f22674000
> t[5] == 0x00000000051eb850
> t[6] == 0x00000000028f5c28
> s[0] == 0xffffffc1262fb350
> s[1] == 0x0000000000000073
> s[2] == 0xffffffffffffffff
> s[3] == 0x0000000000000000
> s[4] == 0x0000000000000000
> s[5] == 0x0000000000000000
> s[6] == 0x80e7000010974601
> s[7] == 0x0000000000000000
> s[8] == 0x000000000000000a
> s[9] == 0xffffffc000873ba3
> s[10] == 0x0000000000000000
> s[11] == 0x0000000000000005
> a[0] == 0x80e7000010974601
> a[1] == 0x0000000000000000
> a[2] == 0x80e7000010974601
> a[3] == 0x0000000000000000
> a[4] == 0x0000000000000009
> a[5] == 0x0000000000000001
> a[6] == 0x0000000000000000
> a[7] == 0x0000000000000000
> sepc == 0xffffffc00042028c
> sstatus == 0x0000000200000120
> stval == 0x0000000010974601
> scause == 0x000000000000000d
>  panic: Fatal page fault at 0xffffffc00042028c: 0x10974601
> Stopped at      panic+0xfc:     addi    a0,zero,256
> panic() at panic+0xfc
> do_trap_supervisor() at do_trap_supervisor+0x1f4
> cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7a
> kprintf() at kprintf+0x73a
> db_printf() at db_printf+0x4e
> pool_print1() at pool_print1+0x48
> db_command() at db_command+0x298
> db_command_loop() at db_command_loop+0xda
> db_trap() at db_trap+0x122
> kdb_trap() at kdb_trap+0xc6
> db_trapper() at db_trapper+0x1e
> cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7a
> panic() at panic+0xfc
> do_trap_supervisor() at do_trap_supervisor+0x1f4
> end trace frame: 0xffffffc1262fb9a0, count: 0
> ddb{0}> bo re
> rebooting...
> 
> 
> OpenBSD/riscv64 (jupiter.leard.wxcvbn.org) (console)
> 
> login: 35/0007.: trying to send packet on wrong domain. if 775434288 vs. mbuf > 0
> panic: kernel diagnostic assertion "idx < SMTE_NRXDESC" failed: file "/usr/src/sys/arch/riscv64/dev/if_smte.c", line 831
> Stopped at      panic+0xfc:     addi    a0,zero,256
>     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> panic() at panic+0xfc
> panic() at panic
> smte_rx_proc() at smte_rx_proc+0x17a
> smte_intr() at smte_intr+0x2a
> plic_irq_dispatch() at plic_irq_dispatch+0xea
> plic_irq_handler() at plic_irq_handler+0x4e
> riscv_cpu_intr() at riscv_cpu_intr+0x2a
> cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7a
> sched_idle() at sched_idle+0x18c
> proc_trampoline() at proc_trampoline+0xc
> end trace frame: 0x0, count: 5
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports.  Insufficient info makes it difficult to find and fix bugs.
> ddb{0}> show pool
> POOL
> t[0] == 0xffffffc1262fb750
> t[1] == 0xffffffc0002460d0
> t[2] == 0xffffffc000a9fdf1
> t[3] == 0xffffffc000a34f0e
> t[4] == 0x0000000007920000
> t[5] == 0x000000000000000c
> t[6] == 0x000000000791f000
> s[0] == 0xffffffc1262fb5f0
> s[1] == 0x0000000000000073
> s[2] == 0xffffffffffffffff
> s[3] == 0x0000000000000000
> s[4] == 0x0000000000000000
> s[5] == 0x0000000000000000
> s[6] == 0x80e7000010974601
> s[7] == 0x0000000000000000
> s[8] == 0x000000000000000a
> s[9] == 0xffffffc000873ba3
> s[10] == 0x0000000000000000
> s[11] == 0x0000000000000005
> a[0] == 0x80e7000010974601
> a[1] == 0x0000000000000000
> a[2] == 0x80e7000010974601
> a[3] == 0x0000000000000000
> a[4] == 0x0000000000000009
> a[5] == 0x0000000000000001
> a[6] == 0x0000000000000000
> a[7] == 0x0000000000000000
> sepc == 0xffffffc00042028c
> sstatus == 0x0000000200000120
> stval == 0x0000000010974601
> scause == 0x000000000000000d
>  panic: Fatal page fault at 0xffffffc00042028c: 0x10974601
> Stopped at      panic+0xfc:     addi    a0,zero,256
> panic() at panic+0xfc
> do_trap_supervisor() at do_trap_supervisor+0x1f4
> cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7a
> kprintf() at kprintf+0x73a
> db_printf() at db_printf+0x4e
> pool_print1() at pool_print1+0x48
> db_command() at db_command+0x298
> db_command_loop() at db_command_loop+0xda
> db_trap() at db_trap+0x122
> kdb_trap() at kdb_trap+0xc6
> db_trapper() at db_trapper+0x1e
> cpu_exception_handler_supervisor() at cpu_exception_handler_supervisor+0x7a
> panic() at panic+0xfc
> panic() at panic
> end trace frame: 0xffffffc1262fbc00, count: 0
> ddb{0}> bo re
> rebooting...
> 
> 
> 
> -- 
> jca
>
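All three page faults in the report show scause == 0xd; with the interrupt bit clear that is exception code 13, a load page fault in the RISC-V privileged specification, and stval holds the address the load tried to use. The saved registers are also worth reading as raw bytes: in the first dump, s[1] == 0x3d0a6e6f69746174 is the ASCII text "tation\n=" when laid out little-endian, i.e. data rather than a pointer, which may hint at payload bytes ending up where kernel state was expected. A small userland triage helper, sketched under the assumption that only the page-fault codes matter here (the function names are hypothetical):

```c
#include <stdint.h>

/*
 * Name the scause values seen in this report.  Codes are from the
 * RISC-V privileged specification; only page faults are decoded.
 */
static const char *
scause_name(uint64_t scause)
{
	if (scause >> 63)
		return "interrupt";
	switch (scause) {
	case 12:
		return "instruction page fault";
	case 13:
		return "load page fault";
	case 15:
		return "store/AMO page fault";
	default:
		return "other exception";
	}
}

/*
 * Lay a 64-bit register value out as the 8 bytes it would occupy in
 * memory on a little-endian machine (riscv64 is little-endian), with
 * non-printable bytes shown as '.'.
 */
static void
reg_as_ascii(uint64_t v, char out[9])
{
	int i;

	for (i = 0; i < 8; i++) {
		unsigned char c = (v >> (8 * i)) & 0xff;

		out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
	}
	out[8] = '\0';
}
```

For example, reg_as_ascii(0x3d0a6e6f69746174, buf) yields "tation.=" (the newline shown as '.'), and scause_name(0xd) yields "load page fault".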
