24.01.2026 23:43, Klemens Nanni wrote:
> 20.01.2026 00:55, Klemens Nanni wrote:
>> 20.01.2026 00:29, Klemens Nanni wrote:
>>> 19.01.2026 22:55, Miod Vallat wrote:
>>>>> Nothing besides nd6 spam (about addresses of non-OpenBSD devices that
>>>>> work just fine):
>>>>>
>>>>> ddb{0}> dmesg
>>>>> <7>nd6_resolve: xxxx:xxxx:xxxx:xxxx:397f:4b51:7bcb:c6ff: incorrect nd6 information
>>>>> ...
>>>>> Trap cause = 2 Frame 0x980000000fd97878
>>>>> Trap PC 0xffffffff8119dbdc RA 0xffffffff8119df2c fault 0x0
>>>>
>>>> This is a NULL pointer dereference happening at 0xffffffff8119dbdc. If
>>>> you x/i 0xffffffff8119dbdc this will show you where in cnmac_recv_mbuf
>>>> this happens, and then we can figure out the corresponding line in
>>>> if_cnmac.c.
>>>
>>> x/i gives the same address from my previous mail:
>>>
>>>>> Stopped at cnmac_recv_mbuf+0x134: ld v1,32(t8)
>>>
>>> I tried this:
>>>
>>> router# objdump -d /bsd | grep -m1 cnmac_recv_mbuf
>>> ffffffff8119daa8 <cnmac_recv_mbuf>:
>>> router# addr2line -e/bsd $(python3 -c'print(hex(0xffffffff8119daa8+0x134))')
>>> ??:0
>>>
>>> I then tried against a fresh COPTS=-O0 DEBUG=-g kernel, with the same
>>> result, and also with:
>>>
>>> builder# egdb -q -batch -ex 'info line *cnmac_recv_mbuf+0x134' obj/bsd
>>>
>>> No line number information available for address 0xffffffff814954e4 <cnmac_recv_mbuf+308>
>>
>>
>> tb@ pointed me at https://www.openbsd.org/ddb.html, but here on octeon
>> 'objdump -dlr obj/if_cnmac.o' does not yield line info and prints this:
>>
>> BFD: Dwarf Error: found dwarf version '0', this reader only handles version 2 information.
>>
>> With llvm-objdump (thanks jca@) I do get this:
>>
>> ; /sys/arch/octeon/dev/if_cnmac.c:1146
>> 3aec: df 03 00 20 ld $3, 0x20($24)
>> 3af0: 14 43 00 30 bne $2, $3, 0x3bb4 <cnmac_recv_mbuf+0x1fc>
>> 3af4: 00 00 00 00 nop <cnmac_match>
>> 3af8: 7c 83 38 01 dext $3, $4, 0x0, 0x28 <cnmac_match+0x28>
>>
>>
>> 1139         for (i = 0; i < nbufs; i++) {
>> 1140                 addr = word3 & PIP_WQE_WORD3_ADDR;
>> 1141                 back = (word3 & PIP_WQE_WORD3_BACK) >> PIP_WQE_WORD3_BACK_SHIFT;
>> 1142                 pktbuf = (addr & ~(CACHELINESIZE - 1)) - back * CACHELINESIZE;
>> 1143                 pm = (struct mbuf **)PHYS_TO_XKPHYS(pktbuf, CCA_CACHED) - 1;
>> 1144                 m = *pm;
>> 1145                 *pm = NULL;
>> 1146                 if ((paddr_t)m->m_pkthdr.ph_cookie != pktbuf)
>> 1147                         panic("%s: packet pool is corrupted, mbuf cookie %p != "
>> 1148                             "pktbuf %p", __func__, m->m_pkthdr.ph_cookie,
>> 1149                             (void *)pktbuf);
>> 1150
>>
>>
>> So m == NULL.
>
> Hit another one today running
> OpenBSD 7.8-current (GENERIC.MP) #124: Wed Jan 14 11:01:22 MST 2026
Same issue.
I then downgraded to 7.8-release and the system has been stable ever since;
uptime is 46 days and counting.
>
>
> Trap cause = 2 Frame 0x980000000fd83ac8
> Trap PC 0xffffffff8119c6c8 RA 0xffffffff8119c6c8 fault 0x0
> cnmac_send_queue_flush+0x90 (c000000000028f38,6b268e5959439c77,705a061c1943f580,0) ra 0xffffffff8119badc sp 0x980000000fd83c20, sz 80
> cnmac_start+0x18c (c000000000028f38,fcd6d311cc225ea8,705a061c1943f580,0) ra 0xffffffff8146fdf4 sp 0x980000000fd83c70, sz 96
> ifq_start_task+0x5c (c000000000028f38,fcd6d311cc225ea8,705a061c1943f580,0) ra 0x0 sp 0x980000000fd83cd0, sz 0
> User-level: pid 17060
> stopped on non ddb fault
> Stopped at cnmac_send_queue_flush+0x90: ld v1,32(v0)
> ddb{3}> ddb{3}> cnmac_send_queue_flush+0x90 (c000000000028f38,6b268e5959439c77,705a061c1943f580,0) ra 0xffffffff8119badc sp 0x980000000fd83c20, sz 80
> cnmac_start+0x18c (c000000000028f38,fcd6d311cc225ea8,705a061c1943f580,0) ra 0xffffffff8146fdf4 sp 0x980000000fd83c70, sz 96
> ifq_start_task+0x5c (c000000000028f38,fcd6d311cc225ea8,705a061c1943f580,0) ra 0x0 sp 0x980000000fd83cd0, sz 0
>
>
> # llvm-objdump -dlr obj/if_cnmac.o | tee dump | grep -F '<cnmac_send_queue_flush>'
> 0000000000002548 <cnmac_send_queue_flush>:
> # printf %x\\n $(( 0x2548 + 0x90 ))
> 25d8
> # awk '/^;/ { where = $2 } /25d8:/ { print where; exit(0) }' dump
> /sys/arch/octeon/dev/if_cnmac.c:597
>
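A minimal sketch of the three lookup steps quoted above (find the symbol in
the llvm-objdump -dlr output, add the offset, walk back to the last
"; file:line" marker), wrapped in one shell function. The name lookup_line
and its argument order are made up for illustration. It assumes the dump has
the layout shown in this thread, and it matches the address as a whole field
rather than as a substring, so 25d8: cannot accidentally hit 125d8:.

```shell
# lookup_line DUMP SYMBOL OFFSET
# Hypothetical helper, not part of any existing tool.
# DUMP is a file holding "llvm-objdump -dlr" output for one object file.
lookup_line() {
	dump=$1 sym=$2 off=$3

	# Symbol headers look like: 0000000000002548 <cnmac_send_queue_flush>:
	base=$(awk -v s="<$sym>:" '$2 == s { print $1; exit }' "$dump")
	[ -n "$base" ] || return 1

	# Object-file addresses are small, so plain shell arithmetic is enough.
	addr=$(printf '%x' $(( 0x$base + $off )))

	# Remember the most recent "; file:line" marker and print it once the
	# instruction at the wanted address shows up.
	awk -v a="$addr:" '
		$1 == ";" && $2 ~ /:[0-9]+$/ { where = $2 }
		$1 == a { print where; found = 1; exit }
		END { exit !found }
	' "$dump"
}
```

With the dump file created above, "lookup_line dump cnmac_send_queue_flush 0x90"
should print the same if_cnmac.c:597 as the awk one-liner; it returns non-zero
if the symbol or the address is not found.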
>
> 536 void
> 537 cnmac_send_queue_flush(struct cnmac_softc *sc)
> 538 {
> 539         const int64_t sent_count = sc->sc_hard_done_cnt;
> 540         int i;
> 541
> 542         OCTEON_ETH_KASSERT(sent_count <= 0);
> 543
> 544         for (i = 0; i < 0 - sent_count; i++) {
> 545                 struct mbuf *m;
> 546                 uint64_t *gbuf;
> 547
> 548                 cnmac_send_queue_del(sc, &m, &gbuf);
> 549
> 550                 cn30xxfpa_buf_put_paddr(cnmac_fb_sg, XKPHYS_TO_PHYS(gbuf));
> 551
> 552                 m_freem(m);
> 553         }
> 554
> 555         cn30xxfau_op_add_8(&sc->sc_fau_done, i);
> 556 }
>
> ...
>
> 588 void
> 589 cnmac_send_queue_del(struct cnmac_softc *sc, struct mbuf **rm,
> 590     uint64_t **rgbuf)
> 591 {
> 592         struct mbuf *m;
> 593         m = ml_dequeue(&sc->sc_sendq);
> 594         OCTEON_ETH_KASSERT(m != NULL);
> 595
> 596         *rm = m;
> 597         *rgbuf = m->m_pkthdr.ph_cookie;
> 598
> 599         if (m->m_ext.ext_free_fn != 0) {
> 600                 sc->sc_ext_callback_cnt--;
> 601                 OCTEON_ETH_KASSERT(sc->sc_ext_callback_cnt >= 0);
> 602         }
> 603 }
>
>
> Now running a newer snap, fwiw:
> OpenBSD 7.8-current (GENERIC.MP) #129: Thu Jan 22 09:49:17 MST 2026
>