On Sat, Nov 16, 2024 at 07:36:37PM -0800, Andrew Hewus Fresh wrote:
> I finally got around to fixing my alpha, which involved replacing the
> disk. That meant I have to scp some stuff over to to it and after a bit
> of time it panics:
>
> panic: mtx 0xfffffe000002a628: locking against myself
> Stopped at db_enter+0x8: lda sp,10(sp)
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> *204699 79774 0 0x14000 0x200 0 softnet0
> db_enter(0, 7ffffe00e0003f8, 1, 8, 3, 8) at db_enter+0x8
> panic(?, fffffe000002a628, 1b0, 10, a, 1) at panic+0xe8
> mtx_enter(?, ?, 1b0, 10, a, 1) at mtx_enter+0xb4
> ifq_set_oactive(?, ?, 1b0, 10, a, 1) at ifq_set_oactive+0x50

this is from src/sys/net/ifq.c r1.50 where i added a counter for the
number of times oactive gets set. because there are checks and multiple
things being tweaked i used the ifq mutex to serialise the updates.

de(4) uses ifq_deq_begin to try and shove an mbuf onto the hardware,
which takes but doesn't release the ifq mutex until ifq_deq_commit or
ifq_deq_rollback is called. so while it's holding the mutex it calls
ifq_set_oactive, which also tries to take the mutex.

i honestly don't understand what de(4) is doing with the hardware and
packet setup, so i don't feel confident changing the driver to avoid
this. the least worst alternative i could think of is to provide an
alternative set_oactive it can call. the diff below should fix this.

> I reinstalled back to 7.6 from the November 13 snapshot I tried first
> with no change. I can apparently reproduce at will, but the machine is
> pretty slow so diagnostics will be slow. Both dmesg are below.
>
> Is this something known or should I try gather more details?
> (if so, anything in particular?)
>
>
> [ using 1157232 bytes of bsd ELF symbol table ]
> Copyright (c) 1982, 1986, 1989, 1991, 1993
> The Regents of the University of California. All rights reserved.
> Copyright (c) 1995-2024 OpenBSD. All rights reserved. https://www.OpenBSD.org
>
> OpenBSD 7.6-current (GENERIC) #460: Wed Nov 13 17:53:07 MST 2024
> [email protected]:/usr/src/sys/arch/alpha/compile/GENERIC
> AlphaStation 200 4/166, 166MHz
> 8192 byte page size, 1 processor.
> real mem = 167772160 (160MB)
> rsvd mem = 2048000 (1MB)
> avail mem = 152584192 (145MB)
> random: good seed from bootblocks
> mainbus0 at root
> cpu0 at mainbus0: ID 0 (primary), 21064-0 (pass 2 or 2.1)
> apecs0 at mainbus0: DECchip 21071 Core Logic chipset
> apecs0: DC21071-CA pass 2, 64-bit memory bus
> apecs0: DC21071-DA pass 2
> pci0 at apecs0 bus 0
> siop0 at pci0 dev 6 function 0 "Symbios Logic 53c810" rev 0x02: isa irq 11
> scsibus0 at siop0: 8 targets, initiator 7
> sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST336753LC, 0006>
> serial.SEAGATE_ST336753LC_3HX1QFE000007422AWD1
> sd0: 35003MB, 512 bytes/sector, 71687372 sectors
> sio0 at pci0 dev 7 function 0 "Intel 82378IB ISA" rev 0x03
> de0 at pci0 dev 11 function 0 "DEC 21040" rev 0x23, DEC 21040 pass 2.3: isa
> irq 5, address 08:00:2b:e4:f4:33
> tga0 at pci0 dev 13 function 0 "DEC 21030" rev 0x02: DC21030 step B, board
> type T8-02
> tga0: 1024 x 768, 8bpp, Bt485 RAMDAC
> tga0: interrupting at isa irq 10
> wsdisplay0 at tga0 mux 1
> wsdisplay0: screen 0 added (std, vt100 emulation)
> isa0 at sio0
> isadma0 at isa0
> fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
> com0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
> com0: console
> com1 at isa0 port 0x2f8/8 irq 3: ns16550a, 16 byte fifo
> pckbc0 at isa0 port 0x60/5 irq 1 irq 12
> pcppi0 at isa0 port 0x61
> spkr0 at pcppi0
> lpt0 at isa0 port 0x3bc/4 irq 7
> mcclock0 at isa0 port 0x70/2: mc146818 or compatible
> stray isa irq 3
> vscsi0 at root
> scsibus1 at vscsi0: 256 targets
> softraid0 at root
> scsibus2 at softraid0: 256 targets
> siop0: target 0 now using tagged 8 bit 10.0 MHz 8 REQ/ACK offset xfers
> root on sd0a (0c5ed4a2cd41ff27.a) swap on sd0b dump on sd0b
> fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
> stray isa irq 3
> panic: mtx 0xfffffe000002a628: locking against myself
> Stopped at db_enter+0x8: lda sp,10(sp)
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > *434339 44111 0 0x14000 0x200 0 softnet0
> > db_enter(0, 7ffffe00e0003f8, 1, 8, 3, 8) at db_enter+0x8
> panic(?, fffffe000002a628, 1c0, 10, a, 1) at panic+0xe8
> mtx_enter(?, ?, 1c0, 10, a, 1) at mtx_enter+0xb4
> ifq_set_oactive(?, ?, 1c0, 10, a, 1) at ifq_set_oactive+0x50
> https://www.openbsd.org/ddb.html describes the minimum info required in bug
> reports. Insufficient info makes it difficult to find and fix bugs.
> ddb> ddb> ddb> ddb> *cpu0: mtx 0xfffffe000002a628: locking against myself
> ddb> syncing disks...3 2 done
> WARNING: not updating battery clock
> rebooting...

Index: net/ifq.c
===================================================================
RCS file: /cvs/src/sys/net/ifq.c,v
diff -u -p -r1.54 ifq.c
--- net/ifq.c	9 Nov 2024 04:09:56 -0000	1.54
+++ net/ifq.c	17 Nov 2024 06:04:59 -0000
@@ -156,6 +156,17 @@ ifq_set_oactive(struct ifqueue *ifq)
 }
 
 void
+ifq_deq_set_oactive(struct ifqueue *ifq)
+{
+	MUTEX_ASSERT_LOCKED(&ifq->ifq_mtx);
+
+	if (!ifq->ifq_oactive) {
+		ifq->ifq_oactive = 1;
+		ifq->ifq_oactives++;
+	}
+}
+
+void
 ifq_restart_task(void *p)
 {
 	struct ifqueue *ifq = p;
Index: net/ifq.h
===================================================================
RCS file: /cvs/src/sys/net/ifq.h,v
diff -u -p -r1.41 ifq.h
--- net/ifq.h	10 Nov 2023 15:51:24 -0000	1.41
+++ net/ifq.h	17 Nov 2024 06:04:59 -0000
@@ -444,6 +444,7 @@ void		 ifq_q_leave(struct ifqueue *, voi
 void		 ifq_serialize(struct ifqueue *, struct task *);
 void		 ifq_barrier(struct ifqueue *);
 void		 ifq_set_oactive(struct ifqueue *);
+void		 ifq_deq_set_oactive(struct ifqueue *);
 
 int		 ifq_deq_sleep(struct ifqueue *, struct mbuf **, int, int,
 		     const char *, volatile unsigned int *,
Index: dev/pci/if_de.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/if_de.c,v
diff -u -p -r1.143 if_de.c
--- dev/pci/if_de.c	24 May 2024 06:02:53 -0000	1.143
+++ dev/pci/if_de.c	17 Nov 2024 06:04:59 -0000
@@ -3897,7 +3897,7 @@ tulip_txput(tulip_softc_t * const sc, st
 
     if (sc->tulip_flags & TULIP_TXPROBE_ACTIVE) {
 	TULIP_CSR_WRITE(sc, csr_txpoll, 1);
-	ifq_set_oactive(&sc->tulip_if.if_snd);
+	ifq_deq_set_oactive(&sc->tulip_if.if_snd);
 	TULIP_PERFEND(txput);
 	return (NULL);
     }
@@ -3926,7 +3926,7 @@ tulip_txput(tulip_softc_t * const sc, st
     sc->tulip_dbg.dbg_txput_finishes[6]++;
 #endif
     if (sc->tulip_flags & (TULIP_WANTTXSTART|TULIP_DOINGSETUP)) {
-	ifq_set_oactive(&sc->tulip_if.if_snd);
+	ifq_deq_set_oactive(&sc->tulip_if.if_snd);
 	if ((sc->tulip_intrmask & TULIP_STS_TXINTR) == 0) {
 	    sc->tulip_intrmask |= TULIP_STS_TXINTR;
 	    TULIP_CSR_WRITE(sc, csr_intr, sc->tulip_intrmask);
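
for anyone following along, here is a rough userland sketch of why the
locking variant panics and the new one doesn't. it is not kernel code:
every toy_* name is made up for illustration, the toy mutex only
remembers whether it is held, and i'm assuming ifq_set_oactive does the
same check-and-count as the new function, just under the mutex. where
the kernel would panic with "locking against myself" the toy returns -1.

```c
#include <assert.h>

/* Toy, non-recursive mutex: it only remembers whether it is held. */
struct toy_mtx {
	int held;
};

/* Toy queue carrying the two fields the diff touches. */
struct toy_ifq {
	struct toy_mtx	mtx;
	int		oactive;
	unsigned int	oactives;
};

/* Returns -1 where the kernel would panic ("locking against myself"). */
static int
toy_mtx_enter(struct toy_mtx *m)
{
	if (m->held)
		return (-1);
	m->held = 1;
	return (0);
}

static void
toy_mtx_leave(struct toy_mtx *m)
{
	m->held = 0;
}

/* Shared update: set the flag once and count the transition. */
static void
toy_mark_oactive(struct toy_ifq *q)
{
	if (!q->oactive) {
		q->oactive = 1;
		q->oactives++;
	}
}

/* Models ifq_set_oactive(): takes the mutex itself (an assumption). */
static int
toy_set_oactive(struct toy_ifq *q)
{
	if (toy_mtx_enter(&q->mtx) != 0)
		return (-1);
	toy_mark_oactive(q);
	toy_mtx_leave(&q->mtx);
	return (0);
}

/* Models ifq_deq_set_oactive(): the caller must already hold the mutex. */
static void
toy_deq_set_oactive(struct toy_ifq *q)
{
	assert(q->mtx.held);
	toy_mark_oactive(q);
}

/* Walks the de(4) sequence; returns 0 if both variants behave as described. */
int
toy_demo(void)
{
	struct toy_ifq q = { { 0 }, 0, 0 };

	/* like ifq_deq_begin(): take the mutex and keep holding it */
	if (toy_mtx_enter(&q.mtx) != 0)
		return (1);

	/* the locking variant fails inside the dequeue window */
	if (toy_set_oactive(&q) != -1 || q.oactive)
		return (2);

	/* the new variant just asserts the lock and updates the fields */
	toy_deq_set_oactive(&q);

	/* like ifq_deq_commit() or ifq_deq_rollback(): drop the mutex */
	toy_mtx_leave(&q.mtx);

	return ((q.oactive == 1 && q.oactives == 1) ? 0 : 3);
}
```

toy_demo() returning 0 means the locking call was refused while the
mutex was held and the asserting call set the flag and bumped the
counter once, which is the behaviour the diff relies on.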
