On Tue, Jan 07, 2025 at 06:21:50PM +0300, Vitaliy Makkoveev wrote:
> > On 7 Jan 2025, at 17:25, Alexander Bluhm <[email protected]> wrote:
> >
> > Hi,
> >
> > My daily netlink test found a crash during socket splicing.
> >
> > [-- MARK -- Tue Jan 7 08:05:00 2025]
> > uvm_fault(0xffffffff828c74e8, 0x7, 0, 2) -> e
> > kernel: page fault trap, code=2
> > Stopped at taskq_next_work+0x8e: movq %rdx,0x8(%rsi)
> > TID PID UID PRFLAGS PFLAGS CPU COMMAND
> > *213124 16048 0 0x14000 0x200 3 sosplice
> > 204927 99709 0 0x14000 0x200 0 softnet0
> > taskq_next_work(ffff800000078000,ffff8000359fc4c0) at taskq_next_work+0x8e
> > taskq_thread(ffff800000078000) at taskq_thread+0x10b
> > end trace frame: 0x0, count: 13
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
> > ddb{3}> [-- MARK -- Tue Jan 7 08:10:00 2025]
> >
> > I have seen it once on real hardware and once as KVM guest. It
> > does not happen at the first test run, but after 4 to 8 runs it may
> > crash. Affected versions are
> >
> > OpenBSD 7.6-current (GENERIC.MP) #498: Mon Jan 6 12:16:01 MST 2025
> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > OpenBSD 7.6-current (GENERIC.MP) #cvs : D2025.01.07.00.00.00: Tue Jan 7
> > 07:49:46 CET 2025
> > [email protected]:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
>
> Smells like the socket reference counting problem. I made two socket related
> diff last days. The first one which introduced new sorele() and converted
> sosplice() task with timeout re-initialization was committed 2024/12/30. The
> second one which switched sosplice() to shared locks was committed at
> 2025/01/04.
>
> What was the last stable build? Had you try to run sosplice test with my
> last diff reverted?

The previous day it worked. But I am not sure how reliable the crash
is. It takes several runs until it happens.

I will try with the "Relax sockets splicing locking" commit reverted.
Maybe when I run specific tests many times I can make a reliable
statement about what triggered it.

bluhm
