On 19/05/24(Sun) 23:50, Vitaliy Makkoveev wrote:
>
>
> > On 19 May 2024, at 22:05, Anthony J. Bentley <bent...@openbsd.org> wrote:
> >
> > Vitaliy Makkoveev writes:
> >>> On 17 May 2024, at 12:06, Stuart Henderson <s...@spacehopper.org> wrote:
> >>>
> >>> There are problems with wg(4) that people with some workloads have been
> >>> seeing after upgrading past 7.3, though looking at this thread from when
> >>> it last came up https://marc.info/?t=170940892700001&r=1&w=2 I'm not
> >>> sure if we'd be expecting to see trouble on non-MP…
> >>>
> >>
> >> We do. The problem is not MP related.
> >>
> >> Antony, does the diff [1] help?
> >>
> >> 1. https://marc.info/?l=openbsd-bugs&m=170980835807159&w=2
> >
> > Crashes continue to occur with the same frequency after patching.
> >
>
> This could be vio(4) bug. Please try this [1] diff.
>
> 1. https://marc.info/?l=openbsd-tech&m=171588941332420&w=2

The traces all point to a use-after-free of an mbuf that has been through
the wg(4) machinery.  The fact that the crash disappears on an SP system
suggests that this driver is not MP-safe and that a race somewhere ends up
corrupting memory associated with mbufs.

> > Here are three more crashes from running with the patch. I've seen
> > identical traces with and without the patch but these were not in
> > my last email.
> >
> > kernel: page fault trap, code=0
> > Stopped at schedclock+0x8a: movzbl 0x344(%rax),%r13d
> > ddb> show panic
> > the kernel did not panic
> > ddb> trace
> > schedclock(ffff8000fffeaa68) at schedclock+0x8a
> > statclock(ffffffff82529bf8,ffff80001ca32a20,0) at statclock+0x129
> > clockintr_dispatch(ffff80001ca32a20) at clockintr_dispatch+0x30d
> > clockintr(ffff80001ca32a20) at clockintr+0x59
> > intr_handler(ffff80001ca32a20,ffff8000000e6000) at intr_handler+0x3c
> > Xintr_legacy0_untramp() at Xintr_legacy0_untramp+0x1a3
> > memset() at memset+0x5c
> > end trace frame: 0x0, count: -7
> > ddb> ps
> >    PID     TID   PPID    UID  S       FLAGS  WAIT      COMMAND
> >
> >
> > panic: pr_find_pagehead: mbufpl: incorrect page
> > Stopped at db_enter+0x14: popq %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_put+0x53
> > m_free(fffffd8028dbf600) at m_free+0xa6
> > m_freem(fffffd8028dbf600) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001ca7e7f0,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ed000) at wg_encap_worker+0x79
> > end trace frame: 0xffff80001ca7e9f0, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
> > ddb> trace
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8028dbf600) at pool_put+0x53
> > m_free(fffffd8028dbf600) at m_free+0xa6
> > m_freem(fffffd8028dbf600) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001ca7e7f0,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ed000) at wg_encap_worker+0x79
> > taskq_thread(ffff80000088ac00) at taskq_thread+0xf0
> > end trace frame: 0x0, count: -15
> > ddb> show panic
> > *cpu0: pr_find_pagehead: mbufpl: incorrect page
> > ddb> ps
> >    PID     TID   PPID    UID  S       FLAGS  WAIT      COMMAND
> >  56587  470184  85475     0  3  0x18000083  dtread    btrace
> >  58952  222967      0    89  3  0x19100092  kqread    relayd
> >  83190  101464      0    89  3  0x19100092  kqread    relayd
> > ddb> show registers
> > rdi     0x4
> > rsi     0x14
> > rbp     0xffff80001ca7e4a0
> > rbx     0xfffffd8028dbf600
> > rdx     0x3fd
> > rcx     0x4800000000000111
> > rax     0x30
> > r8      0x101010101010101
> > r9      0
> > r10     0x582c2a7821cc399f
> > r11     0xf4834d1e02cdca10
> > r12     0xfffffd8028dbf600
> > r13     0xffff800000024800
> > r14     0
> > r15     0xffffffff82161d70    pp_r600_decoded_lanes+0xc8aa
> > rip     0xffffffff81fa1d44    db_enter+0x14
> > cs      0x8
> > rflags  0x282
> > rsp     0xffff80001ca7e4a0
> > ss      0x10
> > db_enter+0x14: popq %rbp
> >
> >
> > panic: pr_find_pagehead: mbufpl: incorrect page
> > Stopped at db_enter+0x14: popq %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *225925  73351      0     0x14000      0x200    0  wg_crypt
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_put+0x53
> > m_free(fffffd8035fd9400) at m_free+0xa6
> > m_freem(fffffd8035fd9400) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001c922500,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ef000) at wg_encap_worker+0x79
> > end trace frame: 0xffff80001c922700, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bug
> > reports. Insufficient info makes it difficult to find and fix bugs.
> > ddb> show panic
> > *cpu0: pr_find_pagehead: mbufpl: incorrect page
> > ddb> trace
> > db_enter() at db_enter+0x14
> > panic(ffffffff82161d70) at panic+0xb5
> > pool_do_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_do_put+0x27a
> > pool_put(ffffffff8260b3c0,fffffd8035fd9400) at pool_put+0x53
> > m_free(fffffd8035fd9400) at m_free+0xa6
> > m_freem(fffffd8035fd9400) at m_freem+0x38
> > vio_txeof(ffff800000064118) at vio_txeof+0x12d
> > vio_tx_intr(ffff800000064118) at vio_tx_intr+0x31
> > virtio_check_vqs(ffff800000024800) at virtio_check_vqs+0x102
> > virtio_pci_legacy_intr(ffff800000024800) at virtio_pci_legacy_intr+0x65
> > intr_handler(ffff80001c922500,ffff800000073e00) at intr_handler+0x3c
> > Xintr_legacy5_untramp() at Xintr_legacy5_untramp+0x1a3
> > memset() at memset+0x5c
> > wg_encap_worker(ffff8000007ef000) at wg_encap_worker+0x79
> > taskq_thread(ffff800000889080) at taskq_thread+0xf0
> > end trace frame: 0x0, count: -15
> > ddb> ps
> >
> >    PID     TID   PPID    UID  S       FLAGS  WAIT      COMMAND
> >  51969  144614  37729     0  2  0x18000003            btrace
> >  40841  474945  76353  1000  3   0x810008b  sigsusp   ksh
> >  76353  455143  78366  1000  3  0x18000098  kqread    sshd-session
> >  78366  500790  60748     0  3  0x18000092  kqread    sshd-session
> >   1661  483333  93900    89  3  0x19100092  kqread    relayd
> >  20971  454162  93900    89  3  0x19100092  kqread    relayd
> >  66174   90602  93900    89  3  0x19100092  kqread    relayd
> >  48738  445549  93900    89  3  0x19100092  kqread    relayd
> >  88711   54303  93900    89  3  0x19100092  kqread    relayd
> >  33085  157864  93900    89  2  0x19100012            relayd
> >  36613  263398  93900    89  3  0x19100092            relayd
> >  93900   61929      1     0  3  0x18000080  kqread    relayd
> >  58569  410836      1     0  3   0x8100083  ttyin     ksh
> >  30102  428727      1     0  3  0x18100098  kqread    cron
> > *73351  225925      0     0  7     0x14200            wg_crypt
> >  25707  237828      0     0  3     0x14200  bored     wg_handshake
> >  75251  422241      0     0  3     0x14200  bored     wg_handshake
> >  89402  219146      1   110  3  0x18100090  kqread    sndiod
> >   1652  116066      1    99  3  0x19100090  kqread    sndiod
> >  41636  131173  47944    95  3  0x19100092  kqread    smtpd
> >  56159  435661  47944   103  3  0x19100092  kqread    smtpd
> >  30864  263446  47944    95  3  0x18100092  kqread    smtpd
> >  64861   75991  47944    95  3  0x19100092  kqread    smtpd
> >  74399  157341  47944    95  3  0x19100092  kqread    smtpd
> >  47944  325461      1     0  3  0x18100080  kqread    smtpd
> >  60748  251840      1     0  3  0x18000088  kqread    sshd
> >  93282   26115      1     0  3  0x18100080  kqread    ntpd
> >  12262  492605  81276    83  3  0x18100092  kqread    ntpd
> >  81276  343918      1    83  2  0x19100492            ntpd
> >  24416  419389  95291    74  3  0x19100092  bpf       pflogd
> >  95291   58348      1     0  3  0x18000080  sbwait    pflogd
> >  99456   71886  56811    73  3  0x19100090  kqread    syslogd
> >  57202  274926  82913    77  3  0x18100092  kqread    dhcpleased
> >  93609  415070  82913    77  3  0x18100092  kqread    dhcpleased
> >  82913   38615      1     0  3  0x18000080  kqread    dhcpleased
> >  39413   85502  22242   115  3  0x18100092  kqread    slaacd
> >  84235  356871  22242   115  3  0x18100092  kqread    slaacd
> >  22242  283359      1     0  3  0x18100080  kqread    slaacd
> >  53776  372278      0     0  3     0x14200  bored     smr
> >  16202  188026      0     0  3     0x14200  pgzero    zerothread
> >  40368  204141      0     0  3     0x14200  aiodoned  aiodoned
> >  18183  419428      0     0  3     0x14200  syncer    update
> >  79669  281449      0     0  3     0x14200  cleaner   cleaner
> >  80971   55573      0     0  3     0x14200  reaper    reaper
> >  88433  220842      0     0  3     0x14200  pgdaemon  pagedaemon
> >  34834  242944      0     0  3     0x14200  bored     softnet3
> >  28119  493362      0     0  3     0x14200  bored     softnet2
> >  41877  463150      0     0  3     0x14200  bored     softnet1
> >  16167  354819      0     0  3     0x14200  bored     softnet0
> >  93717  296304      0     0  3     0x14200  bored     systqmp
> >  45065   39416      0     0  3     0x14200  bored     systq
> >  46106   21722      0     0  3  0x40014200  tmoslp    softclock
> >  25869  146461      0     0  3  0x40014200            idle0
> >      1  357659      0     0  3   0x8000082  wait      init
> >      0       0     -1     0  3     0x10200  scheduler swapper
> > ddb> show registers
> > rdi     0x4
> > rsi     0x14
> > rbp     0xffff80001c9221b0
> > rbx     0xfffffd8035fd9400
> > rdx     0x3fd
> > rcx     0x4800000000000111
> > rax     0x30
> > r8      0x101010101010101
> > r9      0
> > r10     0x8dd14be7a93050dc
> > r11     0xe3e5f94705a0c9e7
> > r12     0xfffffd8035fd9400
> > r13     0xffff800000024800
> > r14     0
> > r15     0xffffffff82161d70    pp_r600_decoded_lanes+0xc8aa
> > rip     0xffffffff81fa1d44    db_enter+0x14
> > cs      0x8
> > rflags  0x286
> > rsp     0xffff80001c9221b0
> > ss      0x10
> > db_enter+0x14: popq %rbp
> >
>