bgpctl show mrt file - Segmentation fault (core dumped)
Hi,

Here at Srce we are running OpenBSD 7.5-release as a route server. I wanted to collect some additional MRT data, so I have this in bgpd.conf:

dump table-v2 "/data/bgpdumps/bgp-rib-dump-%y_%m_%d-%H_%M" 300
dump all out "/data/bgpdumps/bgp-all-out-%y_%m_%d-%H_%M" 300
dump all in "/data/bgpdumps/bgp-all-in-%y_%m_%d-%H_%M" 300

If I read bgp-rib-dump with

bgpctl show mrt file /data/bgpdumps/bgp-rib-dump-24_06_03-10_46

everything seems fine. But if I try to read bgp-all-in or bgp-all-out, I get

Segmentation fault (core dumped)

rs1# gdb bgpctl bgpctl.core
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "amd64-unknown-openbsd7.5"...(no debugging symbols found)
Core was generated by `bgpctl'.
Program terminated with signal 11, Segmentation fault.
(no debugging symbols found)
Loaded symbols for /usr/sbin/bgpctl
Reading symbols from /usr/lib/libutil.so.18.0...done.
Loaded symbols for /usr/lib/libutil.so.18.0
Reading symbols from /usr/lib/libm.so.10.1...done.
Loaded symbols for /usr/lib/libm.so.10.1
Reading symbols from /usr/lib/libc.so.99.0...done.
Loaded symbols for /usr/lib/libc.so.99.0
Reading symbols from /usr/libexec/ld.so...Error while reading shared library symbols:
Dwarf Error: wrong version in compilation unit header (is 4, should be 2) [in module /usr/libexec/ld.so]
#0 ibuf_get_n8 (buf=0x751ce250b5d0, value=0x751ce250b6dd "u") at /usr/src/lib/libutil/imsg-buffer.c:412
412 /usr/src/lib/libutil/imsg-buffer.c: No such file or directory.
in /usr/src/lib/libutil/imsg-buffer.c

I can read all three files normally with bgpdump.
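Since the table-v2 dump reads fine and only the "dump all in/out" files crash bgpctl, it may help to see which MRT record types each file actually contains. The following is a minimal sketch, not part of bgpctl or bgpd: it walks the 12-byte MRT common header (timestamp, type, subtype, length, as defined in RFC 6396) and counts records per type. The function names are made up for illustration.

```python
import io
import struct
from collections import Counter

# MRT type codes from RFC 6396 (only the ones relevant to these dumps).
MRT_TYPES = {13: "TABLE_DUMP_V2", 16: "BGP4MP", 17: "BGP4MP_ET"}


def iter_mrt_headers(stream):
    """Yield (timestamp, type, subtype, length) for each MRT record."""
    while True:
        header = stream.read(12)
        if len(header) < 12:
            return  # end of file, or a truncated trailing header
        ts, mrt_type, subtype, length = struct.unpack("!IHHI", header)
        yield ts, mrt_type, subtype, length
        stream.seek(length, io.SEEK_CUR)  # skip the record body


def count_record_types(stream):
    """Count records per (type name, subtype) pair."""
    counts = Counter()
    for _, mrt_type, subtype, _ in iter_mrt_headers(stream):
        counts[(MRT_TYPES.get(mrt_type, str(mrt_type)), subtype)] += 1
    return counts
```

Usage would be along the lines of opening one of the dump files in binary mode and printing count_record_types(f); comparing the output for the rib dump and the all-in dump shows which record types only the crashing files carry.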
panic when forwarding high amount of traffic over mcx - kernel diagnostic assertion "((flags & PGO_LOCKED)
Hi all,

in the lab I have a 2-socket box with a lot of interfaces: ix, ixl, mcx, bnxt, em and bge. When sending high traffic over mcx, the whole machine is almost unresponsive, e.g. when typing any command over the console. In that state pagedaemon is at 100%, sometimes even higher, and the mcl12k Fail counter is rising. In sysctl.conf there is kern.maxclusters=1048576, and NET_TASKQ=16 in if.c.

While sending traffic over ix or ixl in the same machine everything seems fine. bnxt is fishy, but that is for some other bug report :)

I saw this mcx behavior before mpi@'s diff "Add per-CPU caches to the pmemrange allocator" but didn't manage to trigger a panic. It seems that this mcx behaviour occurs only under high traffic, because I have a few mcx in production and they behave excellently.

In the attachment you can find the ddb output.

Just one question: is it possible to put in ddb something like mach all ddbcpu or mach ddbcpu all ? :) When having 32 or more cores and converting the cpu number from decimal to hex, one can easily make a mistake.

dmesg

OpenBSD 7.5-current (GENERIC.MP) #2: Sat Jun 1 22:36:05 CEST 2024 hrvoje@bigi.netlab:/sys/arch/amd64/compile/GENERIC.MP real mem = 410826829824 (391794MB) avail mem = 398354190336 (379900MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x68e36000 (80 entries) bios0: vendor Dell Inc. version "2.21.2" date 02/19/2024 bios0: Dell Inc. PowerEdge R740xd efi0 at bios0: UEFI 2.7 efi0: Dell Inc. rev 0x2150201 acpi0 at bios0: ACPI 6.1 acpi0: sleep states S0 S5 acpi0: tables DSDT FACP SSDT MCEJ WD__ SLIC HPET APIC MCFG MIGT MSCT PCAT PCCT RASF SLIT SRAT SVOS WSMT OEM4 SSDT SSDT SSDT SPCR DMAR HEST BERT ERST EINJ acpi0: wakeup devices XHCI(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 2399 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.55 MHz, 06-55-04, patch 02007108 cpu0: cpuid 1 edx=bfebfbff ecx=77fefbff cpu0: cpuid 6 eax=77 ecx=9 cpu0: cpuid 7.0 ebx=d39b ecx=8 edx=bc002400 cpu0: cpuid a vers=4, gp=8, gpwidth=48, ff=3, ffwidth=48 cpu0: cpuid d.1 eax=f cpu0: cpuid 8001 edx=2c100800 ecx=121 cpu0: cpuid 8007 edx=100 cpu0: msr 10a=2000c04 cpu0: MELTDOWN cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 16-way L2 cache, 22MB 64b/line 11-way L3 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 24MHz cpu0: mwait min=64, max=64, C-substates=0.2.0.2, IBE cpu1 at mainbus0: apid 32 (application processor) cpu1: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2791.39 MHz, 06-55-04, patch 02007108 cpu1: smt 0, core 0, package 1 cpu2 at mainbus0: apid 14 (application processor) cpu2: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.65 MHz, 06-55-04, patch 02007108 cpu2: smt 0, core 7, package 0 cpu3 at mainbus0: apid 46 (application processor) cpu3: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.76 MHz, 06-55-04, patch 02007108 cpu3: smt 0, core 7, package 1 cpu4 at mainbus0: apid 2 (application processor) cpu4: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.59 MHz, 06-55-04, patch 02007108 cpu4: smt 0, core 1, package 0 cpu5 at mainbus0: apid 34 (application processor) cpu5: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.11 MHz, 06-55-04, patch 02007108 cpu5: smt 0, core 1, package 1 cpu6 at mainbus0: apid 12 (application processor) cpu6: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.88 MHz, 06-55-04, patch 02007108 cpu6: smt 0, core 6, package 0 cpu7 at mainbus0: apid 44 (application processor) cpu7: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.84 MHz, 06-55-04, patch 02007108 cpu7: smt 0, core 6, 
package 1 cpu8 at mainbus0: apid 4 (application processor) cpu8: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.95 MHz, 06-55-04, patch 02007108 cpu8: smt 0, core 2, package 0 cpu9 at mainbus0: apid 36 (application processor) cpu9: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.84 MHz, 06-55-04, patch 02007108 cpu9: smt 0, core 2, package 1 cpu10 at mainbus0: apid 10 (application processor) cpu10: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2795.34 MHz, 06-55-04, patch 02007108 cpu10: smt 0, core 5, package 0 cpu11 at mainbus0: apid 42 (application processor) cpu11: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.85 MHz, 06-55-04, patch 02007108 cpu11: smt 0, core 5, package 1 cpu12 at mainbus0: apid 6 (application processor) cpu12: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2795.03 MHz, 06-55-04, patch 02007108 cpu12: smt 0, core 3, package 0 cpu13 at mainbus0: apid 38 (application processor) cpu13: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2794.46 MHz, 06-55-04, patch 02007108 cpu13: smt 0, core 3, package 1 cpu14 at mainbus0: apid 8 (application processor) cpu14: Intel(R) Xeon(R) Gold 6130
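On the ddb question in this message: until something like "mach ddbcpu all" exists, the decimal-to-hex step can at least be taken out of the operator's hands by generating the commands ahead of time. This is a small sketch only, assuming (as the message implies) that ddb reads the cpu number as hex; the helper names are hypothetical.

```python
def ddbcpu_command(cpu_decimal):
    """Return the ddb command for a CPU given by its decimal number."""
    return "mach ddbcpu %x" % cpu_decimal


def ddbcpu_commands(ncpus):
    """Return one command per CPU, in order, ready to paste into ddb."""
    return [ddbcpu_command(n) for n in range(ncpus)]
```

For example, ddbcpu_command(32) gives "mach ddbcpu 20", which is exactly the kind of conversion that is easy to get wrong by hand on a box with 32 or more cores.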
Dell HBA330 problem
Hi all,

I have a Dell R740xd with an HBA330 non-raid as disk controller. It seems that after doing slightly more intensive disk work over the HBA330, like a local cvs checkout, the box freezes with lots of logs like

mpii0: mpii_scsi_cmd_tmo (0x4000265d)

After that nothing can be done except power cycling the box from the ipmi console or via the power button; reboot does not work from the console.

I've compiled a kernel with MPII_DEBUG and here are the last few logs before the box freezes:

Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_read 0 0x4000265d
Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_scsi_cmd_tmo (0x4000265d)
Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_get_ccb 0x818fe490
Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_start 0x2c87c00
Mar 20 19:53:11 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_LOW (0x00c0) write 0x00760006
Mar 20 19:53:11 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_HIGH (0x00c4) write 0x0286
Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_read 0 0x4000265d
Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_scsi_cmd_tmo (0x4000265d)
Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_get_ccb 0x818fe580
Mar 20 19:53:11 r740xd /bsd: mpii0: mpii_start 0x2c88200
Mar 20 19:53:11 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_LOW (0x00c0) write 0x00790006
Mar 20 19:53:11 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_HIGH (0x00c4) write 0x0286
Mar 20 19:53:18 r740xd /bsd: mpii0: mpii_read 0 0x4000265d
Mar 20 19:53:18 r740xd /bsd: mpii0: mpii_scsi_cmd_tmo (0x4000265d)
Mar 20 19:53:18 r740xd /bsd: mpii0: mpii_get_ccb 0x818fe710
Mar 20 19:53:18 r740xd /bsd: mpii0: mpii_start 0x2c88c00
Mar 20 19:53:18 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_LOW (0x00c0) write 0x007e0006
Mar 20 19:53:18 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_HIGH (0x00c4) write 0x0286
Mar 20 19:58:19 r740xd /bsd: mpii0: mpii_get_ccb 0x818fe350
Mar 20 19:58:19 r740xd /bsd: mpii0: mpii_scsi_cmd
Mar 20 19:58:19 r740xd /bsd: mpii0: ccb_smid: 114 xs->flags: 0x1001
Mar 20 19:58:19 r740xd /bsd: mpii0: mpii_start 0x2c87400
Mar 20 19:58:19 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_LOW (0x00c0) write 0xa0072
Mar 20 19:58:19 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_HIGH (0x00c4) write 0x0286
Mar 20 19:59:19 r740xd /bsd: mpii0: mpii_read 0 0x4000265d
Mar 20 19:59:19 r740xd /bsd: mpii0: mpii_scsi_cmd_tmo (0x4000265d)
Mar 20 19:59:19 r740xd /bsd: mpii0: mpii_get_ccb 0x818fe670
Mar 20 19:59:19 r740xd /bsd: mpii0: mpii_start 0x2c88800
Mar 20 19:59:19 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_LOW (0x00c0) write 0x007c0006
Mar 20 19:59:19 r740xd /bsd: mpii0: MPII_REQ_DESCR_POST_HIGH (0x00c4) write 0x0286

If someone is willing to look at this problem, I will gladly give access to this box.

dmesg without MPII_DEBUG

OpenBSD 7.5 (GENERIC.MP) #78: Sun Mar 17 21:55:24 MDT 2024 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 410826829824 (391794MB) avail mem = 398354268160 (379900MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 3.2 @ 0x68e36000 (80 entries) bios0: vendor Dell Inc. version "2.21.2" date 02/19/2024 bios0: Dell Inc. PowerEdge R740xd efi0 at bios0: UEFI 2.7 efi0: Dell Inc. rev 0x2150201 acpi0 at bios0: ACPI 6.1 acpi0: sleep states S0 S5 acpi0: tables DSDT FACP SSDT MCEJ WD__ SLIC HPET APIC MCFG MIGT MSCT PCAT PCCT RASF SLIT SRAT SVOS WSMT OEM4 SSDT SSDT SSDT SPCR DMAR HEST BERT ERST EINJ acpi0: wakeup devices XHCI(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) PXSX(S4) RP20(S4) PXSX(S4) RP01(S4) PXSX(S4) RP02(S4) PXSX(S4) RP03(S4) PXSX(S4) RP04(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits acpihpet0 at acpi0: 2399 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.56 MHz, 06-55-04, patch 02007108 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,AVX512CD,AVX512BW,AVX512VL,PKU,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,RSBA,MISC_PKG_CT,ENERGY_FILT,GDS_CTRL,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 16-way L2 cache, 22MB 64b/line 11-way L3 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 24MHz cpu0: mwait min=64, max=64, C-substates=0.2.0.2, IBE cpu1 at mainbus0: apid 32 (application processor) cpu1: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, 2793.60 MHz, 06-55-04, patch 02007108 cpu1:
Re: HA IPSec with AWS - no second flow
On 11.3.2024. 10:22, Rafał Ramocki wrote: > Hello, > > > Hello, I'm not sure if I'm doing something wrong or if is it a common > problem. I have iked.conf set up in the following way: > > ikev2 active from 10.2.15.0/24 to 172.31.0.0/20 from 10.2.15.0/24 to > 172.31.16.0/20 from 10.2.15.0/24 to 172.31.32.0/20 from 169.254.74.238 to > 169.254.74.237 local X.X.X.X peer 16.170.59.81 ikesa auth hmac-sha2-256 enc > aes-256 prf hmac-sha2-256 group modp4096 childsa auth hmac-sha2-256 enc > aes-256 group modp4096 srcid X.X.X.X ikelifetime 28800 lifetime 3600 psk > '_REMOVED_' > > ikev2 active from 10.2.15.0/24 to 172.31.0.0/20 from 10.2.15.0/24 to > 172.31.16.0/20 from 10.2.15.0/24 to 172.31.32.0/20 from 169.254.21.38 to > 169.254.21.37 local X.X.X.X peer 51.21.86.8 ikesa auth hmac-sha2-256 enc > aes-256 prf hmac-sha2-256 group modp4096 childsa auth hmac-sha2-256 enc > aes-256 group modp4096 srcid X.X.X.X ikelifetime 28800 lifetime 3600 psk > '_REMOVED_' > > > Both tunnels are up from AWS perspective. 
Both tunnels have SAD's: > > # ipsecctl -ss > esp tunnel from 51.21.86.8 to X.X.X.X spi 0x02c0ae3a auth hmac-sha2-256 enc > aes-256 > esp tunnel from 16.170.59.81 to X.X.X.X spi 0x09ef0398 auth hmac-sha2-256 enc > aes-256 > esp tunnel from 16.170.59.81 to X.X.X.X spi 0x324ceca5 auth hmac-sha2-256 enc > aes-256 > esp tunnel from 51.21.86.8 to X.X.X.X spi 0xa9672a52 auth hmac-sha2-256 enc > aes-256 > esp tunnel from X.X.X.X to 16.170.59.81 spi 0xc08c4de5 auth hmac-sha2-256 enc > aes-256 > esp tunnel from X.X.X.X to 16.170.59.81 spi 0xc2e0efe9 auth hmac-sha2-256 enc > aes-256 > esp tunnel from X.X.X.X to 51.21.86.8 spi 0xc3e8a0e0 auth hmac-sha2-256 enc > aes-256 > esp tunnel from X.X.X.X to 51.21.86.8 spi 0xccb3250e auth hmac-sha2-256 enc > aes-256 > > > But flows with overlapped from-to pair are set only for one of the tunnels: > > # ipsecctl -sf > flow esp in from 169.254.21.37 to 169.254.21.38 peer 51.21.86.8 srcid > IPV4/X.X.X.X dstid IPV4/51.21.86.8 type require > flow esp in from 169.254.74.237 to 169.254.74.238 peer 16.170.59.81 srcid > IPV4/X.X.X.X dstid IPV4/16.170.59.81 type require > flow esp in from 172.31.0.0/20 to 10.2.15.0/24 peer 51.21.86.8 srcid > IPV4/X.X.X.X dstid IPV4/51.21.86.8 type require > flow esp in from 172.31.16.0/20 to 10.2.15.0/24 peer 51.21.86.8 srcid > IPV4/X.X.X.X dstid IPV4/51.21.86.8 type require > flow esp in from 172.31.32.0/20 to 10.2.15.0/24 peer 51.21.86.8 srcid > IPV4/X.X.X>X dstid IPV4/51.21.86.8 type require > > flow esp out from 10.2.15.0/24 to 172.31.0.0/20 peer 51.21.86.8 srcid > IPV4/X.X.X.X dstid IPV4/51.21.86.8 type require > flow esp out from 10.2.15.0/24 to 172.31.16.0/20 peer 51.21.86.8 srcid > IPV4/X.X.X.X dstid IPV4/51.21.86.8 type require > flow esp out from 10.2.15.0/24 to 172.31.32.0/20 peer 51.21.86.8 srcid > IPV4/X.X.X.X dstid IPV4/51.21.86.8 type require > flow esp out from 169.254.21.38 to 169.254.21.37 peer 51.21.86.8 srcid > IPV4/X.X.X.X dstid IPV4/51.21.86.8 type require > flow esp out from 169.254.74.238 
to 169.254.74.237 peer 16.170.59.81 srcid > IPV4/X.X.X.X dstid IPV4/16.170.59.81 type require > > I think IKED may detect that flow is already set for this from-to pair and is > not setting up additional one but it should take also remote endpoint into > account as those are different. Having no flow set up is resulting in that, > when some data are received on that second tunnel that have no flows set, > those data are discarded and not forwarded any more propably due to RPF > policy. > > I tried to figure out how those are set up by code analysys but I think it > may be beyond my capabilitys as I'm only a sysadmin not a developer. > > OpenBSD version: 7.3 > > best regards > Rafal Ramocki >

Hi,

I think that you can't have two identical IPsec tunnels with policy-based VPNs in OpenBSD, but you can do something like this:

https://www.linuxquestions.org/questions/blog/rocket357-328529/openbsd-etc-ipsec-conf-for-aws-vpc-vpn-36423/

The good news is that OpenBSD supports route-based IPsec tunnels from 7.4 on:

https://www.undeadly.org/cgi?action=article;sid=20230704094238
Re: TSO em(4) problem
On 1.2.2024. 18:42, Alexander Bluhm wrote: > On Tue, Jan 30, 2024 at 02:32:24PM +0100, Hrvoje Popovski wrote: >> yes, and forwarding only without pf. >> I'm sending traffic from host connected to vlan/ix0 and forward through >> em5 to other host. >> I'm sending 1Gbps of traffic with cisco t-rex > I cannot reproduce. > > ix0 at pci6 dev 0 function 0 "Intel 82599" rev 0x01, msix, 8 queues, address > 90:e2:ba:d6:23:68 > em1 at pci7 dev 0 function 1 "Intel I350" rev 0x01: msi, address > a0:36:9f:0a:4a:c5 > > root@ot42:.../~# ifconfig ix0 hwfeatures > ix0: flags=2008843 mtu 1500 > > hwfeatures=71b7 > hardmtu 9198 > lladdr 90:e2:ba:d6:23:68 > description: Intel 82599 > index 5 priority 0 llprio 3 > media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause) > status: active > > root@ot42:.../~# ifconfig em1 hwfeatures > em1: flags=8c43 mtu 1500 > > hwfeatures=31b7 > hardmtu 9216 > lladdr a0:36:9f:0a:4a:c5 > description: Intel I350 > index 8 priority 0 llprio 3 > media: Ethernet autoselect (1000baseT full-duplex,master) > status: active > inet 10.10.22.3 netmask 0xff00 broadcast 10.10.22.255 > > root@ot42:.../~# ifconfig vlan0 hwfeatures > vlan0: flags=8843 mtu 1500 > > hwfeatures=3187 > hardmtu 9198 > lladdr 90:e2:ba:d6:23:68 > index 24 priority 0 llprio 3 > encap: vnetid 221 parent ix0 txprio packet rxprio outer > groups: vlan > media: Ethernet autoselect (10GSFP+Cu full-duplex,rxpause,txpause) > status: active > inet 10.10.21.2 netmask 0xff00 broadcast 10.10.21.255 > > root@ot42:.../~# pfctl -si > Status: Disabled for 0 days 00:03:42 Debug: err > > Running tcpbench -n100 from Linux via OpenBSD forwarding to Linux. > Simultaneous udpbench to create traffic mixture. 
> > root@ot42:.../~# netstat -ss | egrep 'TSO|LRO' > 1188 output TSO packets software chopped > 33086906 output TSO packets hardware processed > 265855748 output TSO packets generated > 31090975 input LRO generated packets from hardware > 176482178 input LRO coalesced packets by network device > > Lot of LRO and TSO. Running diff below, which reverts em TSO backout > and adds sparc64 fix. > > Hrvoje: What is different in your lab?

I think I found it. It's lldp. With lldpd enabled I'm getting watchdogs on em; with it disabled I get only one watchdog at the beginning of testing.
Re: TSO em(4) problem
On 30.1.2024. 13:33, Alexander Bluhm wrote: > On Tue, Jan 30, 2024 at 12:07:08PM +0100, Hrvoje Popovski wrote: >> On 30.1.2024. 9:27, Hrvoje Popovski wrote: >>> I will prepare one box for this kind of traffic and will contact you and >>> marcus >>> >>>> In theory when going through vlan interface it should remove >>>> M_VLANTAG. But something must be wrong and I wonder what. >>>> >>>> bluhm >> >> Hi, >> >> I've managed to trigger watchdog in lab. It couldn't be possible without >> bluhm@ information about ix vlan, thank you. > > Great, now we can debug the details. > > I have to know how ix and em are connected. > > Do you have any bridge or veb? Where are your vlan trunks? > Any aggr, trunk, carp?

No, only vlan on ix0.

> Is my understanding of your setup corect? > > ix -> vlan -> forward -> em

Yes, and forwarding only, without pf. I'm sending traffic from a host connected to vlan/ix0 and forwarding it through em5 to another host. I'm sending 1Gbps of traffic with cisco t-rex.

> Can something more happen, like > > ix -> forward -> em >

In the setup without vlan on ix I got only one watchdog at the beginning of testing and that's it. With vlan I'm getting around 6 or 7 watchdogs per minute, which means the link goes down and up 6 or 7 times per minute.

without vlan

smc4# netstat -sp tcp | grep TSO
0 output TSO packets software chopped
268 output TSO packets hardware processed
0 output TSO packets generated
0 output TSO packets dropped
smc4# netstat -sp tcp | grep LRO
0 input LRO packets passed through pseudo device
7666573 input LRO generated packets from hardware
21667579 input LRO coalesced packets by network device
0 input bad LRO packets dropped
Re: TSO em(4) problem
On 30.1.2024. 9:27, Hrvoje Popovski wrote: > I will prepare one box for this kind of traffic and will contact you and > marcus > >> In theory when going through vlan interface it should remove >> M_VLANTAG. But something must be wrong and I wonder what. >> >> bluhm

Hi,

I've managed to trigger the watchdog in the lab. It wouldn't have been possible without bluhm@'s information about ix vlan, thank you.

Jan 30 12:01:09 smc4 /bsd: em5: watchdog: head 123 tail 187 TDH 187 TDT 123
Jan 30 12:01:18 smc4 /bsd: em5: watchdog: head 243 tail 307 TDH 307 TDT 243
Jan 30 12:01:28 smc4 /bsd: em5: watchdog: head 463 tail 15 TDH 15 TDT 463
Jan 30 12:01:37 smc4 /bsd: em5: watchdog: head 413 tail 477 TDH 477 TDT 413
Jan 30 12:01:46 smc4 /bsd: em5: watchdog: head 195 tail 259 TDH 259 TDT 195
Jan 30 12:01:55 smc4 /bsd: em5: watchdog: head 259 tail 323 TDH 323 TDT 259
Jan 30 12:02:05 smc4 /bsd: em5: watchdog: head 333 tail 397 TDH 397 TDT 333
Jan 30 12:02:14 smc4 /bsd: em5: watchdog: head 33 tail 97 TDH 97 TDT 33
Jan 30 12:02:24 smc4 /bsd: em5: watchdog: head 459 tail 11 TDH 11 TDT 459
Jan 30 12:02:33 smc4 /bsd: em5: watchdog: head 447 tail 511 TDH 511 TDT 447

em0 at pci7 dev 0 function 0 "Intel 82576" rev 0x01: msi, address 00:1b:21:61:8a:94
em1 at pci7 dev 0 function 1 "Intel 82576" rev 0x01: msi, address 00:1b:21:61:8a:95
em2 at pci8 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:25:90:5d:c9:98
em3 at pci9 dev 0 function 0 "Intel I210" rev 0x03: msi, address 00:25:90:5d:c9:99
em4 at pci12 dev 0 function 0 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9a
em5 at pci12 dev 0 function 1 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9b
em6 at pci12 dev 0 function 2 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9c
em7 at pci12 dev 0 function 3 "Intel I350" rev 0x01: msi, address 00:25:90:5d:c9:9d

smc4# netstat -sp tcp | grep LRO
0 input LRO packets passed through pseudo device
4696315 input LRO generated packets from hardware
13205047 input LRO coalesced packets by network device
0 input bad LRO packets dropped
smc4# netstat -sp tcp | grep TSO
0 output TSO packets software chopped
3672 output TSO packets hardware processed
0 output TSO packets generated
0 output TSO packets dropped

smc4# ifconfig em5 hwfeatures
em5: flags=8c43 mtu 1500
        hwfeatures=31b7 hardmtu 9216
        lladdr 00:25:90:5d:c9:9b
        index 8 priority 0 llprio 3
        media: Ethernet autoselect (1000baseT full-duplex,master,rxpause,txpause)
        status: active
        inet 192.168.20.1 netmask 0xff00 broadcast 192.168.20.255
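The watchdog lines in this thread share a pattern that is easier to see with a little arithmetic: the distance from "head" to "tail", taken modulo the ring size, is almost always the same. The sketch below extracts that distance from log text. It is only an analysis aid, not driver code, and the ring size of 512 is an assumption inferred from the indices in these logs wrapping below 512; what the distance means for the hardware is left open here.

```python
import re

WATCHDOG_RE = re.compile(
    r"(?P<ifname>\w+): watchdog: head (?P<head>\d+) tail (?P<tail>\d+) "
    r"TDH (?P<tdh>\d+) TDT (?P<tdt>\d+)"
)

RING_SIZE = 512  # assumption: inferred from the index values seen in the logs


def watchdog_gaps(log_text, ring_size=RING_SIZE):
    """Return (ifname, head, tail, gap) for every em watchdog line found.

    gap is the number of descriptors from head forward to tail, wrapping
    at ring_size.
    """
    gaps = []
    for m in WATCHDOG_RE.finditer(log_text):
        head, tail = int(m["head"]), int(m["tail"])
        gaps.append((m["ifname"], head, tail, (tail - head) % ring_size))
    return gaps
```

Run over the log lines posted in this thread, the gap comes out as 64 in most cases and 65 in a few, including the entries where the index wraps (for example head 463, tail 15).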
Re: TSO em(4) problem
On 29.1.2024. 15:29, Alexander Bluhm wrote: > On Sat, Jan 27, 2024 at 08:08:35AM +0100, Hrvoje Popovski wrote: >> On 26.1.2024. 22:47, Alexander Bluhm wrote: >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote: >>>> I've manage to reproduce TSO em problem on anoter setup, unfortunatly >>>> production. >>> What helped debugging a similar issue with ixl(4) and TSO was to >>> remove all TSO specific code from the driver. Then only this part >>> remains from the original em(4) TSO diff. >>> >>> error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE, >>> EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1), >>> EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, >pkt_map); >>> >>> The parameters that changed when adding TSO are: >>> >>> bus_size_t size:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535 >>> bus_size_t maxsegsz:MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE >>> 4096 >>> >>> I suspect that this is the cause for the regression as disabling >>> TSO did not help. Would it be possible to run the diff below? I >>> expect that the problem will still be there. But then we know it >>> must be the change of one of the bus_dmamap_create() arguments. >>> >>> bluhm >> >> Hi, >> >> with this diff em0 seems happy and em watchdog is gone. > > This is very interesting. That means that the bus_dmamap_create() > argument does not cause the regression. > > Did you see anywhere "output TSO packets hardware processed in" > netstat -s. In some iteration of testing you turned TSO off with > sysctl net.inet.tcp.tso=0, but it did not help. So no TSO packets > from the stack. > > In another mail you mentioned > >> Setup is very simple >> em0 - carp <- uplink >> em1 - pfsync >> ix1 - vlans - carp > > ix supports LRO. If you forward from ix1 to em0 the LRO packets > from ix hardware are split by TSO on em hardware. And the ix does > vlan offloading + LRO, so em must do vlan offloading properly with > TSO. Or do you use a vlan interface? > > Does it help to disable LRO, ifconfig ix1 -tcplro ? 
Yes, it helps... Thank you.

uplink

em0: flags=8b43 mtu 1500
        hwfeatures=31b7 hardmtu 9216
        lladdr 0c:c4:7a:da:cd:5a
        index 3 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
        status: active

vlans are on ix1 - I've disabled LRO

ix1: flags=8b43 mtu 1500
        lladdr 90:e2:ba:d7:1b:f5
        index 2 priority 0 llprio 3
        media: Ethernet autoselect (10GbaseSR full-duplex,rxpause,txpause)
        status: active

Before I disabled LRO on ix1 I got a lot of watchdogs on em0:

bcbnfw1# uptime
9:25AM up 8 mins, 1 user, load averages: 0.14, 0.13, 0.06
bcbnfw1# cat /var/log/messages| grep watchdog
Jan 30 09:18:51 bcbnfw1 /bsd: em0: watchdog: head 148 tail 213 TDH 213 TDT 148
Jan 30 09:19:01 bcbnfw1 /bsd: em0: watchdog: head 160 tail 224 TDH 224 TDT 160
Jan 30 09:19:12 bcbnfw1 /bsd: em0: watchdog: head 163 tail 228 TDH 228 TDT 163
Jan 30 09:19:22 bcbnfw1 /bsd: em0: watchdog: head 128 tail 192 TDH 192 TDT 128
Jan 30 09:19:32 bcbnfw1 /bsd: em0: watchdog: head 309 tail 373 TDH 373 TDT 309
Jan 30 09:19:41 bcbnfw1 /bsd: em0: watchdog: head 113 tail 177 TDH 177 TDT 113
Jan 30 09:19:51 bcbnfw1 /bsd: em0: watchdog: head 402 tail 466 TDH 466 TDT 402
Jan 30 09:20:01 bcbnfw1 /bsd: em0: watchdog: head 114 tail 178 TDH 178 TDT 114
Jan 30 09:20:16 bcbnfw1 /bsd: em0: watchdog: head 111 tail 175 TDH 175 TDT 111
Jan 30 09:20:26 bcbnfw1 /bsd: em0: watchdog: head 199 tail 263 TDH 263 TDT 199

Without LRO on ix1 everything seems to work just fine ...

> > I see this vlan code with mac_type checks. Can we end in a > configuration where we enable TSO but cannot do VLAN offloading? > > #if NVLAN > 0 > /* Find out if we are in VLAN mode */ > if (m->m_flags & M_VLANTAG && (sc->hw.mac_type < em_82575 || > sc->hw.mac_type > em_i210)) { > /* Set the VLAN id */ > desc->upper.fields.special = htole16(m->m_pkthdr.ether_vtag); > > /* Tell hardware to add tag */ > desc->lower.data |= htole32(E1000_TXD_CMD_VLE); > } > #endif > > Hrvoje, I know you do great tests in your lab.
Did you try this > setup: > > Send bulk TCP traffic in vlan that will trigger LRO. > Do VLAN + LRO offloading in ix. > Forward it to em with TSO. > I will prepare one box for this kind of traffic and will contact you and marcus > In theory when going through vlan interface it should remove > M_VLANTAG. But something must be wrong and I wonder what. > > bluhm >
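bluhm's question in this message (can TSO end up enabled on a chip that takes the VLE tagging path quoted above?) can be checked at the level of the two mac_type range tests that appear in this thread: the VLAN check quoted above, and the capability block in Marcus's diff elsewhere in the thread that sets IFCAP_TSOv4 | IFCAP_TSOv6. The following is a toy model only. The numeric values are hypothetical stand-ins for the em_82575 and em_i210 enum constants, and the M_VLANTAG flag test is left out because it does not depend on the chip.

```python
# Hypothetical stand-in values; only the assumption em_82575 <= em_i210
# matters here, not the real enum numbers.
EM_82575 = 20
EM_I210 = 24


def uses_vle_tagging(mac_type, lo=EM_82575, hi=EM_I210):
    """Mirror of: mac_type < em_82575 || mac_type > em_i210."""
    return mac_type < lo or mac_type > hi


def advertises_tso(mac_type, lo=EM_82575, hi=EM_I210):
    """Mirror of: mac_type >= em_82575 && mac_type <= em_i210."""
    return lo <= mac_type <= hi


def tso_with_vle(mac_types):
    """Return the mac_types for which both range tests are true."""
    return [m for m in mac_types if advertises_tso(m) and uses_vle_tagging(m)]
```

The two tests are exact complements, so tso_with_vle() is empty for any range of values: every chip that advertises TSO skips the quoted VLE block. Whether the other transmit path on those chips handles the VLAN tag correctly together with TSO is not something this sketch, or the quoted code, can answer.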
Re: TSO em(4) problem
On 28.1.2024. 10:44, Marcus Glocker wrote: > On Sun, Jan 28, 2024 at 12:16:20AM +0100, Hrvoje Popovski wrote: > >> On 27.1.2024. 21:01, Marcus Glocker wrote: >>> On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote: >>> >>>> On 26.1.2024. 21:56, Marcus Glocker wrote: >>>>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote: >>>>> >>>>>> I've manage to reproduce TSO em problem on anoter setup, unfortunatly >>>>>> production. >>>>>> >>>>>> Setup is very simple >>>>>> >>>>>> em0 - carp <- uplink >>>>>> em1 - pfsync >>>>>> ix1 - vlans - carp >>>>> Would it be possible that you also share an "ifconfig -a hwfeatures" of >>>>> that box? You can mask the IPs if it's too sensitive. >>>>> >>>>> I still try to reproduce the issue here, and for now I can't. >>>>> Maybe in your full ifconfig output I can see some specifics about your >>>>> configuration, which makes it more likely to reproduce the issue here. >>>>> >>>> Hi, >>>> >>>> here's ifconfig from second setup where watchdog is triggered much faster. >>>> Originally in this setup uplink is ix0, I've change that to em0 to see >>>> would the problem be same as in other setup and it is, and that's good >>>> because this is pfsync setup for students and I can do whatever I want >>>> with it :) >>> Thanks. >>> >>> But still, I can do whatever I want on my em(4) I210 box, carp(4), >>> vlan(4), creating a lot of traffic, I can't reproduce the watchdog which >>> you are seeing :-( I'm not sure if this is something related to your >>> I350. >>> >>> Also, I can't understand why the watchdog still triggers when you disable >>> TSO by setting net.inet.tcp.tso=0. >>> >>> Just to rule out that you're receiving a MAXMCLBYTES (65536) packet, >>> while EM_TSO_SIZE (65535) is one byte less, can you please apply this >>> diff to -current and test it? I doubt it will make a difference, but >>> I'm running a bit out of ideas here. 
>> >> >> Hi, >> >> with this diff I'm still getting em watchdog >> >> Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185 >> TDT 120 > > Thanks for testing again. > > I think we might have a generic problem with TSO with the current em(4) > code and some chips. Referring to this recent FreeBSD commit. > > e1000: disable TSO on lem(4) and em(4): > Disable TSO on lem(4) and em(4) until a ring stall can be debugged. > https://github.com/freebsd/freebsd-src/commit/797e480cba8834e584062092c098e60956d28180 > > Can you try this diff to specifically disable TSO for I350 please? > > We will need to discuss internally which way to go. I see those > options currently: > > - Entirely pull out the TSO diff. > - Leave the TSO code in but disable TSO for now (what FreeBSD did). > - Leave the TSO code in but disable TSO only for chips we see issues > with (this diff). >

Hi,

with this diff I still see TSOv4 and TSOv6 on i350. Is this ok? The em0 watchdog is triggered with or without net.inet.tcp.tso=1/0.

em0: flags=8b43 mtu 1500
        hwfeatures=31b7 hardmtu 9216
        lladdr 0c:c4:7a:da:cd:5a
        index 3 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
        status: active

em0 at pci7 dev 0 function 0 "Intel I350" rev 0x01: msi, address

Jan 28 13:18:45 bcbnfw1 /bsd: em0: watchdog: head 89 tail 153 TDH 153 TDT 89
Jan 28 13:41:19 bcbnfw1 /bsd: em0: watchdog: head 336 tail 400 TDH 400 TDT 336
Jan 28 13:58:13 bcbnfw1 /bsd: em0: watchdog: head 172 tail 236 TDH 236 TDT 172

> > Index: if_em.c > === > RCS file: /cvs/src/sys/dev/pci/if_em.c,v > diff -u -p -u -p -r1.370 if_em.c > --- if_em.c 31 Dec 2023 08:42:33 - 1.370 > +++ if_em.c 28 Jan 2024 09:30:59 - > @@ -2013,7 +2013,9 @@ em_setup_interface(struct em_softc *sc) > if (sc->hw.mac_type >= em_82575 && sc->hw.mac_type <= em_i210) { > ifp->if_capabilities |= IFCAP_CSUM_IPv4; > ifp->if_capabilities |= IFCAP_CSUM_TCPv6 | IFCAP_CSUM_UDPv6; > - ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; > +
/* XXX: Enabling TSO on I350 causes watchdogs */ > + if (sc->hw.mac_type != em_i350) > + ifp->if_capabilities |= IFCAP_TSOv4 | IFCAP_TSOv6; > } > > /* >
Re: TSO em(4) problem
On 27.1.2024. 21:01, Marcus Glocker wrote: > On Sat, Jan 27, 2024 at 08:01:09AM +0100, Hrvoje Popovski wrote: > >> On 26.1.2024. 21:56, Marcus Glocker wrote: >>> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote: >>> >>>> I've manage to reproduce TSO em problem on anoter setup, unfortunatly >>>> production. >>>> >>>> Setup is very simple >>>> >>>> em0 - carp <- uplink >>>> em1 - pfsync >>>> ix1 - vlans - carp >>> Would it be possible that you also share an "ifconfig -a hwfeatures" of >>> that box? You can mask the IPs if it's too sensitive. >>> >>> I still try to reproduce the issue here, and for now I can't. >>> Maybe in your full ifconfig output I can see some specifics about your >>> configuration, which makes it more likely to reproduce the issue here. >>> >> Hi, >> >> here's ifconfig from second setup where watchdog is triggered much faster. >> Originally in this setup uplink is ix0, I've change that to em0 to see >> would the problem be same as in other setup and it is, and that's good >> because this is pfsync setup for students and I can do whatever I want >> with it :) > Thanks. > > But still, I can do whatever I want on my em(4) I210 box, carp(4), > vlan(4), creating a lot of traffic, I can't reproduce the watchdog which > you are seeing :-( I'm not sure if this is something related to your > I350. > > Also, I can't understand why the watchdog still triggers when you disable > TSO by setting net.inet.tcp.tso=0. > > Just to rule out that you're receiving a MAXMCLBYTES (65536) packet, > while EM_TSO_SIZE (65535) is one byte less, can you please apply this > diff to -current and test it? I doubt it will make a difference, but > I'm running a bit out of ideas here. Hi, with this diff I'm still getting em watchdog Jan 28 00:14:12 bcbnfw1 /bsd: em0: watchdog: head 120 tail 185 TDH 185 TDT 120
Re: TSO em(4) problem
On 26.1.2024. 22:47, Alexander Bluhm wrote:
> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>> I've managed to reproduce the TSO em problem on another setup,
>> unfortunately production.
> What helped debugging a similar issue with ixl(4) and TSO was to
> remove all TSO specific code from the driver. Then only this part
> remains from the original em(4) TSO diff.
>
>	error = bus_dmamap_create(sc->sc_dmat, EM_TSO_SIZE,
>	    EM_MAX_SCATTER / (sc->pcix_82544 ? 2 : 1),
>	    EM_TSO_SEG_SIZE, 0, BUS_DMA_NOWAIT, &pkt->pkt_map);
>
> The parameters that changed when adding TSO are:
>
> bus_size_t size:     MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SIZE 65535
> bus_size_t maxsegsz: MAX_JUMBO_FRAME_SIZE 16128 -> EM_TSO_SEG_SIZE 4096
>
> I suspect that this is the cause for the regression as disabling
> TSO did not help. Would it be possible to run the diff below? I
> expect that the problem will still be there. But then we know it
> must be the change of one of the bus_dmamap_create() arguments.
>
> bluhm

Hi,

with this diff em0 seems happy and em watchdog is gone.

bcbnfw1# uptime
 8:06AM  up 44 mins, 2 users, load averages: 0.00, 0.00, 0.00
bcbnfw1# ifconfig em0 hwfeatures
em0: flags=8b43 mtu 1500
        hwfeatures=1b7 hardmtu 9216
        lladdr 0c:c4:7a:da:cd:5a
        index 3 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,master,rxpause)
        status: active
        inet 10.10.155.234 netmask 0xfff8 broadcast 10.10.155.239

This morning without diff

bcbnfw1# cat /var/log/messages | grep watchdog
Jan 27 07:12:03 bcbnfw1 /bsd: em0: watchdog: head 50 tail 114 TDH 114 TDT 50
Jan 27 07:15:29 bcbnfw1 /bsd: em0: watchdog: head 370 tail 434 TDH 434 TDT 370
Jan 27 07:15:43 bcbnfw1 /bsd: em0: watchdog: head 219 tail 283 TDH 283 TDT 219
Jan 27 07:15:54 bcbnfw1 /bsd: em0: watchdog: head 322 tail 386 TDH 386 TDT 322
Jan 27 07:16:08 bcbnfw1 /bsd: em0: watchdog: head 115 tail 179 TDH 179 TDT 115
Jan 27 07:16:21 bcbnfw1 /bsd: em0: watchdog: head 364 tail 428 TDH 428 TDT 364
Jan 27 07:16:35 bcbnfw1 /bsd: em0: watchdog: head 473 tail 26 TDH 26 TDT 473
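To put the bus_dmamap_create() parameter change quoted in this message into numbers, here is a small worked example in C. The three constants are the values quoted in the thread; the helper name min_segments is made up for illustration. It only computes a lower bound (a contiguous buffer cut into pieces of at most maxsegsz bytes); a real mbuf chain can need more segments, and the value of EM_MAX_SCATTER is not stated in the thread, so it is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Values as quoted in the thread, redefined locally for the example. */
#define MAX_JUMBO_FRAME_SIZE	16128
#define EM_TSO_SIZE		65535
#define EM_TSO_SEG_SIZE		4096

/*
 * Lower bound on the number of DMA segments needed to cover 'size'
 * bytes when no single segment may exceed 'maxsegsz' bytes
 * (ceiling division).
 */
static size_t
min_segments(size_t size, size_t maxsegsz)
{
	assert(maxsegsz > 0);
	return (size + maxsegsz - 1) / maxsegsz;
}
```

Before the TSO diff, size and maxsegsz were both 16128, so a full-size map could in principle be a single segment. After it, min_segments(EM_TSO_SIZE, EM_TSO_SEG_SIZE) is 16, and even a 16128-byte jumbo frame needs at least 4 segments of 4096 bytes.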
Re: TSO em(4) problem
On 26.1.2024. 21:56, Marcus Glocker wrote:
> On Fri, Jan 26, 2024 at 11:41:49AM +0100, Hrvoje Popovski wrote:
>
>> I've managed to reproduce the TSO em problem on another setup,
>> unfortunately production.
>>
>> Setup is very simple
>>
>> em0 - carp <- uplink
>> em1 - pfsync
>> ix1 - vlans - carp
>
> Would it be possible that you also share an "ifconfig -a hwfeatures" of
> that box? You can mask the IPs if it's too sensitive.
>
> I still try to reproduce the issue here, and for now I can't.
> Maybe in your full ifconfig output I can see some specifics about your
> configuration, which makes it more likely to reproduce the issue here.
>

Hi,

here's ifconfig from second setup where watchdog is triggered much faster.
Originally in this setup uplink is ix0, I've changed that to em0 to see
would the problem be same as in other setup and it is, and that's good
because this is pfsync setup for students and I can do whatever I want
with it :)

bcbnfw1# ifconfig -a hwfeatures
lo0: flags=2008049 mtu 32768
        hwfeatures=7187
        index 6 priority 0 llprio 3
        groups: lo
        inet 127.0.0.1 netmask 0xff00
ix0: flags=2008802 mtu 1500
        hwfeatures=71b7 hardmtu 9198
        lladdr 90:e2:ba:d7:1b:f4
        index 1 priority 0 llprio 3
        media: Ethernet autoselect (10GbaseSR full-duplex)
        status: active
ix1: flags=2008b43 mtu 1500
        hwfeatures=71b7 hardmtu 9198
        lladdr 90:e2:ba:d7:1b:f5
        index 2 priority 0 llprio 3
        media: Ethernet autoselect (10GbaseSR full-duplex,rxpause,txpause)
        status: active
em0: flags=8b43 mtu 1500
        hwfeatures=31b7 hardmtu 9216
        lladdr 0c:c4:7a:da:cd:5a
        index 3 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex,rxpause)
        status: active
        inet 10.10.155.234 netmask 0xfff8 broadcast 10.10.155.239
em1: flags=8843 mtu 1500
        hwfeatures=31b7 hardmtu 9216
        lladdr 0c:c4:7a:da:cd:5b
        index 4 priority 0 llprio 3
        media: Ethernet autoselect (1000baseT full-duplex,rxpause,txpause)
        status: active
        inet 192.168.0.77 netmask 0xfffc broadcast 192.168.0.79
enc0: flags=0<>
        hwfeatures=0<>
        index 5 priority 0 llprio 3
        groups: enc
        status: active
carp0: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:01
        index 7 priority 15 llprio 3
        carp: MASTER carpdev em0 vhid 1 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.10.155.236 netmask 0x
carp1100: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:12
        index 8 priority 15 llprio 3
        carp: MASTER carpdev vlan1100 vhid 18 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.30.16.1 netmask 0x
carp1101: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:16
        index 9 priority 15 llprio 3
        carp: MASTER carpdev vlan1101 vhid 22 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.31.16.1 netmask 0x
carp1102: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:19
        index 10 priority 15 llprio 3
        carp: MASTER carpdev vlan1102 vhid 25 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.32.16.1 netmask 0x
carp1103: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:1c
        index 11 priority 15 llprio 3
        carp: MASTER carpdev vlan1103 vhid 28 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.33.16.1 netmask 0x
carp1130: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:13
        index 12 priority 15 llprio 3
        carp: MASTER carpdev vlan1130 vhid 19 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.30.0.1 netmask 0x
carp1131: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:17
        index 13 priority 15 llprio 3
        carp: MASTER carpdev vlan1131 vhid 23 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.31.0.1 netmask 0x
carp1132: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:1a
        index 14 priority 15 llprio 3
        carp: MASTER carpdev vlan1132 vhid 26 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.32.0.1 netmask 0x
carp1133: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
        lladdr 00:00:5e:00:01:1d
        index 15 priority 15 llprio 3
        carp: MASTER carpdev vlan1133 vhid 29 advbase 1 advskew 10
        groups: carp
        status: master
        inet 10.33.0.1 netmask 0x
carp1150: flags=8843 mtu 1500
        hwfeatures=3187 hardmtu 1500
Re: TSO em(4) problem
ev 2.00/1.00 addr 1
pcib0 at pci1 dev 31 function 0 "Intel C610 LPC" rev 0x05
ahci1 at pci1 dev 31 function 2 "Intel C610 AHCI" rev 0x05: msi, AHCI 1.3
ahci1: port 0: 6.0Gb/s
ahci1: port 1: 6.0Gb/s
scsibus2 at ahci1: 32 targets
sd0 at scsibus2 targ 0 lun 0: naa.5002538d417c7a2b
sd0: 244198MB, 512 bytes/sector, 500118192 sectors, thin
sd1 at scsibus2 targ 1 lun 0: naa.5002538d417cc12c
sd1: 244198MB, 512 bytes/sector, 500118192 sectors, thin
ichiic0 at pci1 dev 31 function 3 "Intel C610 SMBus" rev 0x05: apic 1 int 18
iic0 at ichiic0
isa0 at pcib0
isadma0 at isa0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
vmm0 at mainbus0: VMX/EPT
uhub3 at uhub0 port 12 configuration 1 interface 0 "ATEN International product 0x7000" rev 2.00/0.00 addr 2
uhidev0 at uhub3 port 1 configuration 1 interface 0 "ATEN International product 0x2419" rev 1.10/1.00 addr 3
uhidev0: iclass 3/1
ukbd0 at uhidev0: 8 variable keys, 6 key codes
wskbd0 at ukbd0 mux 1
uhidev1 at uhub3 port 1 configuration 1 interface 1 "ATEN International product 0x2419" rev 1.10/1.00 addr 3
uhidev1: iclass 3/1
ums0 at uhidev1: 3 buttons, Z dir
wsmouse0 at ums0 mux 0
uhub4 at uhub1 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
uhub5 at uhub2 port 1 configuration 1 interface 0 "Intel Rate Matching Hub" rev 2.00/0.05 addr 2
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (06e397d0f983db15.a) swap on sd0b dump on sd0b

On 24.1.2024. 0:48, Hrvoje Popovski wrote:
> Hi all,
>
> in production I have simple carp pfsync setup with
> em0 - carp <- uplink
> em1 - pfsync
> ix0 - vlan - carp <- internal networks
> ix1 - not used
> and for vpn I have wireguard and people connect to em0 carp address.
> There's no bridges or tunnels or any exotic pf feature in this setup.
>
> Until this snapshot
> OpenBSD 7.4-current (GENERIC.MP) #1587: Sat Dec 30 22:44:51 MST 2023
> everything was fine,
> but with and after
> OpenBSD 7.4-current (GENERIC.MP) #1588: Thu Jan 4 20:58:35 MST 2024
> em0 starts to go up/down spontaneously and em0 watchdog logs start to
> appear in messages
>
> em0: watchdog: head 113 tail 178 TDH 178 TDT 113
> carp1: state transition: BACKUP -> MASTER
>
> even with net.inet.tcp.tso=0
>
>
> When reverting em TSO diffs if_em.c to r1.369 and if_em.h to r1.80
> firewall starts to work normally and em0 is fine.
>
> After rebooting firewall and promoting it to carp master I've started to
> collect kstat em0::: after em0 watchdog log
>
>
> 1) Jan 22 08:01:01 fw2 /bsd: em0: watchdog: head 473 tail 25 TDH 25 TDT 473
> kstat em0::: - em0-1.txt
> 2) Jan 22 08:07:11 fw2 /bsd: em0: watchdog: head 114 tail 178 TDH 178
> TDT 114
> 3) Jan 22 08:08:16 fw2 /bsd: em0: watchdog: head 61 tail 126 TDH 126 TDT 61
> kstat em0::: - em0-3.txt
> 4) Jan 22 08:21:23 fw2 /bsd: em0: watchdog: head 452 tail 5 TDH 5 TDT 452
> 5) Jan 22 08:33:48 fw2 /bsd: em0: watchdog: head 352 tail 416 TDH 416
> TDT 352
> 6) Jan 22 08:36:20 fw2 /bsd: em0: watchdog: head 446 tail 510 TDH 510
> TDT 446
> kstat em0::: - em0-6.txt
> 7) Jan 22 08:42:16 fw1 /bsd: em0: watchdog: head 385 tail 450 TDH 450
> TDT 385
> kstat em0::: - em0-7.txt
>
>
> in the attachment you can find em0 txt kstat output and kstat-all.txt
> which is kstat of all interfaces with TSO diff after 7th time em0
> watchdog log
>
> From logs it seems that em0:0:txq:0 oactives counter, em0 watchdog and
> em0 going up/down is somehow connected because every time I see em0
> watchdog, oactives counter is increased by one
>
>
> log on switch
> I 01/22/24 08:01:01 00077 ports: port 2 is now off-line
> I 01/22/24 08:01:05 00076 ports: port 2 is now on-line
>
> I 01/22/24 08:07:11 00077 ports: port 2 is now off-line
> I 01/22/24 08:07:14 00076 ports: port 2 is now on-line
>
> I 01/22/24 08:08:16 00077 ports: port 2 is now off-line
> I 01/22/24 08:08:20 00076 ports: port 2 is now on-line
>
> I 01/22/24 08:21:23 00077 ports: port 2 is now off-line
> I 01/22/24 08:21:26 00076 ports: port 2 is now on-line
>
> I 01/22/24 08:33:47 00077 ports: port 2 is now off-line
> I 01/22/24 08:33:51 00076 ports: port 2 is now on-line
>
> I 01/22/24 08:36:20 00077 ports: port 2 is now off-line
> I 01/22/24 08:36:24 00076 ports: port 2 is now on-line
>
> I 01/22/24 08:42:16 00077 ports: port 2 is now off-line
> I 01/22/24 08:42:20 00076 ports: port 2 is now on-line
>
> em0 is connected to port 2
> ix0 is connected to port 6 and it's up the whole time...
>
>
> Packet processing and some little pressure need to be over em0 to
> trigger em0 watchdog and only carp master is affected. Over night th
TSO em(4) problem
Hi all,

in production I have simple carp pfsync setup with
em0 - carp <- uplink
em1 - pfsync
ix0 - vlan - carp <- internal networks
ix1 - not used
and for vpn I have wireguard and people connect to em0 carp address.
There's no bridges or tunnels or any exotic pf feature in this setup.

Until this snapshot
OpenBSD 7.4-current (GENERIC.MP) #1587: Sat Dec 30 22:44:51 MST 2023
everything was fine,
but with and after
OpenBSD 7.4-current (GENERIC.MP) #1588: Thu Jan 4 20:58:35 MST 2024
em0 starts to go up/down spontaneously and em0 watchdog logs start to
appear in messages

em0: watchdog: head 113 tail 178 TDH 178 TDT 113
carp1: state transition: BACKUP -> MASTER

even with net.inet.tcp.tso=0


When reverting em TSO diffs if_em.c to r1.369 and if_em.h to r1.80
firewall starts to work normally and em0 is fine.

After rebooting firewall and promoting it to carp master I've started to
collect kstat em0::: after em0 watchdog log


1) Jan 22 08:01:01 fw2 /bsd: em0: watchdog: head 473 tail 25 TDH 25 TDT 473
kstat em0::: - em0-1.txt
2) Jan 22 08:07:11 fw2 /bsd: em0: watchdog: head 114 tail 178 TDH 178 TDT 114
3) Jan 22 08:08:16 fw2 /bsd: em0: watchdog: head 61 tail 126 TDH 126 TDT 61
kstat em0::: - em0-3.txt
4) Jan 22 08:21:23 fw2 /bsd: em0: watchdog: head 452 tail 5 TDH 5 TDT 452
5) Jan 22 08:33:48 fw2 /bsd: em0: watchdog: head 352 tail 416 TDH 416 TDT 352
6) Jan 22 08:36:20 fw2 /bsd: em0: watchdog: head 446 tail 510 TDH 510 TDT 446
kstat em0::: - em0-6.txt
7) Jan 22 08:42:16 fw1 /bsd: em0: watchdog: head 385 tail 450 TDH 450 TDT 385
kstat em0::: - em0-7.txt


in the attachment you can find em0 txt kstat output and kstat-all.txt
which is kstat of all interfaces with TSO diff after 7th time em0
watchdog log

From logs it seems that em0:0:txq:0 oactives counter, em0 watchdog and
em0 going up/down is somehow connected because every time I see em0
watchdog, oactives counter is increased by one


log on switch
I 01/22/24 08:01:01 00077 ports: port 2 is now off-line
I 01/22/24 08:01:05 00076 ports: port 2 is now on-line

I 01/22/24 08:07:11 00077 ports: port 2 is now off-line
I 01/22/24 08:07:14 00076 ports: port 2 is now on-line

I 01/22/24 08:08:16 00077 ports: port 2 is now off-line
I 01/22/24 08:08:20 00076 ports: port 2 is now on-line

I 01/22/24 08:21:23 00077 ports: port 2 is now off-line
I 01/22/24 08:21:26 00076 ports: port 2 is now on-line

I 01/22/24 08:33:47 00077 ports: port 2 is now off-line
I 01/22/24 08:33:51 00076 ports: port 2 is now on-line

I 01/22/24 08:36:20 00077 ports: port 2 is now off-line
I 01/22/24 08:36:24 00076 ports: port 2 is now on-line

I 01/22/24 08:42:16 00077 ports: port 2 is now off-line
I 01/22/24 08:42:20 00076 ports: port 2 is now on-line

em0 is connected to port 2
ix0 is connected to port 6 and it's up the whole time...


Packet processing and some little pressure need to be over em0 to
trigger em0 watchdog and only carp master is affected. Over night there
are 2 or 3 em0 watchdogs. Firewalls are more than underutilized,
cca 5k states and under 100Mbps.

To rule out em hardware problem, I've sysupdate second firewall and
problem was the same as on first one.

I am willing to debug this further but I don't know what to look at
any more ...

And of course, thank you guys for carp and pfsync, without it this would
be a problem but it's not :)


kstat em0::: after day without TSO diffs

fw2# uptime
12:36AM  up 1 day, 13:38, 2 users, load averages: 0.35, 0.23, 0.23
fw2# kstat em0:::
em0:0:em-stats:0
        rx crc errs: 0 packets
        rx align errs: 0 packets
        rx align errs: 0 packets
        rx errs: 0 packets
        rx missed: 0 packets
        tx single coll: 0 packets
        tx excess coll: 0 packets
        tx multi coll: 0 packets
        tx late coll: 0 packets
        tx coll: 0
        tx defers: 0
        tx no CRS: 0 packets
        seq errs: 0
        carr ext errs: 0 packets
        rx len errs: 0 packets
        rx xon: 0 packets
        tx xon: 0 packets
        rx xoff: 0 packets
        tx xoff: 0 packets
        FC unsupported: 0 packets
        rx 64B: 6361422 packets
        rx 65-127B: 19106140 packets
        rx 128-255B: 4430154 packets
        rx 256-511B: 5116503 packets
        rx 512-1023B: 7665843 packets
        rx 1024-maxB: 86778341 packets
        rx good: 129458403 packets
        rx bcast: 147 packets
        rx mcast: 112979 packets
        tx good: 67968976 packets
        rx good: 134077827262 bytes
        tx good: 32314161469 bytes
        rx no buffers: 4 packets
        rx undersize: 0 packets
        rx fragments: 0 packets
        rx oversize: 0 packets
        rx jabbers: 0 packets
        rx mgmt: 0 packets
        rx mgmt drops: 0 packets
        tx mgmt: 0 packets
        rx total: 134077827262 bytes
        tx total: 32314161469 bytes
        rx total: 129458403 packets
        tx total: 67968976 packets
        tx 64B: 8932448 packets
        tx 65-127B: 31092764 packets
        tx 128-255B: 3930861 packets
        tx 256-511B: 2126737 packets
        tx 512-1023B: 4009214
Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
On 9.1.2024. 3:04, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 10:14:12AM +0100, Hrvoje Popovski wrote:
>> On 3.1.2024. 7:51, Jonathan Matthew wrote:
>>> On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
>>>> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
>>>>> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
>>>>> interface and ifconfig bnxt0 down/up at the same time I can trigger
>>>>> panic. Panic can be triggered without kettenis@ diff...
>>>> It is easy to reproduce. ifconfig bnxt1 down/up a few times while
>>>> receiving TCP traffic with iperf3. Machine still has kettenis@ diff.
>>>> My panic looks different.
>>> It looks like I wasn't trying very hard when I wrote bnxt_down().
>>> I think there's also a problem with bnxt_up() unwinding after failure
>>> in various places, but that's a different issue.
>>>
>>> This makes it more resilient for me, though it still logs
>>> 'bnxt0: unexpected completion type 3' a lot if I take the interface
>>> down while it's in use. I'll look at that separately.
>>
>> Hi,
>>
>> with this diff I can still panic box with ifconfig up/down but not as
>> fast as without it
>
> Right, this is the other problem where bnxt_up() wasn't cleaning up properly
> after failing part way through. This diff should fix that, but I don't think
> it will fix the 'HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error'
> problem, so the interface will still stop working at that point.
>

With this diff bnxt behaves exactly as you said. After a lot of ifconfig
down/up at some point I get

smc24# ifconfig bnxt0 down
smc24# ifconfig bnxt0 up
bnxt0: attempt to re-allocate ring 0010
bnxt0: failed to allocate completion queue 0

and bnxt stops working.

>
> Index: if_bnxt.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_bnxt.c,v
> retrieving revision 1.39
> diff -u -p -r1.39 if_bnxt.c
> --- if_bnxt.c	10 Nov 2023 15:51:20 - 1.39
> +++ if_bnxt.c	9 Jan 2024 01:59:38 -
> @@ -1073,7 +1081,7 @@ bnxt_up(struct bnxt_softc *sc)
>  		if (bnxt_hwrm_vnic_ctx_alloc(sc, &sc->sc_vnic.rss_id) != 0) {
>  			printf("%s: failed to allocate vnic rss context\n",
>  			    DEVNAME(sc));
> -			goto down_queues;
> +			goto down_all_queues;
>  		}
>
>  	sc->sc_vnic.id = (uint16_t)HWRM_NA_SIGNATURE;
> @@ -1139,8 +1147,11 @@ dealloc_vnic:
>  	bnxt_hwrm_vnic_free(sc, &sc->sc_vnic);
>  dealloc_vnic_ctx:
>  	bnxt_hwrm_vnic_ctx_free(sc, &sc->sc_vnic.rss_id);
> +
> +down_all_queues:
> +	i = sc->sc_nqueues;
>  down_queues:
> -	for (i = 0; i < sc->sc_nqueues; i++)
> +	while (i-- > 0)
>  		bnxt_queue_down(sc, &sc->sc_queues[i]);
>
>  	bnxt_dmamem_free(sc, sc->sc_rx_cfg);
>
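The unwinding pattern in the quoted diff can be tried out in userland. What follows is only a sketch of the control flow, not the driver code: NQUEUES, queue_up(), queue_down(), queues_up() and device_up() are hypothetical stand-ins, and the fail_at/late_fail parameters exist purely to inject failures. The point it shows is that the loop index doubles as the count of queues that actually came up, so a partial failure tears down only those, while a later failure first sets the index to the full count.

```c
#include <assert.h>

#define NQUEUES	4		/* hypothetical queue count */

static int queue_is_up[NQUEUES];

/* Hypothetical stand-in for bringing up one queue; fails at index fail_at. */
static int
queue_up(int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	queue_is_up[i] = 1;
	return 0;
}

/* Tearing down a queue that never came up is the mistake to avoid. */
static void
queue_down(int i)
{
	assert(queue_is_up[i]);
	queue_is_up[i] = 0;
}

/* Number of queues currently marked as up. */
static int
queues_up(void)
{
	int i, n = 0;

	for (i = 0; i < NQUEUES; i++)
		n += queue_is_up[i];
	return n;
}

/*
 * fail_at:   index of the queue whose setup fails, or -1 for none.
 * late_fail: nonzero to simulate a failure after all queues are up.
 */
static int
device_up(int fail_at, int late_fail)
{
	int i;

	for (i = 0; i < NQUEUES; i++) {
		if (queue_up(i, fail_at) != 0)
			goto down_queues;	/* i == queues that are up */
	}
	if (late_fail)
		goto down_all_queues;
	return 0;

down_all_queues:
	i = NQUEUES;
down_queues:
	while (i-- > 0)
		queue_down(i);
	return -1;
}
```

Calling device_up(2, 0) brings up queues 0 and 1, fails on queue 2, and then takes down only queues 1 and 0. With a loop that always ran from 0 to NQUEUES, queue_down() would also be called for queues 2 and 3, which were never set up.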
Re: bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
On 3.1.2024. 7:51, Jonathan Matthew wrote:
> On Wed, Jan 03, 2024 at 01:50:06AM +0100, Alexander Bluhm wrote:
>> On Wed, Jan 03, 2024 at 12:26:26AM +0100, Hrvoje Popovski wrote:
>>> While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
>>> interface and ifconfig bnxt0 down/up at the same time I can trigger
>>> panic. Panic can be triggered without kettenis@ diff...
>> It is easy to reproduce. ifconfig bnxt1 down/up a few times while
>> receiving TCP traffic with iperf3. Machine still has kettenis@ diff.
>> My panic looks different.
> It looks like I wasn't trying very hard when I wrote bnxt_down().
> I think there's also a problem with bnxt_up() unwinding after failure
> in various places, but that's a different issue.
>
> This makes it more resilient for me, though it still logs
> 'bnxt0: unexpected completion type 3' a lot if I take the interface
> down while it's in use. I'll look at that separately.

Hi,

with this diff I can still panic box with ifconfig up/down but not as
fast as without it

panic with diff

bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
bnxt0: failed to set up tx ring
uvm_fault(0xfd8e57e02460, 0xff0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
* 70181  53204      0         0x3          0   0K  ifconfig
bnxt_queue_down(802c9000,802c9f88) at bnxt_queue_down+0x62
bnxt_up(802c9000) at bnxt_up+0x36b
bnxt_ioctl(802c9048,80206910,8000607fffd0) at bnxt_ioctl+0x162
ifioctl(fd8e417ab758,80206910,8000607fffd0,800060797aa8) at ifioctl+0x726
sys_ioctl(800060797aa8,8000608000d0,800060800120) at sys_ioctl+0x2af
syscall(800060800190) at syscall+0x3b4
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7e3d0a930430, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{0}> show reg
rdi               0x8244b950    pci_bus_dma_tag
rsi               0x802c9f88
rbp               0x8000607ffe40
rbx               0x101
rdx               0xc803
rcx               0x206
rax               0xff0
r8                0x3f
r9                0
r10               0xa14b312597c5ea6a
r11               0x819fac40    _bus_dmamap_destroy
r12               0
r13               0x100
r14               0x802c9f88
r15               0x802c9000
rip               0x81b578e2    bnxt_queue_down+0x62
cs                0x8
rflags            0x10216       __ALIGN_SIZE+0xf216
rsp               0x8000607ffde0
ss                0x10
bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi
ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*53204   70181  81971      0  7         0x3                ifconfig
 57044  336864  81971      0  3    0x100083  kqread        iperf3
 57044  317909  81971      0  3   0x4100083  kqread        iperf3
 57044  253167  81971      0  3   0x4100083  kqread        iperf3
 57044  199984  81971      0  3   0x4100083  kqread        iperf3
 57044  343144  81971      0  3   0x4100083  kqread        iperf3
 81971  379109      1      0  3    0x10008b  sigsusp       ksh
 69236  410163      1      0  3    0x100098  kqread        cron
 28984  478747  27164     95  3   0x1100092  kqread        smtpd
 75309  290569  27164    103  3   0x1100092  kqread        smtpd
  3782  175531  27164     95  3   0x1100092  kqread        smtpd
 60089   38850  27164     95  3    0x100092  kqread        smtpd
 72803  151501  27164     95  3   0x1100092  kqread        smtpd
 88240  203086  27164     95  3   0x1100092  kqread        smtpd
 27164  293957      1      0  3    0x100080  kqread        smtpd
 51687  170066      1      0  3        0x88  kqread        sshd
 82716  114406      1      0  3    0x100080  kqread        ntpd
 95469  439610  76144     83  3    0x100092  kqread        ntpd
 76144  242283      1     83  3   0x1100092  kqread        ntpd
 25275  206721  16938     73  3   0x1100090  kqread        syslogd
 16938  424245      1      0  3    0x100082  netio         syslogd
 92580  279098      0      0  3     0x14200  bored         smr
 40549  159120      0      0  3     0x14200  pgzero        zerothread
 12488  115575      0      0  3     0x14200  aiodoned      aiodoned
 91171  460632      0      0  3     0x14200  syncer        update
 83952  275089      0      0  3     0x14200  cleaner       cleaner
  6394  148862      0      0  3     0x14200  reaper        reaper
 60888  287201      0      0  3     0x14200  pgdaemon      pagedaemon
 25804  403088      0      0  3
bnxt panic - HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
Hi all,

While testing kettenis@ ipl diff from tech@ and doing iperf3 to bnxt
interface and ifconfig bnxt0 down/up at the same time I can trigger
panic. Panic can be triggered without kettenis@ diff...

bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error.
bnxt0: failed to set up tx ring
uvm_fault(0xfd8e57f12a20, 0xff0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*292054  36537      0         0x3          0   0K  ifconfig
 163937  81780      0     0x14000 0x42006 sensors
bnxt_queue_down(802c9000,802c9f88) at bnxt_queue_down+0x62
bnxt_up(802c9000) at bnxt_up+0x36b
bnxt_ioctl(802c9048,80206910,8000607295f0) at bnxt_ioctl+0x162
ifioctl(fd8e442f2758,80206910,8000607295f0,8000607cf2b0) at ifioctl+0x726
sys_ioctl(8000607cf2b0,8000607296f0,800060729740) at sys_ioctl+0x2af
syscall(8000607297b0) at syscall+0x3b4
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x726ac871f790, count: 8
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{0}> show reg
rdi               0x82485c78    pci_bus_dma_tag
rsi               0x802c9f88
rbp               0x800060729460
rbx               0x101
rdx               0xc803
rcx               0x286
rax               0xff0
r8                0x3f
r9                0
r10               0x96b31028f3e5d46c
r11               0x81825410    _bus_dmamap_destroy
r12               0
r13               0x100
r14               0x802c9f88
r15               0x802c9000
rip               0x81db3da2    bnxt_queue_down+0x62
cs                0x8
rflags            0x10216       __ALIGN_SIZE+0xf216
rsp               0x800060729400
ss                0x10
bnxt_queue_down+0x62:   movq    0(%r12,%rax,1),%rsi
ddb{0}>
ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*36537  292054  47404      0  7         0x3                ifconfig
 86797  280843  47404      0  3    0x100083  kqread        iperf3
 86797  429491  47404      0  3   0x4100083  kqread        iperf3
 86797  214299  47404      0  3   0x4100083  kqread        iperf3
 86797  368590  47404      0  3   0x4100083  kqread        iperf3
 86797  380965  47404      0  3   0x4100083  kqread        iperf3
 47404  299766      1      0  3    0x10008b  sigsusp       ksh
  7161  521423      1      0  3    0x100098  kqread        cron
 39740  121938  83604     95  3   0x1100092  kqread        smtpd
 94839  467744  83604    103  3   0x1100092  kqread        smtpd
 31264  522699  83604     95  3   0x1100092  kqread        smtpd
 94528  511199  83604     95  3    0x100092  kqread        smtpd
 37502  123618  83604     95  3   0x1100092  kqread        smtpd
 89306   15887  83604     95  3   0x1100092  kqread        smtpd
 83604  206718      1      0  3    0x100080  kqread        smtpd
   428   70010      1      0  3        0x88  kqread        sshd
 94146  379619      1      0  3    0x100080  kqread        ntpd
 23446  401588  82414     83  3    0x100092  kqread        ntpd
 82414  378350      1     83  3   0x1100092  kqread        ntpd
 80891  252069  55631     73  3   0x1100090  kqread        syslogd
 55631   62854      1      0  3    0x100082  netio         syslogd
 60491  452354      0      0  3     0x14200  bored         smr
 20945   92407      0      0  3     0x14200  pgzero        zerothread
 369255987          0      0  3     0x14200  aiodoned      aiodoned
 55091  437847      0      0  3     0x14200  syncer        update
 13970  164134      0      0  3     0x14200  cleaner       cleaner
 36841  522592      0      0  3     0x14200  reaper        reaper
 93326  303752      0      0  3     0x14200  pgdaemon      pagedaemon
  7898  311095      0      0  3     0x14200  usbtsk        usbtask
  2747  192075      0      0  3     0x14200  usbatsk       usbatsk
 97645  203456      0      0  3  0x40014200  acpi0         acpi0
 57525   67008      0      0  7  0x40014200                idle23
 51862  472206      0      0  7  0x40014200                idle22
 60651  418998      0      0  7  0x40014200                idle21
  3576  237393      0      0  7  0x40014200                idle20
  6504  170181      0      0  7  0x40014200                idle19
 207063186          0      0  7  0x40014200                idle18
 78053  233580      0      0  7  0x40014200                idle17
 29625   58284      0      0  7  0x40014200                idle16
 94538  146456      0      0  7  0x40014200                idle15
 84902  429192      0      0  7  0x40014200
Re: terminal is cleared when logging as root
On 23.10.2023. 19:16, Daniel Jakots wrote:
> Hi, I installed a new machine on Saturday (with -current) and I noticed
> that when I logged in as root the terminal gets cleared but not cleanly.
> I upgraded an existing machine to a newer snapshot and then the problem
> appeared as well. This happens when using `doas su -`, `ssh root@` and I
> think I had it on console as well. For some reason, it doesn't happen
> with my regular user. Previous snapshot was from 2023-10-13. I guess
> it's since the libcurses update on the 17th? Cheers, Daniel

Hi,

I confirm what Daniel said. Over ssh and over console I'm getting this

OpenBSD/amd64 (r620-1.srce.hr) (tty01)

login: root
Password:
Last login: Mon Oct 23 10:18:24 on tty01
OpenBSD 7.4-current (GENERIC.MP) #1419: Mon Oct 23 10:14:12 MDT 2023

Welcome to OpenBSD: The proactively secure Unix-like operating system.

Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code. With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.

You have mail.
F
r620-1#


when I login as user over console

OpenBSD/amd64 (r620-2.srce.hr) (tty01)

login: hrvoje
Password:
Last login: Mon Oct 23 19:55:11 on ttyp2 from 161.53.255.123
OpenBSD 7.4-current (GENERIC.MP) #1419: Mon Oct 23 10:14:12 MDT 2023

Welcome to OpenBSD: The proactively secure Unix-like operating system.

Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code. With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.

You have new mail.
r620-2$ su -
Password:
F
r620-2#
Re: Dell R6515 with mpii HBA330 Mini - mpii_scsi_cmd_tmo (0x40005862)
On 1.8.2023. 1:59, Hrvoje Popovski wrote:
> Hi all,
>
> I've got 2 new Dell servers for vpns and firewalling with Dell non-raid
> HBA330 mini and after install both firewalls freeze with this log
> This is not the first time I saw that log on dell servers with HBA330.

Nice thing is, and I didn't know that, when the ssd is replaced with an nvme
disk in the same slot, HBA330 is out of the game. It seems that nvme is
connected directly to the motherboard and it's so fast. But I need to boot
uefi so the nvme disk can be recognized as boot disk by the Dell server.

vpn1# dmesg | grep nvme
nvme0 at pci13 dev 0 function 0 vendor "SK hynix", unknown product 0x2839 rev 0x21: msix, NVMe 1.3
nvme0: Dell DC NVMe PE8010 RI U.2 960GB, firmware 1.2.0, serial SJC2N4257I34R2Q19
scsibus2 at nvme0: 17 targets, initiator 0
Dell R6515 with mpii HBA330 Mini - mpii_scsi_cmd_tmo (0x40005862)
Hi all,

I've got 2 new Dell servers for vpns and firewalling with Dell non-raid
HBA330 mini and after install both firewalls freeze with this log.
This is not the first time I saw that log on dell servers with HBA330.

OpenBSD/amd64 (vpn2.lan) (tty00)

login: root
123
^Cmpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)

after that I can only reboot boxes and sometimes I would be able to
login over idrac or ssh but mostly it will freeze and print log above.

I've seen that mpii_scsi_cmd_tmo log even in ramdisk - sysupgrade.

sysupgrade
Set name(s)? (or 'abort' or 'done') [done] done
Directory does not contain SHA256.sig. Continue without verification? [no] yes
Installing bsd          100% |**| 24695 KB    00:00
Installing bsd.mp       100% |**| 24787 KB    00:00
Installing bsd.rd       100% |**|  4549 KB    00:00
Installing base73.tgz   100% |**|   368 MB    00:05
Installing comp73.tgz   100% |**| 75590 KB    00:02
Installing man73.tgz    100% |**|  7822 KB    00:00
Installing game73.tgz   100% |**|  2748 KB    00:00
Installing xbase73.tgz    0% |  |     0       --:-- ETA
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)
mpii0: mpii_scsi_cmd_tmo (0x40005862)


That one time that I was able to do reposync and checkout from local
disk I've got

mpii0: mpii_scsi_cmd_tmo (0x2400)
mpii0: mpii_scsi_cmd_tmo (0x2400)
sd0(mpii0:4:0): Check Condition (error 0x70) on opcode 0x2a
    SENSE KEY: Not Ready
     ASC/ASCQ: Logical Unit Not Ready, Cause Not Reportable
sd0(mpii0:4:0): Check Condition (error 0x70) on opcode 0x2a
    SENSE KEY: Not Ready
     ASC/ASCQ: Logical Unit Not Ready, Cause Not Reportable
sd0(mpii0:4:0): Check Condition (error 0x70) on opcode 0x2a
    SENSE KEY: Not Ready
     ASC/ASCQ: Logical Unit Not Ready, Cause Not Reportable
sd0(mpii0:4:0): Check Condition (error 0x70) on opcode 0x28
    SENSE KEY: Not Ready
     ASC/ASCQ: Logical Unit Not Ready, Cause Not Reportable
sd0(mpii0:4:0): Check Condition (error 0x70) on opcode 0x28
    SENSE KEY: Not Ready
     ASC/ASCQ: Logical Unit Not Ready, Cause Not Reportable
kernel: protection fault trap, code=0
Stopped at      dounmount+0x52: movq    0x28(%r13),%rax
ddb> trace
dounmount(802f6000,808,800025420828) at dounmount+0x52
vop_generic_revoke(800025435858) at vop_generic_revoke+0x7d
VOP_REVOKE(fdb8e9d931c0,1) at VOP_REVOKE+0x3b
vdevgone(4,0,f,3) at vdevgone+0xd5
disk_gone(816bd690,0) at disk_gone+0x7e
sddetach(802cfa00,1) at sddetach+0x3e
config_detach(802cfa00,1) at config_detach+0x140
scsi_detach_link(802f3e00,1) at scsi_detach_link+0x60
scsi_detach_target(801ec380,4,1) at scsi_detach_target+0x5d
mpii_event_sas(801e9c00) at mpii_event_sas+0x251
taskq_thread(8246ad10) at taskq_thread+0xf0
end trace frame: 0x0, count: -11


dmesg:
OpenBSD 7.3-current (GENERIC.MP) #1324: Mon Jul 31 14:48:11 MDT 2023
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 274435207168 (261721MB)
avail mem = 266098278400 (253771MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.3 @ 0x6989c000 (68 entries)
bios0: vendor Dell Inc. version "2.11.4" date 03/22/2023
bios0: Dell Inc. PowerEdge R6515
acpi0 at bios0: ACPI 6.3
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP BERT ERST HEST HPET APIC MCFG WSMT SSDT SSDT EINJ SSDT CRAT CDIT IVRS SPCR SSDT SSDT
acpi0: wakeup devices PC00(S5) XHCI(S3) PC01(S5) XHCI(S3) PC02(S5) XHCI(S3) PC03(S5) XHCI(S3)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
ioapic0 at mainbus0: apid 240 pa 0xfec0, version 21, 24 pins, can't remap
ioapic1 at mainbus0: apid 241 pa 0xe010, version 21, 32 pins, can't remap
ioapic2 at mainbus0: apid 242 pa 0xc510, version 21, 32 pins, can't remap
ioapic3 at mainbus0: apid 243 pa 0xaa10, version 21, 32 pins, can't remap
ioapic4 at mainbus0: apid 244 pa 0xfd10, version 21, 32 pins, can't remap
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD EPYC 7313P 16-Core Processor, 3000.00 MHz, 19-01-01
cpu0:
Re: dvmrpd start causes kernel panic: assertion failed
On 7.6.2023. 12:31, Why 42? The lists account. wrote:
>
> Hi All,
>
> Just FYI, in my attempts to route multicast traffic I started a daemon
> called "dvmrpd", the kernel panicked immediately. See attached photo.
>
> Prior to the panic the XFCE Desktop was running. After the panic I could
> not find any combination of keys that would allow me to enter a debugger
> or gather more info.
>
> I then modified the dvmrpd config file e.g. to change the interface
> configuration and also changed the configured interface IP addresses
> themselves. A second attempt to start the daemon resulted in the same
> immediate panic. So it could be that I don't know what I'm doing, but
> apparently pretty reproducible :-/
>
> This occurred using the 7.3 AMD64 release on a Lenovo ThinkPad with an
> 11th gen i7 CPU.
>
> Cheers,
> Robb.
>

If dvmrpd is enabled, you should see this in the boot messages:

starting network daemons: sshd dvmrpd smtpd.

I don't see that in your screenshot. Here is a copy/paste of the boot messages when multicast forwarding and dvmrpd are enabled:

ddb.console: 0 -> 1
kern.pool_debug: 1 -> 0
kern.maxclusters: 262144 -> 1048576
net.inet.ip.mforwarding: 0 -> 1
starting network
reordering: ld.so libc libcrypto sshd.
starting early daemons: syslogd ntpd.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd dvmrpd smtpd.
starting local daemons: cron.
Wed Jun 7 19:47:52 CEST 2023
Re: pfsync_bulk_update panic
On 10.5.2023. 0:24, Alexandr Nedvedicky wrote:
> Hello,
>
>
> On Tue, May 09, 2023 at 06:26:43PM +, mabi wrote:
>> Hi,
>>
>> On a brand new OpenBSD 7.3 firewall (amd64) I get a kernel panic every few
>> days and was wondering if this panic I get is related to this issue/bug?
>>
>
> your panic got fixed by recent commit [1]
>
> Hrvoje was/is hitting very close to that KASSERT() (now removed) at line 2274.
> in Hrvoje's case the TAILQ_REMOVE() macro complains we attempt to remove state
> which is removed already:
>
> 2273         atomic_sub_long(&sc->sc_len, pfsync_qs[q].len);
> 2274         TAILQ_REMOVE(&sc->sc_qs[q], st, sync_list);
> 2275         if (TAILQ_EMPTY(&sc->sc_qs[q]))
> 2276                 atomic_sub_long(&sc->sc_len, sizeof (struct pfsync_subheader));
> 2277         st->sync_state = PFSYNC_S_NONE;
> 2278         mtx_leave(&sc->sc_st_mtx);
> 2279
> 2280         pf_state_unref(st);
>
> the cause is very similar pfsync relies on volatile value in ->sync_state
> member.
> The ->sync_state member must be modified under protection of ->mtx.
>
> The issue has been pointed out by bluhm@ during m2k23 hackathon where I shared
> my pfsync(4) headache with him. Diff below is my attempt to fix it. I had no
> chance to test it. I'll appreciate If you will give it a try and let me know
> how things look like.
> > thanks and > regards > sashan > > [1] https://marc.info/?l=openbsd-cvs=168269695603160=2 > > Hi, with this diff I can't trigger panic below r620-1# uvm_fault(0x82598710, 0x17, 0, 2) -> e kernel: page fault trap, code=2 Stopped at pfsync_q_del+0x8d: movq%rdx,0x8(%rax) TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *254020 58090 0 0x14000 0x2000K systq pfsync_q_del(fd8360adf6e0) at pfsync_q_del+0x8d pfsync_delete_state(fd8360adf6e0) at pfsync_delete_state+0x118 pf_remove_state(fd8360adf6e0) at pf_remove_state+0x156 pf_purge_expired_states(2e9c1) at pf_purge_expired_states+0x273 pf_purge(8258caa0) at pf_purge+0x2c taskq_thread(824a8150) at taskq_thread+0x100 end trace frame: 0x0, count: 9 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{0}> I'm only being able to trigger it in lab and quite fast. Now, after few hours it's still stable. > 8<---8<---8<--8< > diff --git a/sys/net/if_pfsync.c b/sys/net/if_pfsync.c > index 822b4211d0f..811d9d59666 100644 > --- a/sys/net/if_pfsync.c > +++ b/sys/net/if_pfsync.c > @@ -1362,14 +1362,17 @@ pfsync_grab_snapshot(struct pfsync_snapshot *sn, > struct pfsync_softc *sc) > > while ((st = TAILQ_FIRST(>sc_qs[q])) != NULL) { > TAILQ_REMOVE(>sc_qs[q], st, sync_list); > + mtx_enter(>mtx); > if (st->snapped == 0) { > TAILQ_INSERT_TAIL(>sn_qs[q], st, sync_snap); > st->snapped = 1; > + mtx_leave(>mtx); > } else { > /* >* item is on snapshot list already, so we can >* skip it now. 
>*/ > + mtx_leave(>mtx); > pf_state_unref(st); > } > } > @@ -1422,11 +1425,13 @@ pfsync_drop_snapshot(struct pfsync_snapshot *sn) > continue; > > while ((st = TAILQ_FIRST(>sn_qs[q])) != NULL) { > + mtx_enter(>mtx); > KASSERT(st->sync_state == q); > KASSERT(st->snapped == 1); > TAILQ_REMOVE(>sn_qs[q], st, sync_snap); > st->sync_state = PFSYNC_S_NONE; > st->snapped = 0; > + mtx_leave(>mtx); > pf_state_unref(st); > } > } > @@ -1665,6 +1670,7 @@ pfsync_sendout(void) > > count = 0; > while ((st = TAILQ_FIRST(_qs[q])) != NULL) { > + mtx_enter(>mtx); > TAILQ_REMOVE(_qs[q], st, sync_snap); > KASSERT(st->sync_state == q); > KASSERT(st->snapped == 1); > @@ -1672,6 +1678,7 @@ pfsync_sendout(void) > st->snapped = 0; > pfsync_qs[q].write(st, m->m_data + offset); > offset += pfsync_qs[q].len; > + mtx_leave(>mtx); > > pf_state_unref(st); > count++; > @@ -1725,8 +1732,6 @@ pfsync_insert_state(struct pf_state *st) > ISSET(st->state_flags, PFSTATE_NOSYNC)) > return; > > - KASSERT(st->sync_state == PFSYNC_S_NONE); > - > if (sc->sc_len == PFSYNC_MINPKT) >
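The idea in the diff above (only look at and change the per-state queue marker while holding the state's own mutex) can be shown with a small userland sketch. This is not the kernel code: `struct state`, `state_q_del()` and `S_NONE` are hypothetical stand-ins, and a pthread mutex is assumed in place of the kernel's mtx_enter()/mtx_leave().

```c
#include <assert.h>
#include <pthread.h>

#define S_NONE	0xff	/* stand-in for PFSYNC_S_NONE */

/* Hypothetical, heavily simplified stand-in for struct pf_state. */
struct state {
	pthread_mutex_t	mtx;		/* protects sync_state and snapped */
	unsigned char	sync_state;	/* queue the state sits on, or S_NONE */
	int		snapped;	/* non-zero while on a snapshot list */
};

/*
 * Take the state off its queue. The check and the update happen under
 * the same lock, so a second caller sees S_NONE and backs off instead
 * of removing the entry twice.
 * Returns 1 if this call dequeued the state, 0 if there was nothing to do.
 */
int
state_q_del(struct state *st)
{
	int removed = 0;

	pthread_mutex_lock(&st->mtx);
	if (st->sync_state != S_NONE && st->snapped == 0) {
		/* the real code would do the TAILQ_REMOVE() here */
		st->sync_state = S_NONE;
		removed = 1;
	}
	pthread_mutex_unlock(&st->mtx);

	return removed;
}
```

With the check done outside the lock, two threads could both see a valid queue number and both try to unlink the same entry, which matches the double removal described earlier in this thread.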
Re: pfsync_bulk_update panic
On 8.2.2023. 8:53, Alexandr Nedvedicky wrote: > Hello, > > On Tue, Feb 07, 2023 at 09:12:38PM +0100, Hrvoje Popovski wrote: > >> >> Hi, >> >> this panic is with plain snapshot and I didn't do anything. I will leave >> box in ddb if something else is needed. >> > It does not look like there is more data to gather in ddb. > may be I'm quick in my judgment. this is the relevant part > of pfsync_bulk_update() function: > 2456 int i = 0; > /* `i` seems to be kept in %r12 */ > 2457 > 2458 NET_LOCK(); > 2459 sc = pfsyncif; > 2460 if (sc == NULL) > 2461 goto out; > 2462 > 2463 rw_enter_read(_state_list.pfs_rwl); > 2464 st = sc->sc_bulk_next; > /* `st` is kept in %r15 > 2465 sc->sc_bulk_next = NULL; > 2466 > 2467 for (;;) { > 2468 if (st->sync_state == PFSYNC_S_NONE && > 2469 st->timeout < PFTM_MAX && > 2470 st->pfsync_time <= sc->sc_ureq_received) { > 2471 pfsync_update_state_req(st); > 2472 i++; > 2473 } > > > > >> ddb{0}> dmesg >> OpenBSD 7.2-current (GENERIC.MP) #1021: Sun Feb 5 09:52:50 MST 2023 >> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP >> >> >> r620-2# uvm_fault(0x824fb2f8, 0x14e, 0, 1) -> e >> kernel: page fault trap, code=0 >> Stopped at pfsync_bulk_update+0x60:cmpb$0xff,0x14e(%r15) >> TIDPIDUID PRFLAGS PFLAGS CPU COMMAND >> *109809 58944 0 0x14000 0x42000K softclock >> pfsync_bulk_update(0) at pfsync_bulk_update+0x60 > we seems to be dying at line 2468 due to a NULL pointer dereference > >> softclock_thread(8000f050) at softclock_thread+0x13b >> end trace frame: 0x0, count: 13 >> https://www.openbsd.org/ddb.html describes the minimum info required in >> bug reports. Insufficient info makes it difficult to find and fix bugs. >> ddb{0}> >> > > >> r11 0xfbec2dfc846efdb5 >> r120 >> r13 0x82503f80timeout_proc >> r14 0x809d8000 >> r150 >> rip 0x8101aea0pfsync_bulk_update+0x60 > r12 (`i`) is 0 which suggest the loop is most likely in its first > iteration > r15 (`st`) is 0 ... 
> so looks like it's trivial bug we try to send > a bulk but there is nothing to send. this makes me wonder if diff below > makes your test box more stable. > > > can you give a try a diff below? > > thanks a lot for your help > > regards > sashan Hi, with this diff I can't trigger the panic as before. I've been trying the whole day and I should have seen a panic or two by now, but there hasn't been any ... Thank you...
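For anyone reading the archive later, the failure described above (the cursor `st` is NULL on the first loop iteration and gets dereferenced) can be reproduced and guarded against in a few lines of userland C. This is only a sketch under assumptions, not the diff that was tested here: `struct state`, `bulk_update()` and the `next` pointer are hypothetical simplifications, and only the 0xff comparison is taken from the disassembly in the report.

```c
#include <assert.h>
#include <stddef.h>

#define S_NONE	0xff	/* stand-in for PFSYNC_S_NONE (the cmpb $0xff) */

/* Hypothetical, simplified stand-in for struct pf_state. */
struct state {
	unsigned char	 sync_state;
	struct state	*next;
};

/*
 * Walk the list starting at `st` and count the states that would be
 * queued for a bulk update. The loop has the same shape as the quoted
 * one: it touches st->sync_state before checking for the end of the
 * list, so an empty list must be caught before entering it.
 */
int
bulk_update(struct state *st)
{
	int i = 0;

	if (st == NULL)		/* nothing to send, do not enter the loop */
		return 0;

	for (;;) {
		if (st->sync_state == S_NONE)
			i++;
		st = st->next;
		if (st == NULL)
			break;
	}

	return i;
}
```

Without the early return, calling `bulk_update(NULL)` faults on the first `st->sync_state` read, which is the userland equivalent of the trap at pfsync_bulk_update+0x60.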
pfsync_bulk_update panic
Hi all, In lab I'm playing around with ip4/ip6 sasyncd setup which requires carp, pf, pfsync, isakmpd, sasyncd. I'm sending ip4/ip6 traffic though ipsec tunnels and at the same time sending ip4 traffic over firewall just to activate all cores. I'm having NET_TASKQ=6 on 6 cores firewalls. ix2 is pfsync interface and when sending traffic and doing ifconfig ix2 down && ifconfig ix2 up from time to time I'm able to trigger panic. this panic is with WITNESS and when doing mach ddbcpu X box freeze r620-1# ifconfig ix2 down r620-1# ifconfig ix2 up uvpma_fnaiult(c:0 x kfefrfnfel8 251 e e6 8, 0 x 1 6 e , 0,1 ) - >d iae kgernnosetli:c p a g e f a ul t tr a p , co d e= 0 Stopped at pfsync_bulk_update+0x60:cmpb$0xff,0x16e(%r15) TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 270521 35272 68 0x110 01 sasyncd 489979 76548 0 0x14000 0x2003 reaper 164092 74224 0 0x14000 0x2004 softnet 112060 78126 0 0x14000 0x2002 systq *372775 98656 0 0x14000 0x42000 softclock pfsync_bulk_update(0) at pfsync_bulk_update+0x60 timeout_run(81942978) at timeout_run+0x93 softclock_thread(8000f050) at softclock_thread+0x11d end trace frame: 0x0, count: 12 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{0}> ddb{0}> show panic *cpu0: uvm_fault(0x8251ee68, 0x16e, 0, 1) -> e cpu3: kernel diagnostic assertion "!_kernel_lock_held()" failed: file "/sys/uvm/uvm_map.c", line 2539 ddb{0}> ddb{0}> show reg rdi 0x4 rsi0 rbp 0x800022d53bb0 rbx0 rdx 0xde007fffc240 rcx0x206 rax 0xd r80x7fff r90x800022d53c40 r10 0x82084c2bcmd0646_9_tim_udma+0x485f1 r11 0xbeeb38867a1c691d r120 r13 0x8000f050 r14 0x81942000 r150 rip 0x814e71e0pfsync_bulk_update+0x60 cs 0x8 rflags 0x10246__ALIGN_SIZE+0xf246 rsp 0x800022d53b70 ss 0 pfsync_bulk_update+0x60:cmpb$0xff,0x16e(%r15) ddb{0}> ddb{0}> show locks shared rwlock pfstates r = 0 (0x8245cc00) #0 witness_lock+0x311 #1 pfsync_bulk_update+0x45 #2 timeout_run+0x93 #3 softclock_thread+0x11d #4 proc_trampoline+0x1c exclusive rwlock netlock r = 0 (0x82454b38) #0 witness_lock+0x311 #1 rw_enter+0x292 #2 pfsync_bulk_update+0x29 #3 timeout_run+0x93 #4 softclock_thread+0x11d #5 proc_trampoline+0x1c exclusive kernel_lock _lock r = 1 (0x8252b258) #0 witness_lock+0x311 #1 __mp_acquire_count+0x38 #2 mi_switch+0x28b #3 sleep_finish+0xfe #4 rw_enter+0x232 #5 pfsync_bulk_update+0x29 #6 timeout_run+0x93 #7 softclock_thread+0x11d #8 proc_trampoline+0x1c shared rwlock timeout r = 0 (0x8244c9c8) #0 witness_lock+0x311 #1 timeout_run+0x88 #2 softclock_thread+0x11d #3 proc_trampoline+0x1c ddb{0}> ddb{0}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 75873 445724 20843 68 3 0x190 kqreadisakmpd 20843 31033 1 0 30x80 netio isakmpd 76865 283351 1 0 30x10008b sigsusp ksh 43091 324769 1 0 30x100098 kqreadcron 92254 264061 28601 95 3 0x1100092 kqreadsmtpd 80520 324180 28601103 3 0x1100092 kqreadsmtpd 12107 295529 28601 95 3 0x1100092 kqreadsmtpd 89174 344742 28601 95 30x100092 kqreadsmtpd 50810 389490 28601 95 3 0x1100092 kqreadsmtpd 75581 433356 28601 95 3 0x1100092 kqreadsmtpd 28601 432136 1 0 30x100080 kqreadsmtpd 67099 85178 1 0 30x88 kqreadsshd 35272 270521 29963 68 7 0x110sasyncd 29963 124841 1 0 30x80 kqreadsasyncd 27546 425204 1 0 30x100080 kqreadntpd 88920 144011 29553 83 
30x100092 kqreadntpd 295532629 1 83 3 0x1100092 kqreadntpd 25414 252219 66731 73 3 0x1100090 kqreadsyslogd 667319587 1 0 30x100082 netio syslogd 13849 243057 0 0 3 0x14200 bored smr 15866 463556 0 0 3 0x14200 pgzerozerothread 29043 244190 0 0 3 0x14200 aiodoned aiodoned 50284 435047 0 0 3 0x14200 syncerupdate 91848 147363 0 0 3
Re: pf_state_export crash
On 26.12.2022. 5:06, Csillag Tamas wrote:
> hi,
>
> the crash repeated again
>
> uvm_fault(0x823ed470, 0x0, 0, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip 81e4c208 cs 8 rflags 10246 cr2 0 cpl 0 rsp 8000225a4060
> gsbase 0x80001e119ff0 kgsbase 0x0
> panic: trap type 6, code=0, pc=81e4c208
> Starting stack trace...
> panic(81f27c9e) at panic+0x12c
> kerntrap(8000225a3fb0) at kerntrap+0x114
> alltraps_kern_meltdown() at alltraps_kern_meltdown+0x7b
> pf_state_export(fd8002e8a1c0,fd952aa907e0) at pf_state_export+0x38
> pfsync_sendout() at pfsync_sendout+0x5e4
> pfsync_update_state(fd9526f4f3f0) at pfsync_update_state+0x15b
> pf_test(2,1,81eb5800,8000225a4438) at pf_test+0x117a
> ip_input_if(8000225a4438,8000225a,4,0,81eb5800) at ip_input_if+0xcd
> ipv4_input(81eb5800,fd805f671000) at ipv4_input+0x39
> ether_input(81eb5800,fd805f671000) at ether_input+0x3b1
> carp_input(81ecd800,fd805f671000,5e000101) at carp_input+0x196
> ether_input(81ecd800,fd805f671000) at ether_input+0x1d9
> if_input_process(814e3050,8000225a4618) at if_input_process+0x6f
> ifiq_process(814e5700) at ifiq_process+0x69
> taskq_thread(80037200) at taskq_thread+0x100
> end trace frame: 0x0, count: 242
> End of stack trace.
>
> Regards,
> Tamas

Hi,

can you upgrade to the latest snapshot with sysupgrade? If that doesn't solve your panic, can you try this diff:
https://www.mail-archive.com/tech@openbsd.org/msg72582.html

This was my panic:
https://www.mail-archive.com/bugs@openbsd.org/msg18583.html

and that diff solved it ...
Re: Random kernel panic on 7.2
On 22.11.2022. 18:48, Josmar Pierri wrote: > I upgraded to 7.2 snapshot #849 early this morning, but it crashed > twice in a few hours. > This time, however, the panic message is different: > Could you compile kernel with this diff https://www.mail-archive.com/tech@openbsd.org/msg72582.html at least for me, that diff makes my firewall stable.. > uvm_fault(0x8236dcb8, 0x17, 0, 2) -> e > kernel: page fault trap, code=0 > Stopped at pfsync_q_del+0x96:movq %rdx,0x8(%rax) > TID PID UID PRFLAGS PFLAGS CPU COMMAND > 436110 83038 0 0x14000 0x200 3 softnet > 395295 39926 0 0x14000 0x200 0 softnet > 189958 2208 0 0x14000 0x200 2 softnet > * 658395423 0 0x14000 0x200 1 systqmp > pfsync_q_del(fd8401d63890) at pfsync_q_del+0x96 > pfsync_delete_state(fd8401d63890) at pfsync_delete_state+0x118 > pf_remove_state(fd8401d63890) at pfsync_remove_state+0x14b > pf_purge_expired_states(4031,40) at pf_purge_expired_states+0x242 > pf_purge_states(0) at pf_purge_states+0x1c > taskq_thread(822a1a10) at taskq_thread+0x100 > end trace frame: 0x0, count: 9 > > This is all I could manage to get since the crash happened when I was > away (and that stupid Dell console timeout when idle, removing the USB > keyboard) > > I observed a thing that may or may not be related to this issue: The > "output fail" counter keeps steadily increasing both on aggregate and > the two member interfaces: > > :~# netstat -i -I aggr0 > NameMtu Network Address Ipkts IfailOpkts Ofail > Colls > aggr0 9200fe:e1:ba:d0:91:13 224426940 0 200785282 > 357 0 > > At first I thought it could be something related to the switches but I > still haven't found anything wrong with them. > > > > On Mon, Nov 21, 2022 at 1:22 PM Hrvoje Popovski wrote: >> >> On 21.11.2022. 16:04, Josmar Pierri wrote: >>> Hi, >>> >>> I managed to get screenshots of a random kernel panic that we are >>> having on a server here. 
>>> They were taken using a console management tool embedded into the >>> server (Dell IDRAC) and are PNG images of the panic itself, trace of >>> all cpus and ps. >>> I'm not attaching them here right now because I don't know how the >>> list would react to them. >>> >>> I attached the output of: >>> 1 - sendbug -P >>> 2 - dmesg right after reboot >>> 3 - dmesg-boot >>> >>> This server has an aggr0 grouping bnxt0 and bnxt1, both at 10 Gbps. >>> Its task is to load-balance RDP traffic (TCP 3389) among 2 large pools >>> (more than 50 servers on each one) and 3 small ones using pf (tables) >>> for that. >>> >>> These panics happen at random times without an apparent cause. >>> >>> The panic message reads: >>> >>> ddb{3}> show panic >>> *cpu3: kernel diagnostic assertion "st->snapped == 0" failed: file >>> "/usr/src/sys/net/if_pfsync.c", line 1591 >>> cpu2: kernel diagnostic assertion "st->snapped == 0" failed: file >>> "/usr/src/sys/net/if_pfsync.c", line 1591 >>> cpu1: kernel diagnostic assertion "st->snapped == 0" failed: file >>> "/usr/src/sys/net/if_pfsync.c", line 1591 >>> ddb{3}> >>> >>> Please advise how I should proceed to submit the screenshots. >> >> Hi, >> >> I have similar setup with aggr grouping ix0 and ix1 and pfsync. If you >> have two firewalls, can you sysupgrade this one to latest snapshot ? >> >> I'm running snapshot after last hackathon with this diff >> https://www.mail-archive.com/tech@openbsd.org/msg72582.html >> >> and for now firewall seems to work just fine. >> >> >> >
Re: pfsync panic in pfsync_insert_state - syspatch?
On 21.11.2022. 14:26, Damjan Dimitrov wrote:
> One thing I forgot to mention, these clusters also run ipsec.
> I attach another stack-trace from a different node.
> Thx.

I'm sure this is not a solution, but could you increase the number of CPUs to more than 4, for example 6 or 8? I think it could lengthen the time between panics a little ... I'm just curious ...
Re: Random kernel panic on 7.2
On 21.11.2022. 16:04, Josmar Pierri wrote:
> Hi,
>
> I managed to get screenshots of a random kernel panic that we are
> having on a server here.
> They were taken using a console management tool embedded into the
> server (Dell IDRAC) and are PNG images of the panic itself, trace of
> all cpus and ps.
> I'm not attaching them here right now because I don't know how the
> list would react to them.
>
> I attached the output of:
> 1 - sendbug -P
> 2 - dmesg right after reboot
> 3 - dmesg-boot
>
> This server has an aggr0 grouping bnxt0 and bnxt1, both at 10 Gbps.
> Its task is to load-balance RDP traffic (TCP 3389) among 2 large pools
> (more than 50 servers on each one) and 3 small ones using pf (tables)
> for that.
>
> These panics happen at random times without an apparent cause.
>
> The panic message reads:
>
> ddb{3}> show panic
> *cpu3: kernel diagnostic assertion "st->snapped == 0" failed: file "/usr/src/sys/net/if_pfsync.c", line 1591
> cpu2: kernel diagnostic assertion "st->snapped == 0" failed: file "/usr/src/sys/net/if_pfsync.c", line 1591
> cpu1: kernel diagnostic assertion "st->snapped == 0" failed: file "/usr/src/sys/net/if_pfsync.c", line 1591
> ddb{3}>
>
> Please advise how I should proceed to submit the screenshots.

Hi,

I have a similar setup with aggr grouping ix0 and ix1 and pfsync. If you have two firewalls, can you sysupgrade this one to the latest snapshot?

I'm running a snapshot from after the last hackathon with this diff
https://www.mail-archive.com/tech@openbsd.org/msg72582.html

and for now the firewall seems to work just fine.
panic with OpenBSD 7.2-current (GENERIC.MP) #846
Hi all,

I've sysupgraded a 64 core box and I'm getting the kernel fault trap below:

OpenBSD 7.2-current (GENERIC.MP) #846: Sun Nov 20 09:43:16 MST 2022
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 549312065536 (523864MB)
avail mem = 532646780928 (507971MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
kernel: protection fault trap, code=0
Stopped at      memcpy+0x15:    repe movsq      (%rsi),%es:(%rdi)
ddb{0}> trace
memcpy(fe007e5c2010,2,2c6c28baa7240e66,fe007e5c2000,2,ff000) at memcpy+0x15
pmap_randomize_level(fe007e7dafe0,3,2c6c28baa7241299,8272d000,fe007e7da000,fe0) at pmap_randomize_level+0x215
pmap_randomize(dc9b195a5a0185e0,0,0,0,0,0) at pmap_randomize+0x1ca
cpu_configure(bd8245a77341df16,0,0,8002d000,816fdfb0,82733f00) at cpu_configure+0x20
main(0,0,0,0,0,1) at main+0x3a3
end trace frame: 0x0, count: -5
ddb{0}>
ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*    0       0     -1      0  7     0x10200                swapper

I will leave the box in ddb in case something else is needed ...
Re: pf panic
On 29.8.2022. 20:01, Hrvoje Popovski wrote:
> On 9.8.2022. 21:32, Hrvoje Popovski wrote:
>> On 9.8.2022. 19:56, Alexandr Nedvedicky wrote:
>>> this is a NULL pointer dereference panic. I think we've seen it few months
>>> back. patch below was applied to one of your test machines if I remember
>>> correct. can you give it a try again to see if it will help?
>>>
>>> the change adds a mutex to pf_state structure to protect references
>>> to keys attached to state.
>>>
>>> we also have to take into account a fact that pf_state_export() may be
>>> presented with state which keys got detached. Hence we have to
>>> skip such state when doing export. Therefore pf_state_export()
>>> indicates a failure to hint caller whether data were written (success)
>>> and we should move to next free slot in output buffer. Or nothing
>>> got written (failure) and current slot in output buffer is still free.
>>
>> Hi,
>>
>> this diff is applied to firewall and I will monitor it.
>>
>> Thank you ...
>>
>
> Hi,
>
> after 20 days with this diff firewall seems stable. Problem is that last
> time firewall was up for long time too, and I'm not sure what triggered
> that panic. I will update that firewall to latest snapshot, apply that
> diff and wait...
>
>

Hi,

after a month or so with this diff the firewall hasn't panicked.
Re: pf panic
On 9.8.2022. 21:32, Hrvoje Popovski wrote:
> On 9.8.2022. 19:56, Alexandr Nedvedicky wrote:
>> this is a NULL pointer dereference panic. I think we've seen it few months
>> back. patch below was applied to one of your test machines if I remember
>> correct. can you give it a try again to see if it will help?
>>
>> the change adds a mutex to pf_state structure to protect references
>> to keys attached to state.
>>
>> we also have to take into account a fact that pf_state_export() may be
>> presented with state which keys got detached. Hence we have to
>> skip such state when doing export. Therefore pf_state_export()
>> indicates a failure to hint caller whether data were written (success)
>> and we should move to next free slot in output buffer. Or nothing
>> got written (failure) and current slot in output buffer is still free.
>
> Hi,
>
> this diff is applied to firewall and I will monitor it.
>
> Thank you ...
>

Hi,

after 20 days with this diff the firewall seems stable. The problem is that last time the firewall was also up for a long time before it panicked, and I'm not sure what triggered that panic. I will update that firewall to the latest snapshot, apply that diff and wait...
Re: pflow - kernel: protection fault trap
On 10.8.2022. 15:44, Vitaliy Makkoveev wrote:
> On Wed, Aug 10, 2022 at 12:52:06AM +0200, Hrvoje Popovski wrote:
>> On 9.8.2022. 22:22, Vitaliy Makkoveev wrote:
>>> Hi,
>>>
>>> The kernel lock within pflow_output_process() doesn't help because the
>>> following sosend() has sleep points. So, at least pflow_clone_destroy()
>>> should wait until pflow_output_process() finished. We should use
>>> taskq_del_barrier(9) instead of task_del(9).
>>>
>>> Also we need to unlink dying pflow(4) interface from the stack before
>>> start destruction.
>>>
>>> This diff should help. Please keep in mind, this diff is incomplete,
>>> because it doesn't fix the race between pflowioctl() and
>>> pflow_output_process(). This race is much more complicated, because we
>>> need to introduce the new lock to protect `so' and take it before call
>>> sosend(), but the sosend() takes netlock, which is taken before
>>> pflowioctl() where we modify `so'. This introduces re-locking games to
>>> pflowioctl() path, I so want to make this with separate diff, because
>>> this potential panic was not triggered.
>>>
>> Hi,
>>
>> with this diff I'm getting this protection fault trap
>>
> taskq_del_barrier(9) has a bug and doesn't work as expected. This diff
> uses taskq_barrier(9).
>
> According private Hrvoje report it fixes the problem.

Hi,

I was running

ifconfig pflow0 destroy
sleep 120
sh /etc/netstart pflow0
sleep 120

the whole night and the firewall didn't break. Without this diff, if I run ifconfig pflow0 destroy while the firewall is under pressure, the box gets a kernel fault trap immediately.
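As an illustration of the ordering discussed in this thread (the destroy path has to wait for an in-flight output task before it tears down the socket that task uses), here is a rough userland analogy. It assumes pthreads, with pthread_join() playing the role that the barrier plays in the kernel diff; `struct flow`, `flow_start()`, `flow_destroy()` and `output_process()` are hypothetical names, not the pflow(4) code.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the pflow softc. */
struct flow {
	pthread_t	worker;		/* stands in for the queued task */
	int		sock;		/* resource the task uses, -1 once torn down */
	int		sock_was_valid;	/* what the task observed when it ran */
};

/* The "output task": it needs f->sock to still be valid when it runs. */
static void *
output_process(void *arg)
{
	struct flow *f = arg;

	f->sock_was_valid = (f->sock != -1);
	return NULL;
}

/* Schedule the task. Returns 0 on success, like pthread_create(). */
int
flow_start(struct flow *f, int sock)
{
	f->sock = sock;
	f->sock_was_valid = 0;
	return pthread_create(&f->worker, NULL, output_process, f);
}

/*
 * Destroy path: first wait until the task has finished (the barrier),
 * and only then release what the task was using.
 */
int
flow_destroy(struct flow *f)
{
	int error;

	error = pthread_join(f->worker, NULL);
	if (error != 0)
		return error;
	f->sock = -1;
	return 0;
}
```

If `flow_destroy()` set `f->sock = -1` before joining, the task could run against a resource that is already gone, which is the userland analogue of the trap in sosend() seen when pflow0 was destroyed under load.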
Re: pflow - kernel: protection fault trap
On 10.8.2022. 0:49, Vitaliy Makkoveev wrote: > That's strange, because after we the only timeout handlers can reschedule > pflow_output_process to run, but they have no sleep points. However the > task handler still running after taskq_del_barrier(9). > > Does this help? Hi, this diff doesn't help. Here's output r620-1# ifconfig pflow0 destroy kernel: protection fault trap, code=0 Stopped at sblock+0x35:movq0x8(%rax),%rax ddb{4}> show panic the kernel did not panic ddb{4}> trace sblock(fd83b34818e8,fd83b3481a10,1) at sblock+0x35 sosend(fd83b34818e8,fd80cd292d00,0,fd80a3b4e200,0,0) at sosend+0x163 pflow_output_process(808ca000) at pflow_output_process+0x67 taskq_thread(80030100) at taskq_thread+0x100 end trace frame: 0x0, count: -4 ddb{4}> ddb{4}> show reg rdi 0xfd83b34818e8 rsi 0xfd83b3481a10 rbp 0x800022d66160 rbx0x501 rdx 0x1 rcx 0x8000e004 rax 0x12197a31cb9f19c7 r8 0x1 r90x821f4240rw_ops+0x10 r10 0x r11 0xd76a5b1e376e r120 r13 0x1 r14 0xfd83b3481a60 r15 0xfd83b34818e8 rip 0x8188cbf5sblock+0x35 cs 0x8 rflags 0x10246__ALIGN_SIZE+0xf246 rsp 0x800022d66110 ss 0x10 sblock+0x35:movq0x8(%rax),%rax ddb{4}> ddb{4}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 57222 30708 52109 0 7 0x3ifconfig 52109 165317 1 0 30x10008b sigsusp ksh 49245 507782 1 0 30x100098 kqreadcron 6217 82510 20690 95 3 0x1100092 kqreadsmtpd 82506 376421 20690103 3 0x1100092 kqreadsmtpd 82613 290075 20690 95 3 0x1100092 kqreadsmtpd 4815 308602 20690 95 30x100092 kqreadsmtpd 12941 472567 20690 95 3 0x1100092 kqreadsmtpd 23744 467673 20690 95 3 0x1100092 kqreadsmtpd 20690 84561 1 0 30x100080 kqreadsmtpd 76380 94838 1 0 30x88 kqreadsshd 86280 347923 1 0 30x100080 kqreadntpd 14359 243801 59957 83 30x100092 kqreadntpd 59957 263943 1 83 3 0x1100092 kqreadntpd 52207 492049 48201 73 3 0x1100090 kqreadsyslogd 48201 424791 1 0 30x100082 netio syslogd 25023 493390 0 0 3 0x14200 bored smr 49475 241893 0 0 3 0x14200 pgzerozerothread 35733 465768 0 0 3 0x14200 aiodoned aiodoned 44819 211641 0 0 3 0x14200 syncerupdate 
12802 139258 0 0 3 0x14200 cleaner cleaner 77815 78998 0 0 3 0x14200 reaperreaper 97772 253526 0 0 3 0x14200 pgdaemon pagedaemon 20567 420970 0 0 3 0x14200 usbtskusbtask 81765 348189 0 0 3 0x14200 usbatsk usbatsk 58744 470980 0 0 3 0x40014200 acpi0 acpi0 42832 77958 0 0 7 0x40014200idle5 40468 474721 0 0 3 0x40014200idle4 98228 394491 0 0 7 0x40014200idle3 13842 58745 0 0 3 0x40014200idle2 87447 45776 0 0 7 0x40014200idle1 14520 516279 0 0 3 0x14200 bored sensors 20057 421224 0 0 3 0x14200 netlock softnet 681204487 0 0 3 0x14200 netlock softnet *15557 167519 0 0 7 0x14200softnet 57471 116257 0 0 3 0x14200 netlock softnet 21894 328074 0 0 3 0x14200 bored systqmp 36959 61819 0 0 3 0x14200 bored systq 29261 452739 0 0 3 0x40014200 bored softclock 26163 383919 0 0 7 0x40014200idle0 1 14140 0 0 30x82 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{4}> ddb{4}> ps /o TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 30708 57222 0 0x3 02 ifconfig *167519 15557 0 0x14000 0x2004K softnet ddb{4}> trace /t 0t30708 sleep_finish(800022e25200,1) at sleep_finish+0xfe rw_enter(822c58d8,1) at rw_enter+0x1cb if_detach(808ca000) at if_detach+0xda pflow_clone_destroy(808ca000) at pflow_clone_destroy+0x1a0 if_clone_destroy(800022e253c0) at
Re: pflow - kernel: protection fault trap
On 9.8.2022. 22:22, Vitaliy Makkoveev wrote: > Hi, > > The kernel lock within pflow_output_process() doesn't help because the > following sosend() has sleep points. So, at least pflow_clone_destroy() > should wait until pflow_output_process() finished. We should use > taskq_del_barrier(9) instead of task_del(9). > > Also we need to unlink dying pflow(4) interface from the stack before > start destruction. > > This diff should help. Please keep in mind, this diff is incomplete, > because it doesn't fix the race between pflowioctl() and > pflow_output_process(). This race is much more complicated, because we > need to introduce the new lock to protect `so' and take it before call > sosend(), but the sosend() takes netlock, which is taken before > pflowioctl() where we modify `so'. This introduces re-locking games to > pflowioctl() path, I so want to make this with separate diff, because > this potential panic was not triggered. > Hi, with this diff I'm getting this protection fault trap r620-1# ifconfig pflow0 destroy kernel: protection fault trap, code=0 Stopped at sblock+0x35:movq0x8(%rax),%rax ddb{0}> show panic the kernel did not panic ddb{0}> trace sblock(fd842c34d8e8,fd842c34da10,1) at sblock+0x35 sosend(fd842c34d8e8,fd80cd292800,0,fd80a3f37c00,0,0) at sosend+0x163 pflow_output_process(808ca000) at pflow_output_process+0x67 taskq_thread(80030100) at taskq_thread+0x100 end trace frame: 0x0, count: -4 ddb{0}> ddb{0}> show reg rdi 0xfd842c34d8e8 rsi 0xfd842c34da10 rbp 0x800022d66710 rbx0x501 rdx 0x1 rcx 0x8000ea84 rax 0x9f3ebe5199894262 r8 0x1 r90x821c7080rw_ops+0x10 r10 0x r11 0x6db1a912181c98f1 r120 r13 0x1 r14 0xfd842c34da60 r15 0xfd842c34d8e8 rip 0x81d71565sblock+0x35 cs 0x8 rflags 0x10246__ALIGN_SIZE+0xf246 rsp 0x800022d666c0 ss 0x10 sblock+0x35:movq0x8(%rax),%rax ddb{0}> ddb{0}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 1364 367790 19987 0 7 0x3ifconfig 19987 130981 1 0 30x10008b sigsusp ksh 74340 115416 1 0 30x100098 kqreadcron 68578 240636 2156 95 3 
0x1100092 kqreadsmtpd 86507 443747 2156103 3 0x1100092 kqreadsmtpd 47223 261838 2156 95 3 0x1100092 kqreadsmtpd 38121 503884 2156 95 30x100092 kqreadsmtpd 29539 133065 2156 95 3 0x1100092 kqreadsmtpd 83786 266601 2156 95 3 0x1100092 kqreadsmtpd 2156 411192 1 0 30x100080 kqreadsmtpd 62749 20828 1 0 30x88 kqreadsshd 85488 424702 1 0 30x100080 kqreadntpd 4633 197093 51224 83 30x100092 kqreadntpd 51224 139274 1 83 7 0x1100012ntpd 19966 136109 61788 73 3 0x1100090 kqreadsyslogd 61788 27725 1 0 30x100082 netio syslogd 31851 123130 0 0 3 0x14200 bored smr 12870 490593 0 0 3 0x14200 pgzerozerothread 51010 283420 0 0 3 0x14200 aiodoned aiodoned 69180 131489 0 0 3 0x14200 syncerupdate 36711 165342 0 0 3 0x14200 cleaner cleaner 75263 504085 0 0 3 0x14200 reaperreaper 72069 133609 0 0 3 0x14200 pgdaemon pagedaemon 99378 234898 0 0 3 0x14200 usbtskusbtask 30200 405105 0 0 3 0x14200 usbatsk usbatsk 96366 324880 0 0 3 0x40014200 acpi0 acpi0 24969 140748 0 0 7 0x40014200idle5 95045 386153 0 0 3 0x40014200idle4 72849 289914 0 0 7 0x40014200idle3 49815 213569 0 0 3 0x40014200idle2 39848 84701 0 0 3 0x40014200idle1 43651 137149 0 0 7 0x40014200sensors 10764 419906 0 0 3 0x14200 netlock softnet 51829 300708 0 0 3 0x14200 netlock softnet *58674 303202 0 0 7 0x14200softnet 60899 100126 0 0 3 0x14200 netlock softnet 49625 511441 0 0 3 0x14200 bored systqmp 5435 16476 0 0 3 0x14200 bored systq 8069 217014 0 0 2 0x40014200
Re: pf panic
On 9.8.2022. 19:56, Alexandr Nedvedicky wrote:
> this is a NULL pointer dereference panic. I think we've seen it few months
> back. patch below was applied to one of your test machines if I remember
> correct. can you give it a try again to see if it will help?
>
> the change adds a mutex to pf_state structure to protect references
> to keys attached to state.
>
> we also have to take into account a fact that pf_state_export() may be
> presented with state which keys got detached. Hence we have to
> skip such state when doing export. Therefore pf_state_export()
> indicates a failure to hint caller whether data were written (success)
> and we should move to next free slot in output buffer. Or nothing
> got written (failure) and current slot in output buffer is still free.

Hi,

this diff is applied to the firewall and I will monitor it.

Thank you ...
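The export behaviour described in the quoted text (skip a state whose keys were detached, and only advance to the next output slot when something was actually written) might look roughly like this in isolation. It is a sketch under assumptions, not the diff itself: `struct key`, `struct state`, `struct record`, `state_export()` and `export_all()` are hypothetical names, and the mutex that the real change adds around the key references is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the pf structures. */
struct key    { int id; };
struct state  { struct key *key; };	/* key is NULL once detached */
struct record { int key_id; };		/* one slot in the output buffer */

/*
 * Export one state into `out`.
 * Returns 0 if data were written, -1 if the keys are detached and the
 * slot was left untouched.
 */
int
state_export(struct record *out, const struct state *st)
{
	if (st->key == NULL)
		return -1;
	out->key_id = st->key->id;
	return 0;
}

/*
 * Export up to `buflen` states. The output index only moves on success,
 * so a skipped state does not leave a hole in the buffer.
 * Returns the number of slots filled.
 */
size_t
export_all(struct record *buf, size_t buflen, const struct state *sts, size_t n)
{
	size_t i, used = 0;

	for (i = 0; i < n && used < buflen; i++) {
		if (state_export(&buf[used], &sts[i]) == 0)
			used++;
	}

	return used;
}
```

The point of the return value is that the caller can tell "slot filled" from "slot still free" without looking at the state again.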
pflow - kernel: protection fault trap
Hi all, when sending lot of traffic over firewall with pflow and if I run ifconfig pflow0 destroy I'm getting kernel: protection fault trap. This is latest snapshot: OpenBSD 7.2-beta (GENERIC.MP) #677: Mon Aug 8 18:58:49 MDT 2022 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP r620-1# ifconfig pflow0 destroy kernel: protection fault trap, code=0 Stopped at in_nam2sin+0x29:cmpb$0x2,0x1(%rdx) ddb{2}> show panic the kernel did not panic ddb{2}> trace in_nam2sin(fd80cd292b00,800022d66028) at in_nam2sin+0x29 udp_output(fd83b2c1ba00,fd80a3abf800,fd80cd292b00,0) at udp_output+0xcc sosend(fd83b2c1c558,fd80cd292b00,0,fd80a3abf800,0,0) at sosend+0x385 pflow_output_process(808ca000) at pflow_output_process+0x67 taskq_thread(80030100) at taskq_thread+0x100 end trace frame: 0x0, count: -5 ddb{2}> ddb{2}> show reg rdi 0xfd80cd292b00 rsi 0x800022d66028 rbp 0x800022d65ff0 rbx0 rdx 0x4a1336b5a404c64e rcx 0xce2fdf4a rax 0x2f r8 0x5b8 r9 0 r10 0x r11 0x3b190b40737cbe31 r12 0xfd80a3abf800 r13 0x28 r140x5b8 r15 0xfd83b2c1ba00 rip 0x81e494f9in_nam2sin+0x29 cs 0x8 rflags 0x10286__ALIGN_SIZE+0xf286 rsp 0x800022d65fe0 ss 0x10 in_nam2sin+0x29:cmpb$0x2,0x1(%rdx) ddb{2}> ddb{2}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 97584 114469 17291 0 7 0x3ifconfig 17291 147440 1 0 30x10008b sigsusp ksh 61667 371523 1 0 30x100098 kqreadcron 62419 388523 46193 95 3 0x1100092 kqreadsmtpd 43433 312290 46193103 3 0x1100092 kqreadsmtpd 45389 509524 46193 95 3 0x1100092 kqreadsmtpd 68113 112694 46193 95 30x100092 kqreadsmtpd 12544 45817 46193 95 3 0x1100092 kqreadsmtpd 35310 168879 46193 95 3 0x1100092 kqreadsmtpd 46193 474443 1 0 30x100080 kqreadsmtpd 66976 365265 1 0 30x88 kqreadsshd 45262 438619 1 0 30x100080 kqreadntpd 23411 270550 91687 83 30x100092 kqreadntpd 91687 425806 1 83 3 0x1100092 kqreadntpd 87999 345906 43 73 3 0x1100090 kqreadsyslogd 43 197785 1 0 30x100082 netio syslogd 53263 391295 0 0 3 0x14200 bored smr 53027 160140 0 0 3 0x14200 pgzerozerothread 93436 395928 0 0 3 
0x14200 aiodoned aiodoned 6422 376977 0 0 3 0x14200 syncerupdate 12666 145796 0 0 3 0x14200 cleaner cleaner 5339 104878 0 0 3 0x14200 reaperreaper 18437 379590 0 0 3 0x14200 pgdaemon pagedaemon 95609 15815 0 0 3 0x14200 usbtskusbtask 34720 188775 0 0 3 0x14200 usbatsk usbatsk 28283 197132 0 0 3 0x40014200 acpi0 acpi0 32308 129369 0 0 7 0x40014200idle5 91423 465223 0 0 7 0x40014200idle4 82830 201537 0 0 7 0x40014200idle3 72849 294469 0 0 3 0x40014200idle2 82591 160582 0 0 3 0x40014200idle1 19010 51380 0 0 3 0x14200 bored sensors 46387 318985 0 0 3 0x14200 netlock softnet 72266 368671 0 0 3 0x14200 netlock softnet *31740 217354 0 0 7 0x14200softnet 63482 377439 0 0 3 0x14200 netlock softnet 66088 38816 0 0 3 0x14200 bored systqmp 72341 421031 0 0 3 0x14200 bored systq 43727 54109 0 0 3 0x40014200 bored softclock 4948 138264 0 0 7 0x40014200idle0 1 135757 0 0 30x82 wait init 0 0 -1 0 3 0x10200 scheduler swapper ddb{2}> ps /o TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 114469 97584 0 0x3 01 ifconfig *217354 31740 0 0x14000 0x2002K softnet ddb{2}> ddb{2}> trace /t 0t114469 sleep_finish(800022e258d0,1) at sleep_finish+0xfe rw_enter(822b5b90,1) at rw_enter+0x1cb soclose(fd83b2c1c558,80) at soclose+0x27
pf panic
Hi all, I'm running OpenBSD 7.2-beta (GENERIC.MP) #651: Tue Jul 26 23:11:26 MDT 2022 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP on production firewall and for few weeks it was stable. Firewall panic today and I will sysupgrade it, but maybe this panic message is interesting so I'm sending it here. bcbnfw1# uvm_fault(0x823a1a20, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Stopped at pf_state_export+0x38: movq0(%rax),%rcx TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 309438 83954 0 0x14000 0x2001 softnet 486408 53515 0 0x14000 0x2003 softnet * 80122 54608 0 0x14000 0x2002 softnet pf_state_export(fd806152f9dc,fd8664eb12b0) at pf_state_export+0x38 pfsync_sendout() at pfsync_sendout+0x5e4 pfsync_update_state(fd8728968d40) at pfsync_update_state+0x15b pf_test(2,1,80bbb000,800020c336d8) at pf_test+0x117a ip_input_if(800020c336d8,800020c336e4,4,0,80bbb000) at ip_input_if+0xcd ipv4_input(80bbb000,fd80661d5300) at ipv4_input+0x39 ether_input(80bbb000,fd80661d5300) at ether_input+0x3b1 carp_input(80bd2000,fd80661d5300,5e000101) at carp_input+0x196 ether_input(80bd2000,fd80661d5300) at ether_input+0x1d9 vlan_input(80b9d000,fd80661d5300,800020c3390c) at vlan_input+0x23d ether_input(80b9d000,fd80661d5300) at ether_input+0x85 if_input_process(8048b048,800020c339a8) at if_input_process+0x6f ifiq_process(8048ea00) at ifiq_process+0x69 taskq_thread(80035080) at taskq_thread+0x100 end trace frame: 0x0, count: 1 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{2}> show reg rdi 0xfd806152fae4 rsi0 rbp 0x800020c33340 rbx0x19c rdx 0x4 rcx0 rax0 r8 0x104 r9 0x7d788a8c5153bdc r10 0x92a5ce4f38be8823 r11 0xfd806152f9dc r12 0xfd8664eb12b0 r130 r14 0xfd806152f9dc r15 0xfd8664eb12b0 rip 0x81387678pf_state_export+0x38 cs 0x8 rflags 0x10246__ALIGN_SIZE+0xf246 rsp 0x800020c33300 ss 0x10 pf_state_export+0x38: movq0(%rax),%rcx ddb{2}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 5515 239236 1 0 30x100083 ttyin ksh 46351 180485 1 0 30x100098 kqreadcron 670259485 68290720 3 0x190 kqreadlldpd 68290 377807 1 0 30x80 netio lldpd 74149 64334 55708 95 3 0x1100092 kqreadsmtpd 77756 107926 55708103 3 0x1100092 kqreadsmtpd 96682 419793 55708 95 3 0x1100092 kqreadsmtpd 95361 134736 55708 95 30x100092 kqreadsmtpd 17548 16395 55708 95 3 0x1100092 kqreadsmtpd 9493 444926 55708 95 3 0x1100092 kqreadsmtpd 55708 424253 1 0 30x100080 kqreadsmtpd 3986 219916 1 77 3 0x1100090 kqreaddhcpd 29833 112637 1 0 30x100080 kqreadsnmpd 99415 374613 1 91 3 0x192 kqreadsnmpd 94378 355183 1 0 30x88 kqreadsshd 95447 307241 1 0 30x100080 kqreadntpd 55599 503746 7240 83 30x100092 kqreadntpd 7240 502064 1 83 3 0x1100092 kqreadntpd 96225 207770 58673 74 3 0x1100092 bpf pflogd 58673 266584 1 0 30x80 netio pflogd 56880 475875 37876 73 3 0x1100090 kqreadsyslogd 37876 114860 1 0 30x100082 netio syslogd 77675 225215 0 0 3 0x14200 bored smr 24420 32069 0 0 3 0x14200 pgzerozerothread 40785 164275 0 0 3 0x14200 aiodoned aiodoned 3250 15093 0 0 3 0x14200 syncerupdate 71159 338127 0 0 3 0x14200 cleaner cleaner 45614 132741 0 0 3 0x14200 reaperreaper 17965 161362 0 0 3 0x14200 pgdaemon pagedaemon 70681 34263 0 0 3 0x14200 usbtskusbtask 30654 291134 0 0 3 0x14200 usbatsk usbatsk 22566 258438 0 0 3 0x40014200 acpi0 acpi0 65828 69579 0 0 7 0x40014200idle5 61839 98119 0 0 7 0x40014200idle4
Re: [External] : Re: PF hangs when doing NAT round-robin
On 18.7.2022. 10:40, Alexandr Nedvedicky wrote:
> hello,
>
> On Sun, Jul 17, 2022 at 11:52:21PM +0200, Hrvoje Popovski wrote:
>> On 17.7.2022. 20:19, Alexandr Nedvedicky wrote:
>>> So in case 49/27 we are supposed to be selecting addresses:
>>> 49.0.0.1, 49.0.0.2, ..., 49.0.0.30, 49.0.0.1
>>> we need to make sure selection mechanism skips network
>>> address (49.0.0.0) and network broadcast address (49.0.0.31).
>>
>> Hi,
>>
>> Am I understanding you correctly? If doing NAT to some route, let's say
>> /30 (4 addresses), with this diff I will be doing NAT only to 2 addresses?
>
> yes. I believe this should be correct. but I would like to
> get it confirmed with someone who is stronger in network protocols.
>
> if we have a prefix /30 then the hosts we can address are:
>     .1
>     .2
> the host part .3 should be a network broadcast. let's assume
> we have something like: 192.168.1.8/30, then the network
> broadcast address will be 192.168.1.11
>
> to be honest I'm not sure if we can assign address
> 192.168.8.0 to any host in that network.

In my carp/pfsync setups I have a /30 route for NAT and I would like to NAT
to all 4 addresses. If that's not possible, is the right way to do NAT to
create a table and list the addresses there one by one?
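The address arithmetic discussed above can be checked with a short sketch. It is not taken from the pf diff; it uses Python's standard ipaddress module, and the helper name nat_pool_candidates is hypothetical. For the 192.168.1.8/30 example it lists all four addresses in the prefix, and the two that remain once the network and broadcast addresses are skipped.

```python
import ipaddress


def nat_pool_candidates(prefix, skip_edges=True):
    """Return the addresses of `prefix` that a NAT pool could hand out.

    With skip_edges=True the network and broadcast addresses are left
    out, which is the behaviour discussed in the thread. With
    skip_edges=False every address in the prefix is returned.
    """
    net = ipaddress.ip_network(prefix)
    addrs = [str(a) for a in net]  # iterating a network yields every address
    if skip_edges and net.num_addresses > 2:
        addrs = addrs[1:-1]  # drop network (first) and broadcast (last)
    return addrs


if __name__ == "__main__":
    print(nat_pool_candidates("192.168.1.8/30", skip_edges=False))
    print(nat_pool_candidates("192.168.1.8/30"))
```

Listing the addresses one by one in a table, as asked at the end of the message, would correspond to the skip_edges=False result in this model.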
Re: PF hangs when doing NAT round-robin
On 17.7.2022. 20:19, Alexandr Nedvedicky wrote:
> So in case 49/27 we are supposed to be selecting addresses:
> 49.0.0.1, 49.0.0.2, ..., 49.0.0.30, 49.0.0.1
> we need to make sure selection mechanism skips network
> address (49.0.0.0) and network broadcast address (49.0.0.31).

Hi,

Am I understanding you correctly? If doing NAT to some route, let's say
/30 (4 addresses), with this diff I will be doing NAT only to 2 addresses?
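A minimal sketch of the selection order described in the quoted text, assuming a plain round-robin over the usable addresses of the prefix. This is a Python model for illustration only, not pf's implementation; round_robin_pool is a hypothetical name, and 49.0.0.0/27 is the expanded form of the 49/27 shorthand used in the thread.

```python
import ipaddress
import itertools


def round_robin_pool(prefix):
    """Return an endless iterator over the usable hosts of `prefix`.

    ip_network.hosts() already leaves out the network and broadcast
    addresses for prefixes shorter than /31.
    """
    hosts = [str(h) for h in ipaddress.ip_network(prefix).hosts()]
    return itertools.cycle(hosts)


if __name__ == "__main__":
    pool = round_robin_pool("49.0.0.0/27")
    picks = [next(pool) for _ in range(31)]
    # First, second, thirtieth and thirty-first selections.
    print(picks[0], picks[1], "...", picks[29], picks[30])
```

In this model the thirty-first selection wraps around to the first usable address, and 49.0.0.0 and 49.0.0.31 are never handed out.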
ure - ure0: usb errors on rx: IOERROR
Hi all,

I have a Supermicro server with usb ure0: RTL8153 (0x5c30), address
00:e0:4b:68:84:de, used for ssh and management. When doing ifconfig ure0
down / up I always get this error:

smc24# ifconfig ure0 down
smc24# ifconfig ure0 up
smc24# ure0: usb errors on rx: IOERROR
ure0: usb error on tx: IOERROR
ure0: usb error on tx: IN_PROGRESS
ure0: usb error on tx: TIMEOUT
ure0: usb error on tx: IN_PROGRESS
usb_insert_transfer: xfer=0xfd904e5cc7a8 not free

After that ure0 is unusable and I need to reboot the server. Maybe it would
be enough to remove the usb device and attach it again, but the server is
not near me.

I've compiled the kernel with
option URE_DEBUG
option USB_DEBUG
option UHUB_DEBUG

ure0: flags=8807 mtu 1500
        lladdr 00:e0:4b:68:84:de
        index 11 priority 0 llprio 3
        groups: egress
        media: Ethernet autoselect (1000baseT full-duplex)
        status: active
        inet X netmask 0xffe0 broadcast X

kstat

ure0:0:rxq:0
        packets: 554 packets
        bytes: 45661 bytes
        qdrops: 0 packets
        errors: 0 packets
        qlen: 0 packets
ure0:0:txq:0
        packets: 144 packets
        bytes: 10962 bytes
        qdrops: 0 packets
        errors: 0 packets
        qlen: 0 packets
        maxqlen: 256 packets
        oactive: false

dmesg

smc24$ dmesg
OpenBSD 7.1-current (GENERIC.MP) #5: Mon Jul 11 18:29:35 CEST 2022
    r...@smc24.srce.hr:/sys/arch/amd64/compile/GENERIC.MP
real mem = 68497002496 (65323MB)
avail mem = 66403713024 (63327MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.3 @ 0xa9d1c000 (71 entries)
bios0: vendor American Megatrends Inc.
version "2.3" date 10/20/2021 bios0: Supermicro AS -1114S-WTRT acpi0 at bios0: ACPI 6.0 acpi0: sleep states S0 S5 acpi0: tables DSDT FACP SSDT SPMI SSDT FIDT MCFG SSDT SSDT BERT HPET IVRS PCCT SSDT CRAT CDIT SSDT WSMT APIC ERST HEST acpi0: wakeup devices B000(S3) C000(S3) B010(S3) C010(S3) B030(S3) C030(S3) B020(S3) C020(S3) B100(S3) C100(S3) B110(S3) C110(S3) B130(S3) C130(S3) B120(S3) C120(S3) acpitimer0 at acpi0: 3579545 Hz, 32 bits acpimcfg0 at acpi0 acpimcfg0: addr 0xe000, bus 0-255 acpihpet0 at acpi0: 14318180 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: AMD EPYC 7413 24-Core Processor, 2650.33 MHz, 19-01-01 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line 16-way L3 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges cpu0: apic clock running at 100MHz cpu0: mwait min=64, max=64, C-substates=1.1, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: AMD EPYC 7413 24-Core Processor, 2650.01 MHz, 19-01-01 cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line 16-way L3 cache cpu1: smt 0, core 1, package 0 cpu2 at mainbus0: apid 2 (application processor) cpu2: AMD EPYC 7413 24-Core Processor, 2650.00 MHz, 19-01-01 cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,PCID,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line 16-way L3 cache cpu2: smt 0, core 2, package 0 cpu3 at mainbus0: apid 3 (application processor) cpu3: AMD EPYC 7413 24-Core Processor, 2650.00 MHz, 19-01-01 cpu3:
PF hangs when doing NAT round-robin
Hi all,

Chris Cappuccio kindly asked me if I can reproduce his PF NAT problem in
the lab. His mail:

    On my NAT cluster I recently switched from nat-to source-hash $key to
    nat-to round-robin and started getting hangs. The boxes would hang
    where the network would stop responding and one core would be at 100%
    with softnet. Commands at the console would hang like ifconfig or even
    reboot. I'm curious if you might be able to reproduce this in your own
    testing. It seems to take around 24 hours in my case but I also have
    pfsync and pflow turned on.

I've managed to reproduce his problem. The pf.conf with which I can trigger
the hang is:

set skip on { lo ix2 }
set limit states 500
match out on ix1 nat-to 49/27 round-robin
#match out on ix1 nat-to 49/27
block
pass

I'm sending traffic from a host connected to ix0 to a host connected to
ix1, and that host is the default gateway. Traffic is 100 Kpps (very low)
of UDP from random 16/16 port 9 to random 48/28 port 9, and I'm sniffing
traffic on a Linux port connected to ix1. The box behaves exactly as Chris
described. vmstat, top and ddb output are inline and in the attachment.

After 10 seconds of traffic the box stops doing NAT or forwarding traffic,
and at that point I immediately stop the traffic.
immediately after traffic is stopped r620-1# vmstat -m | egrep "Name|pf" NameSize Requests FailInUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle pfrule 1360303 1 0 1 1 0 80 pfstate 336 4614590 461398 38450 0 38450 38450 0 80 pfstkey 120 6921840 692097 20973 0 20973 20973 0 80 pfstitem 24 6921840 692097 4170 0 4170 4170 0 80 pfruleitem16 2307260 230700 927 0 927 927 0 80 pfosfpen 112 7140 71421 02121 0 80 pfosfp40 7140 423 5 0 5 5 0 80 after 10 minutes r620-1# vmstat -m | egrep "Name|pf" NameSize Requests FailInUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle pfrule 1360303 1 0 1 1 0 80 pfstate 336 4614590 461398 38450 0 38450 38450 0 80 pfstkey 120 6921840 692097 20973 0 20973 20973 0 80 pfstitem 24 6921840 692097 4170 0 4170 4170 0 80 pfruleitem16 2307260 230700 927 0 927 927 0 80 pfosfpen 112 7140 71421 02121 0 80 pfosfp40 7140 423 5 0 5 5 0 80 It seems that pf states are never cleared and what I can see with tcpdump on host connected to ix1 is that pf nat only to 49.0.0.0. immediately after traffic is stopped PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND 43797 410290 5400K 900K onproc/3 - 0:03 91.21% softnet 40810 512893 1000K 900K idle pf_lock 0:04 0.88% softnet 82004 472857 1000K 900K idle pf_lock 0:03 0.68% softnet 7921 447270 1000K 900K idle netlock 0:02 0.24% softnet after 10 minutes PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND 43797 410290 6400K 900K onproc/3 - 0:03 99.02% softnet 17097 135479 -2200K 900K sleep/5 -14:38 0.00% idle5 8526 296669 2800K 900K onproc/0 -14:35 0.00% idle0 66780 349165 2800K 900K onproc/4 -14:34 0.00% idle4 26781 248132 2800K 900K onproc/1 -14:33 0.00% idle1 21078 476731 2800K 900K onproc/2 -14:04 0.00% idle2 39533 504239 -2200K 900K sleep/3 -11:47 0.00% idle3 40810 512893 1000K 900K idle pf_lock 0:04 0.00% softnet 82004 472857 1000K 900K idle pf_lock 0:03 0.00% softnet 7921 447270 1000K 900K idle netlock 0:02 0.00% softnet at this point if one wants to do ifconifg or pfctl -vsi or some other network command, that command 
hangs and box needs to be rebooted from idrac or you can drop to ddb :) r620-1# Stopped at db_enter+0x10: popq%rbp ddb{0}> trace db_enter() at db_enter+0x10 comintr(80082000) at comintr+0x2de intr_handler(800022d4d620,8007a080) at intr_handler+0x6e Xintr_ioapic_edge16_untramp() at Xintr_ioapic_edge16_untramp+0x18f acpicpu_idle() at acpicpu_idle+0x203 sched_idle(822a4ff0) at sched_idle+0x280 end trace frame: 0x0, count: -6 ddb{0}> ddb{0}> show reg rdi0x2f8 rsi0 rbp 0x800022d4d550 rbx 0x82252bf9__kernel_virt_to_phys+0x2252bf9 rdx0x2f8 rcx0x286 rax 0x82252b00kstat_pv_tree_RBT_INFO+0x10 r80x82364040w_locklistdata+0x330 r9
Re: bnxt panic
On 17.3.2022. 21:31, Alexander Bluhm wrote: > On Thu, Mar 17, 2022 at 01:01:11AM +0100, Hrvoje Popovski wrote: >> On 16.3.2022. 20:00, Hrvoje Popovski wrote: >>> Hi all, >>> >>> While opensbd box is under pressure and in that moment i run ifconfig >>> bnxt0 down i get panic... it's not every time and it's that easy to >>> trigger panic >>> >>> I'm sending traffic over ix interfaces and bnxt is for ssh and nothing >>> else. >>> >>> I've compiled kernel with "option BNXT_DEBUG" and put debug in >>> hostname.bnxt0 but i didn't saw any log regarding bnxt interfaces. >>> >>> I will try to trigger panic few more times and will post them here.. >> >> this is same panic but with snapshot kernel without debug options >> >> uvm_fault(0xfd904e3a9440, 0x0, 0, 1) -> e >> kernel: page fault trap, code=0 >> Stopped at bnxt_intr+0x195:movq0(%r14,%r12,1),%rbx >> TIDPIDUID PRFLAGS PFLAGS CPU COMMAND >> *465591 26407 0 0x3 00K ifconfig >> bnxt_intr(802bc7c0) at bnxt_intr+0x195 >> intr_handler(800027d3d7b0,80269880) at intr_handler+0x6e >> Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f >> Xspllower() at Xspllower+0x19 >> softintr_dispatch(0) at softintr_dispatch+0xdc >> Xsoftclock() at Xsoftclock+0x1f >> bnxt_ioctl(802bc048,80206910,800027d3dae0) at bnxt_ioctl+0x165 >> ifioctl(fd8e5a4f13a8,80206910,800027d3dae0,800027d8da50) at >> ifioctl+0x92b >> soo_ioctl(fd904cc0b2e8,80206910,800027d3dae0,800027d8da50) >> at soo_ioctl+0x161 >> sys_ioctl(800027d8da50,800027d3dbf0,800027d3dc40) at >> sys_ioctl+0x2c4 >> syscall(800027d3dcb0) at syscall+0x374 >> Xsyscall() at Xsyscall+0x128 >> end of kernel >> end trace frame: 0x7f7faf30, count: 3 >> https://www.openbsd.org/ddb.html describes the minimum info required in >> bug reports. Insufficient info makes it difficult to find and fix bugs. > > I don't have the device and don't know the code. But other drivers > don't process rx and tx interrupts when the interface is not running. > > Maybe this helps. 
dlg@ and jmatthew@ should know better than me. > > bluhm > Hi all, is it good time to commit this diff? Thank you > Index: dev/pci/if_bnxt.c > === > RCS file: /data/mirror/openbsd/cvs/src/sys/dev/pci/if_bnxt.c,v > retrieving revision 1.36 > diff -u -p -r1.36 if_bnxt.c > --- dev/pci/if_bnxt.c 14 Mar 2022 23:41:42 - 1.36 > +++ dev/pci/if_bnxt.c 17 Mar 2022 20:26:49 - > @@ -1543,6 +1543,7 @@ bnxt_intr(void *xq) > { > struct bnxt_queue *q = (struct bnxt_queue *)xq; > struct bnxt_softc *sc = q->q_sc; > + struct ifnet *ifp = >sc_ac.ac_if; > struct bnxt_cp_ring *cpr = >q_cp; > struct bnxt_rx_queue *rx = >q_rx; > struct bnxt_tx_queue *tx = >q_tx; > @@ -1565,10 +1566,13 @@ bnxt_intr(void *xq) > bnxt_handle_async_event(sc, cmpl); > break; > case CMPL_BASE_TYPE_RX_L2: > - rollback = bnxt_rx(sc, rx, cpr, , , , > cmpl); > + if (ISSET(ifp->if_flags, IFF_RUNNING)) > + rollback = bnxt_rx(sc, rx, cpr, , , > + , cmpl); > break; > case CMPL_BASE_TYPE_TX_L2: > - bnxt_txeof(sc, tx, , cmpl); > + if (ISSET(ifp->if_flags, IFF_RUNNING)) > + bnxt_txeof(sc, tx, , cmpl); > break; > default: > printf("%s: unexpected completion type %u\n", >
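The idea behind the quoted diff, as bluhm describes it, is that rx and tx completions are not processed once the interface is no longer running. The sketch below models that guard in user-space Python; it is not driver code. The Interface class, the handle_completions function and the completion tuples are hypothetical stand-ins, and setting rings to None only imitates resources that are torn down when the interface goes down.

```python
class Interface:
    """Toy stand-in for a NIC: `running` plays the role of IFF_RUNNING."""

    def __init__(self):
        self.running = True
        self.rings = {"rx": [], "tx": []}  # set to None when brought down

    def down(self):
        self.running = False
        self.rings = None


def handle_completions(ifp, completions):
    """Process (kind, payload) completions; return how many were handled.

    rx/tx completions are skipped when the interface is not running, so
    the torn-down rings are never touched. Async events are always handled.
    """
    handled = 0
    for kind, payload in completions:
        if kind == "async":
            handled += 1
        elif kind in ("rx", "tx"):
            if not ifp.running:
                continue  # the guard: ignore late rx/tx completions
            ifp.rings[kind].append(payload)
            handled += 1
    return handled


if __name__ == "__main__":
    ifp = Interface()
    print(handle_completions(ifp, [("rx", b"a"), ("tx", b"b")]))
    ifp.down()
    print(handle_completions(ifp, [("rx", b"c"), ("async", None)]))
```

Without the running check, the late rx completion in the second call would index into rings after it was released, which is the analogue of the page fault in bnxt_intr in this toy model.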
Re: pf panic with clean snapshot (GENERIC.MP) #570
On 8.6.2022. 7:33, Hrvoje Popovski wrote:
> On 8.6.2022. 0:42, Alexandr Nedvedicky wrote:
>> Hello Hrvoje,
>>
>>> Hi,
>>>
>>> while booting with this diff I've got this log:
>>>
>>> starting early daemons: syslogd pflogd ntpd
>>> witness: lock_object uninitialized: 0xfd8785c81a90
>>> Starting stack trace...
>>> witness_checkorder(fd8785c81a90,9,0) at witness_checkorder+0xad
>>> mtx_enter(fd8785c81a80) at mtx_enter+0x34
>>> pf_remove_state(fd8785c81988) at pf_remove_state+0x1da
>>> pfsync_in_del_c(fd80028977b0,c,2,2) at pfsync_in_del_c+0x9f
>>> pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c
>>> ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103
>>> ip_local(800020b056e8,800020b056f4,fe007fff0220,0) at ip_local+0x1b7
>>> ipintr() at ipintr+0x5f
>>> if_netisr(0) at if_netisr+0xca
>>> taskq_thread(80036000) at taskq_thread+0x11a
>>
>> thanks for quick test with pfsync. it has turned out I've forgot to
>> initialize a pf_state::mtx in pfsync_state_import() function.
>>
>> below is updated diff, which should fix a stack trace reported by
>> witness.
>
> Hi,
>
> yes, stack trace is gone with this diff. will leave it running for a
> while to see if panic goes away ...
>
> Thank you ...

Hi,

after 4 days of running this diff the firewall seems stable. Without the
diff it would have panicked by now ..
Re: pf panic with clean snapshot (GENERIC.MP) #570
On 8.6.2022. 0:42, Alexandr Nedvedicky wrote:
> Hello Hrvoje,
>
>> Hi,
>>
>> while booting with this diff I've got this log:
>>
>> starting early daemons: syslogd pflogd ntpd
>> witness: lock_object uninitialized: 0xfd8785c81a90
>> Starting stack trace...
>> witness_checkorder(fd8785c81a90,9,0) at witness_checkorder+0xad
>> mtx_enter(fd8785c81a80) at mtx_enter+0x34
>> pf_remove_state(fd8785c81988) at pf_remove_state+0x1da
>> pfsync_in_del_c(fd80028977b0,c,2,2) at pfsync_in_del_c+0x9f
>> pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c
>> ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103
>> ip_local(800020b056e8,800020b056f4,fe007fff0220,0) at ip_local+0x1b7
>> ipintr() at ipintr+0x5f
>> if_netisr(0) at if_netisr+0xca
>> taskq_thread(80036000) at taskq_thread+0x11a
>
> thanks for quick test with pfsync. it has turned out I've forgot to
> initialize a pf_state::mtx in pfsync_state_import() function.
>
> below is updated diff, which should fix a stack trace reported by witness.

Hi,

yes, stack trace is gone with this diff. will leave it running for a
while to see if panic goes away ...

Thank you ...
Re: [External] : pf panic with clean snapshot (GENERIC.MP) #570
On 7.6.2022. 2:16, Alexandr Nedvedicky wrote:
> Hello,
>
> below is a diff which hopes to fix the issue. Although diff is fairly
> large the change itself is kind of straightforward. Let me briefly
> explain what's going on here. Diff introduces a mutex to pf_state,
> which protects array of keys (pf_state::key) bound to state.
>
> The panic which diff below hopes to fix is caused by a race between timer
> thread, which expires state and pfsync dispatch task, which updates a peer.
> According to data provided by Hrvoje we panic due to NULL pointer dereference
> in pf_state_export(), which finds sk->key[] to be NULL. This may happen
> because purge state mechanism detaches state key from state under protection
> of PF_STATE_LOCK, while pfsync dispatch task just keeps a reference to state
> without using a PF_STATE_LOCK to access a state instance.
>
> In order to synchronize access to pf_state::key between purge thread
> and pfsync dispatch task diff below introduces pf_state::mtx.
> pfsync uses pf_state::mtx to attempt to grab references to keys bound
> to state, while purge task uses mtx to safely invalidate state keys
> in pf_detach_state().
>
> Such change requires pfsync(4) to deal with situation when state
> got detached while waiting in dispatch queue to update a peer.
> We have to a .write() operation on sync-queue to indicate a failure
> so pfsync_sendout() will just skip the state when processing dispatch
> queue.
>
> Also diff changes pf_state_key_detach() such caller must pass pointer to state
> key instead of key index to be detached from state. It also requires caller
> to invalidate a state key entry in pf_state::key member.
>
> I've just smoked tested the diff _without_ pfsync.

Hi,

while booting with this diff I've got this log:

starting early daemons: syslogd pflogd ntpd
witness: lock_object uninitialized: 0xfd8785c81a90
Starting stack trace...
witness_checkorder(fd8785c81a90,9,0) at witness_checkorder+0xad mtx_enter(fd8785c81a80) at mtx_enter+0x34 pf_remove_state(fd8785c81988) at pf_remove_state+0x1da pfsync_in_del_c(fd80028977b0,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103 ip_local(800020b056e8,800020b056f4,fe007fff0220,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(80036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfd8786d61d50 Starting stack trace... witness_checkorder(fd8786d61d50,9,0) at witness_checkorder+0xad mtx_enter(fd8786d61d40) at mtx_enter+0x34 pf_remove_state(fd8786d61c48) at pf_remove_state+0x1da pfsync_in_del_c(fd80028d04e0,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103 ip_local(800020b056e8,800020b056f4,fe03,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(80036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfd8786d61bd0 Starting stack trace... witness_checkorder(fd8786d61bd0,9,0) at witness_checkorder+0xad mtx_enter(fd8786d61bc0) at mtx_enter+0x34 pf_remove_state(fd8786d61ac8) at pf_remove_state+0x1da pfsync_in_del_c(fd80028d04e0,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103 ip_local(800020b056e8,800020b056f4,fe03,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(80036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfd87846cebc8 Starting stack trace... 
witness_checkorder(fd87846cebc8,9,0) at witness_checkorder+0xad mtx_enter(fd87846cebb8) at mtx_enter+0x34 pf_remove_state(fd87846ceac0) at pf_remove_state+0x1da pfsync_in_del_c(fd8070be0450,c,2,2) at pfsync_in_del_c+0x9f pfsync_input(800020b056e8,800020b056f4,f0,2) at pfsync_input+0x33c ip_deliver(800020b056e8,800020b056f4,f0,2) at ip_deliver+0x103 ip_local(800020b056e8,800020b056f4,fe03,0) at ip_local+0x1b7 ipintr() at ipintr+0x5f if_netisr(0) at if_netisr+0xca taskq_thread(80036000) at taskq_thread+0x11a end trace frame: 0x0, count: 247 End of stack trace. witness: lock_object uninitialized: 0xfd87846ce748 Starting stack trace... witness_checkorder(fd87846ce748,9,0) at witness_checkorder+0xad mtx_enter(fd87846ce738) at mtx_enter+0x34 pf_remove_state(fd87846ce640) at pf_remove_state+0x1da pfsync_in_del_c(fd8070be0450,c,2,2) at pfsync_in_del_c+0x9f
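The locking scheme described in the quoted explanation can be pictured with a small user-space model. This is a Python sketch under stated assumptions, not the pf code: State, detach_keys and export_state are hypothetical names, threading.Lock stands in for the pf_state::mtx mutex, and returning None models pfsync skipping a state whose keys were detached while it waited in the dispatch queue.

```python
import threading


class State:
    """Toy pf state: `key` holds two key references guarded by `mtx`."""

    def __init__(self, wire_key, stack_key):
        # The lock has to be created on every path that creates a state.
        self.mtx = threading.Lock()
        self.key = [wire_key, stack_key]


def detach_keys(st):
    """Purge side: invalidate the keys while holding the state's mutex."""
    with st.mtx:
        st.key = [None, None]


def export_state(st):
    """Sync side: take a consistent snapshot of the keys, or give up.

    Returns a (wire, stack) tuple, or None when the state was already
    detached, so the caller can skip it instead of using a missing key.
    """
    with st.mtx:
        wire, stack = st.key
    if wire is None or stack is None:
        return None
    return (wire, stack)


if __name__ == "__main__":
    st = State("wire-key", "stack-key")
    print(export_state(st))
    detach_keys(st)
    print(export_state(st))
```

The follow-up messages in this thread attribute the witness warnings in the log above to a state creation path, pfsync_state_import(), that did not initialize the mutex; in this model that would correspond to a State built without going through __init__.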
Re: pf panic with clean snapshot (GENERIC.MP) #570
On 6.6.2022. 12:45, Alexandr Nedvedicky wrote:
> this is most likely identical to crash you've reported ?two weeks ago?
> I can not find an email with it.

oh yes, yes, it's on tech@ with subject
pf_state_export panic with NET_TASKQ=6 and stuff

I had totally forgotten about that report :)

The difference is that the earlier panic was with a few diffs on top of
NET_TASKQ=6, but this one is with a plain snapshot...
pf panic with clean snapshot (GENERIC.MP) #570
Hi, this is follow up mail from https://marc.info/?l=openbsd-tech=165450511622133=2 panic log: bcbnfw1# uvm_fault(0x822e5e48, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Stopped at pf_state_export+0x38: movq0(%rax),%rcx TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *186873 72386 0 0x14000 0x2001 softnet 177504 6658 0 0x14000 0x2004 softnet 39873 45066 0 0x14000 0x2003 softnet 212195 13588 0 0x14000 0x2002 softnet pf_state_export(fd80610b3bd4,fd87778f3010) at pf_state_export+0x38 pfsync_sendout() at pfsync_sendout+0x5e4 pfsync_update_state(fd874a5bd190) at pfsync_update_state+0x15b pf_test(2,1,80bbe000,800020b45b18) at pf_test+0xd53 ip_input_if(800020b45b18,800020b45b24,4,0,80bbe000) at ip_input_if+0xcd ipv4_input(80bbe000,fd8061062300) at ipv4_input+0x39 ether_input(80bbe000,fd8061062300) at ether_input+0x3ad carp_input(80bd5000,fd8061062300,5e000101) at carp_input+0x196 ether_input(80bd5000,fd8061062300) at ether_input+0x1d9 vlan_input(80ba1000,fd8061062300,800020b45d4c) at vlan_input+0x23d ether_input(80ba1000,fd8061062300) at ether_input+0x85 if_input_process(8048b048,800020b45de8) at if_input_process+0x6f ifiq_process(8048e900) at ifiq_process+0x69 taskq_thread(80035200) at taskq_thread+0x100 end trace frame: 0x0, count: 1 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{1}> ddb{1}> show reg rdi 0xfd80610b3cdc rsi0 rbp 0x800020b457a0 rbx0x394 rdx 0x4 rcx0 rax0 r8 0x104 r9 0x201641d4bc7bea8 r10 0xfa48834155c0359a r11 0xfd80610b3bd4 r12 0xfd87778f3010 r130 r14 0xfd80610b3bd4 r15 0xfd87778f3010 rip 0x81768b08pf_state_export+0x38 cs 0x8 rflags 0x10246__ALIGN_SIZE+0xf246 rsp 0x800020b45760 ss 0x10 pf_state_export+0x38: movq0(%rax),%rcx ddb{1}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 69799 138915 1 0 30x100083 ttyin ksh 30083 46 1 0 30x100098 kqreadcron 56402 311171 96066720 3 0x190 kqreadlldpd 96066 275425 1 0 30x80 netio lldpd 82432 242124 96039 95 3 0x1100092 kqreadsmtpd 12516 216897 96039103 3 0x1100092 kqreadsmtpd 19369 21427 96039 95 3 0x1100092 kqreadsmtpd 16547 5 96039 95 30x100092 kqreadsmtpd 40575 355715 96039 95 3 0x1100092 kqreadsmtpd 64566 206338 96039 95 3 0x1100092 kqreadsmtpd 96039 176140 1 0 30x100080 kqreadsmtpd 74078 507976 1 77 3 0x1100090 kqreaddhcpd 22909 489517 1 0 30x100080 kqreadsnmpd 49177 112109 1 91 3 0x192 kqreadsnmpd 97916 230895 1 0 30x88 kqreadsshd 16686 416523 1 0 30x100080 kqreadntpd 318405744 94041 83 30x100092 kqreadntpd 94041 139024 1 83 3 0x1100092 kqreadntpd 67241 440831 52217 74 3 0x1100092 bpf pflogd 52217 253016 1 0 30x80 netio pflogd 75377 97140 41241 73 3 0x1100090 kqreadsyslogd 41241 505035 1 0 30x100082 netio syslogd 33175 220087 0 0 3 0x14200 bored smr 59216 65103 0 0 3 0x14200 pgzerozerothread 93094 298208 0 0 3 0x14200 aiodoned aiodoned 4707 184791 0 0 3 0x14200 syncerupdate 13584 284481 0 0 3 0x14200 cleaner cleaner 86417 471845 0 0 3 0x14200 reaperreaper 78809 25532 0 0 3 0x14200 pgdaemon pagedaemon 32266 308574 0 0 3 0x14200 usbtskusbtask 1400 353498 0 0 3 0x14200 usbatsk usbatsk 85069 436856 0 0 3 0x40014200 acpi0 acpi0 73330 275126 0 0 7 0x40014200idle5 11953 135217 0 0 3 0x40014200idle4 20559 345946 0 0 3 0x40014200idle3 22822 186899 0 0 3 0x40014200idle2 52381 348951 0 0 3
Re: relayd panic
On 1.6.2022. 9:16, Alexandr Nedvedicky wrote: > Hello, > > >> r420-1# rcctl -f start relayd >> relayd(ok) >> r420-1# uvm_fault(0xfd862f82f990, 0x0, 0, 1) -> e >> kernel: page fault trap, code=0 >> Stopped at pf_find_or_create_ruleset+0x1c: movb0(%rdi),%al >> TIDPIDUID PRFLAGS PFLAGS CPU COMMAND >> 431388 19003 0 0x2 05 relayd >> 174608 32253 89 0x112 02 relayd >> 395415 12468 0 0x2 04 relayd >> 493579 11904 0 0x2 03 relayd >> *101082 14967 89 0x1100012 00K relayd >> pf_find_or_create_ruleset(0) at pf_find_or_create_ruleset+0x1c >> pfr_add_tables(832d7cca800,1,80eaf43c,1000) at >> pfr_add_tables+0x6ae >> >> pfioctl(4900,c450443d,80eaf000,3,80002272e7f0) at pfioctl+0x1d9f >> VOP_IOCTL(fd8551f82dd0,c450443d,80eaf000,3,fd862f7d60c0,800 >> 02272e7f0) at VOP_IOCTL+0x5c >> vn_ioctl(fd855ecec1e8,c450443d,80eaf000,80002272e7f0) at >> vn_ioctl+0x75 >> sys_ioctl(80002272e7f0,8000227d9980,8000227d99d0) at >> sys_ioctl+0x2c4 >> syscall(8000227d9a40) at syscall+0x374 >> Xsyscall() at Xsyscall+0x128 >> end of kernel > it looks like we are dying here at line 239 due to NULL pointer deference: > > 232 struct pf_ruleset * > 233 pf_find_or_create_ruleset(const char *path) > 234 { > 235 char*p, *aname, *r; > 236 struct pf_ruleset *ruleset; > 237 struct pf_anchor*anchor; > 238 > 239 if (path[0] == 0) > 240 return (_main_ruleset); > 241 > 242 while (*path == '/') > 243 path++; > 244 > > I've followed the same steps to reproduce the issue to check if > diff below resolves the issue. The bug has been introduced by > my recent change to pf_table.c [1] from May 10th: > > Modified files: > sys/net: pf_ioctl.c pf_table.c > > Log message: > move memory allocations in pfr_add_tables() out of > NET_LOCK()/PF_LOCK() scope. bluhm@ helped a lot > to put this diff into shape. 
>
> besides using a regression test I've also did simple testing
> using a 'load anchor':
>
> netlock# cat /tmp/anchor.conf
>
> load anchor "test" from "/tmp/pf.conf"
> netlock#
> netlock# cat /tmp/pf.conf
>
> table { 192.168.1.1 }
> pass from
> netlock#
> netlock# pfctl -sA
> test
> netlock# pfctl -a test -sT
> try
> netlock# pfctl -a test -t try -T show
> 192.168.1.1
>
> OK to commit fix below?

I'm confirming that with this diff I can't trigger the panic...
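The failure mode in the quoted listing is a dereference of path when the caller passes a NULL pointer. The following Python model only illustrates that control flow; the fix itself is not shown in this message and is not reproduced here. find_or_create_ruleset, the rulesets dictionary, the MAIN_RULESET constant and the explicit None check are all hypothetical.

```python
MAIN_RULESET = "main"


def find_or_create_ruleset(rulesets, path):
    """Look up or create a ruleset by anchor path.

    Mirrors only the lines shown in the quoted C listing: an empty path
    means the main ruleset, and leading slashes are skipped. A None
    path, the analogue of the NULL pointer in the panic, is rejected
    up front instead of being indexed.
    """
    if path is None:
        raise ValueError("anchor path must not be None")
    if path == "":
        return MAIN_RULESET
    path = path.lstrip("/")
    # Create the entry on first use, return the existing one afterwards.
    return rulesets.setdefault(path, "ruleset:" + path)


if __name__ == "__main__":
    rulesets = {}
    print(find_or_create_ruleset(rulesets, ""))
    print(find_or_create_ruleset(rulesets, "/test"))
```

Whether the real fix validates the argument in the callee or avoids passing NULL from pfr_add_tables() cannot be told from this message; the sketch takes the first option only for brevity.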
Re: relayd panic
On 1.6.2022. 7:01, Hrvoje Popovski wrote: > Hi all, > > while playing around with TCP Large Receive Offloading for ix I have > configure httpd and relayd on test box. > Same second I've start relayd box panic. > This is latest snapshot and it easely reproduciable.. With WITNESS r420-1# rcctl -f start relayd relayd(ok) WuAvRm_NfINaGu:l t(S0PLx ffNfOTff LdO8W6E2fR8ED2 37O3N0 T,R 0AxP0 E,X 0I,T a1 )0 - > Stopped at proc_trampoline+0xdc: m ovl $0,%gs:0x538 TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 434783 78195 0 0x2 04 relayd 416901 1262 89 0x112 03 relayd 290632 38913 0 0x2 02 relayd 239447 37685 0 0x2 05 relayd 72623 6837 89 0x1100012 00K relayd *174940 41382 00x13 01 ksh proc_trampoline() at proc_trampoline+0xdc end of kernel end trace frame: 0x7f7dd400, count: 14 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{1}> ddb{1}> show panic *cpu0: uvm_fault(0xfd862f823730, 0x0, 0, 1) -> e ddb{1}> ddb{1}> show reg rdi 0x822c0d48kprintf_mutex rsi 0x5 rbp 0x8000227afea0 rbx0 rdx 0xc000 rcx0x286 rax 0x2a r8 0 r9 0 r100xf417d734fa974b8 r11 0x7ea5978c0be9feb6 r120 r130 r140 r150 rip 0x8118b50cproc_trampoline+0xdc cs 0x8 rflags 0x246 rsp 0x8000227afe20 ss 0 proc_trampoline+0xdc: movl$0,%gs:0x538 ddb{1}> ddb{1}> show all locks CPU 1: exclusive mutex >pm_mtx r = 0 (0xfd862f8226d8) #0 witness_lock+0x311 #1 mtx_enter_try+0x95 #2 mtx_enter+0x48 #3 pmap_enter+0xf8 #4 uvm_fault_upper+0x1e5 #5 uvm_fault+0xde #6 upageflttrap+0x62 #7 usertrap+0x129 #8 recall_trap+0x8 Process 37685 (relayd) thread 0x80002273f508 (239447) exclusive rwlock uobjlk r = 0 (0xfd8575064088) #0 witness_lock+0x311 #1 rw_enter+0x292 #2 uvm_fault_lower_lookup+0x41 #3 uvm_fault_lower+0x45 #4 uvm_fault+0x1b3 #5 upageflttrap+0x62 #6 usertrap+0x129 #7 recall_trap+0x8 shared rwlock vmmaplk r = 0 (0xfd862f823a28) #0 witness_lock+0x311 #1 uvmfault_lookup+0x8a #2 uvm_fault_check+0x32 #3 uvm_fault+0xfb #4 upageflttrap+0x62 #5 
usertrap+0x129 #6 recall_trap+0x8 Process 6837 (relayd) thread 0x80002273f268 (72623) exclusive rwlock pf_lock r = 0 (0x822ce1f8) #0 witness_lock+0x311 #1 pfr_add_tables+0x384 #2 pfioctl+0x1daf #3 VOP_IOCTL+0x5c #4 vn_ioctl+0x75 #5 sys_ioctl+0x2c4 #6 syscall+0x374 #7 Xsyscall+0x128 exclusive rwlock netlock r = 0 (0x822adc60) #0 witness_lock+0x311 #1 pfr_add_tables+0x342 #2 pfioctl+0x1daf #3 VOP_IOCTL+0x5c #4 vn_ioctl+0x75 #5 sys_ioctl+0x2c4 #6 syscall+0x374 #7 Xsyscall+0x128 exclusive rwlock pfioctl_rw r = 0 (0x822ce258) #0 witness_lock+0x311 #1 pfioctl+0x21e #2 VOP_IOCTL+0x5c #3 vn_ioctl+0x75 #4 sys_ioctl+0x2c4 #5 syscall+0x374 #6 Xsyscall+0x128 exclusive kernel_lock _lock r = 1 (0x8247f570) #0 witness_lock+0x311 #1 vn_ioctl+0x3b #2 sys_ioctl+0x2c4 #3 syscall+0x374 #4 Xsyscall+0x128 Process 41382 (ksh) thread 0x80002273f7a8 (174940) exclusive rwlock amaplk r = 0 (0xfd857123cad0) #0 witness_lock+0x311 #1 uvm_fault_check+0x3f7 #2 uvm_fault+0xfb #3 upageflttrap+0x62 #4 usertrap+0x129 #5 recall_trap+0x8 shared rwlock vmmaplk r = 0 (0xfd857136d758) #0 witness_lock+0x311 #1 uvmfault_lookup+0x8a #2 uvm_fault_check+0x32 #3 uvm_fault+0xfb #4 upageflttrap+0x62 #5 usertrap+0x129 #6 recall_trap+0x8 exclusive mutex >pm_mtx r = 0 (0xfd862f8226d8) #0 witness_lock+0x311 #1 mtx_enter_try+0x95 #2 mtx_enter+0x48 #3 pmap_enter+0xf8 #4 uvm_fault_upper+0x1e5 #5 uvm_fault+0xde #6 upageflttrap+0x62 #7 usertrap+0x129 #8 recall_trap+0x8 ddb{1}> ddb{1}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 11599 104649 1 0 30x80 kqreadrelayd 61284 290693 1 0 2 0x2relayd 78195 434783 1 0 7 0x2relayd 51529 52072 1 89 2 0x112relayd 1262 416901 1 89 7 0x112relayd 38913 290632 1 0 7 0x2relayd 37685 239447 1 0 7 0x2relayd 59481 105452 1 0 2 0x2
relayd panic
Hi all, while playing around with TCP Large Receive Offloading for ix I have configure httpd and relayd on test box. Same second I've start relayd box panic. This is latest snapshot and it easely reproduciable.. r420-1# cat /etc/httpd.conf prefork 4 server "default" { listen on 127.0.0.1 port 80 } r420-1# cat /etc/relayd.conf table { 127.0.0.1 } redirect www { listen on 192.168.100.205 port http forward to check icmp } panic r420-1# rcctl -f start relayd relayd(ok) r420-1# uvm_fault(0xfd862f82f990, 0x0, 0, 1) -> e kernel: page fault trap, code=0 Stopped at pf_find_or_create_ruleset+0x1c: movb0(%rdi),%al TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 431388 19003 0 0x2 05 relayd 174608 32253 89 0x112 02 relayd 395415 12468 0 0x2 04 relayd 493579 11904 0 0x2 03 relayd *101082 14967 89 0x1100012 00K relayd pf_find_or_create_ruleset(0) at pf_find_or_create_ruleset+0x1c pfr_add_tables(832d7cca800,1,80eaf43c,1000) at pfr_add_tables+0x6ae pfioctl(4900,c450443d,80eaf000,3,80002272e7f0) at pfioctl+0x1d9f VOP_IOCTL(fd8551f82dd0,c450443d,80eaf000,3,fd862f7d60c0,800 02272e7f0) at VOP_IOCTL+0x5c vn_ioctl(fd855ecec1e8,c450443d,80eaf000,80002272e7f0) at vn_ioctl+0x75 sys_ioctl(80002272e7f0,8000227d9980,8000227d99d0) at sys_ioctl+0x2c4 syscall(8000227d9a40) at syscall+0x374 Xsyscall() at Xsyscall+0x128 end of kernel end trace frame: 0x7f7eca80, count: 7 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{0}> ddb{0}> show reg rdi0 rsi 0x80eb2a01 rbp 0x8000227d8f70 rbx0 rdx0 rcx 0x3 rax 0x72 r8 0x101010101010101 r90x8080808080808080 r10 0xe7c5ac49b5b31a3e r11 0xd36a3af6ec2034e3 r12 0x1 r13 0x80eb2e00 r14 0x80eb39c0 r15 0x80eb2a00 rip 0x8147bffcpf_find_or_create_ruleset+0x1c cs 0x8 rflags 0x10282__ALIGN_SIZE+0xf282 rsp 0x8000227d8f30 ss 0x10 pf_find_or_create_ruleset+0x1c: movb0(%rdi),%al ddb{0}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 70374 260289 1 0 30x80 kqreadrelayd 19003 431388 1 0 7 0x2relayd 32253 174608 1 89 7 0x112relayd 12468 395415 1 0 7 0x2relayd 11904 493579 1 0 7 0x2relayd 71579 177053 1 89 3 0x1100092 kqreadrelayd 52250 384601 1 89 3 0x1100092 kqreadrelayd 78736 288537 1 89 3 0x1100092 kqreadrelayd *14967 101082 1 89 7 0x1100012relayd 97366 28376 48265 0 30x100083 nanoslp sleep 48265 148003 1 0 30x100089 sigsusp ksh 72597 317981 1 0 30x100083 ttyin ksh 14238 266586 1 0 30x100098 kqreadcron 88363 270212 39693 95 3 0x1100092 kqreadsmtpd 43072 155751 39693103 3 0x1100092 kqreadsmtpd 20968 329586 39693 95 3 0x1100092 kqreadsmtpd 61100 508858 39693 95 30x100092 kqreadsmtpd 98465 158391 39693 95 3 0x1100092 kqreadsmtpd 12045 461090 39693 95 3 0x1100092 kqreadsmtpd 39693 153086 1 0 30x100080 kqreadsmtpd 2297 255527 1 0 30x88 kqreadsshd 73816 88254 1 0 30x100080 kqreadntpd 74329 300888 70971 83 30x100092 kqreadntpd 70971 124726 1 83 3 0x1100092 kqreadntpd 31879 226513 51900 74 3 0x1100092 bpf pflogd 51900 452501 1 0 30x80 netio pflogd 84934 332753 55410 73 3 0x1100090 kqreadsyslogd 55410 338332 1 0 30x100082 netio syslogd 42399 151525 0 0 3 0x14200 bored smr 51084 48313 0 0 3 0x14200 pgzerozerothread 55543 234427 0 0 3 0x14200 aiodoned aiodoned 38843 197586 0 0 3 0x14200 syncerupdate 39156 69723 0 0 3 0x14200 cleaner cleaner 28960 522155 0 0 3 0x14200 reaperreaper 98774 330824 0 0 3
Re: -current crash
On 1.6.2022. 0:27, Stuart Henderson wrote:
> I accidentally updated a router to -current instead of 7.1 and hit this.
> (Thanks sysupgrade - it was running a 7.0-stable kernel before...)
>
> Unfortunately it runs with ddb.panic=0 and this time it hanged, I won't
> have time to figure anything out with it when I get it back online, but
> might be able to do so later in the week.
>
> Thought I'd send it out now as a heads-up as much as anything (and maybe
> someone has an idea). Boot messages below.

Hi,

I think that this is the relayd panic. I'm seeing it too while testing TCP
Large Receive Offloading. I will send a proper bug report in the next mail...

r420-1# rcctl -f start relayd
relayd(ok)
r420-1# uvm_fault(0xfd8571260e70, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at pf_find_or_create_ruleset+0x1c: movb 0(%rdi),%al
TID PID UID PRFLAGS PFLAGS CPU COMMAND
*307542 67712 89 0x1100012 0 2K relayd
pf_find_or_create_ruleset(0) at pf_find_or_create_ruleset+0x1c
pfr_add_tables(b0caf51,1,8104343c,1000) at pfr_add_tables+0x6ae
pfioctl(4900,c450443d,81043000,3,80002271e550) at pfioctl+0x1daf
VOP_IOCTL(fd857631f1f8,c450443d,81043000,3,fd862f7d6960,80002271e550) at VOP_IOCTL+0x5c
vn_ioctl(fd854c7a2308,c450443d,81043000,80002271e550) at vn_ioctl+0x75
sys_ioctl(80002271e550,8000227f5b40,8000227f5b90) at sys_ioctl+0x2c4
syscall(8000227f5c00) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d3d40, count: 7
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
Re: [External] : Re: ip6 forwarding with pf and pfsync over veb/vport
On 24.5.2022. 9:01, Alexandr Nedvedicky wrote: > interesting. I went through mbuf handling in if_veb.c > I just could find a single nit, which is most likely unrelated, > however I think it's still worth to give it a try a diff below. > > basically all calls to veb_pf() read as follows: > m = veb_pf(ifp, ..., m); > except the one in veb_broadcast(), which readsa as: > m = veb_pf(ifp, ..., m0); > I think it is a bug, veb_pf() caller should continue to run > with packet returned by veb_pf(). > > thanks and > regards > sashan Hi, and with this diff i can panic box the same way as before... ip6 forwarding, pf and veb/vport panic: r620-1# panuicvm:_ f paoulotl(_0caxcffhfef_iftfeffm8_2ma2gfi13ca_c8h, e ck :m bu f p l cp uf r e0ex1 7 , l i 0s,t 2 ) - > e mkoedrnieflie: d : i t e m a dd r0 xf f f ff d 8 0 a 42 0 e 5 00 + 2 4 0x 6 a b 22 4 5 9 6 1e e 9 8 5c ! = 0 x 6 ab 2 2 4 5 9pcadge0 a f8 5 c Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 418374 46077 0 0x14000 0x2003 softnet 355064 80120 0 0x14000 0x2002K softnet *401307 69853 0 0x14000 0x2005 softnet db_enter() at db_enter+0x10 panic(81f3c6f5) at panic+0xbf pool_cache_get(82483608) at pool_cache_get+0x25b pool_get(82483608,2) at pool_get+0x61 m_get(2,1) at m_get+0x3f m_copym(fd80a3b50900,0,40,2) at m_copym+0xd8 ip6_forward(fd80a3b50900,fd842ce9c708,0) at ip6_forward+0x1cc ip6_input_if(800022c6b728,800022c6b734,29,0,8074b000) at ip6_input_if+0x80a ipv6_input(8074b000,fd80a3b50900) at ipv6_input+0x39 ether_input(8074b000,fd80a3b50900) at ether_input+0x3ad vport_if_enqueue(8074b000,fd80a3b50900) at vport_if_enqueue+0x19 veb_port_input(80095048,fd80a3b50900,ecf4bbdaf7f8,80747300) at veb_port_input+0x5b0 ether_input(80095048,fd80a3b50900) at ether_input+0x100 if_input_process(80095048,800022c6b938) at if_input_process+0x6f end trace frame: 0x800022c6b980, count: 0 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. 
Insufficient info makes it difficult to find and fix bugs. ddb{5}> show panic *cpu5: pool_cache_item_magic_check: mbufpl cpu free list modified: item addr 0x fd80a420e500+24 0x6ab2245961ee985c!=0x6ab22459cd0af85c cpu2: uvm_fault(0x822f13a8, 0x17, 0, 2) -> e ddb{5}>
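A note on the nit sashan quotes in this message: the rule "the caller should continue with the packet returned by veb_pf()" can be sketched in userland C. This is only an illustration under assumptions; `struct pkt`, `filter()`, `caller_stale()` and `caller_fixed()` are hypothetical names and are not taken from if_veb.c, and the sketch says nothing about whether this nit is related to the panic above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packet; "freed" marks a buffer the filter has given up. */
struct pkt {
	int freed;
	int len;
};

/*
 * Hypothetical filter in the style described in the thread: it may
 * return the same buffer, a replacement buffer, or NULL if it dropped
 * the packet. Either way the input pointer may no longer be usable.
 */
static struct pkt *
filter(struct pkt *m, struct pkt *replacement)
{
	assert(m != NULL);
	if (m->len == 0) {
		m->freed = 1;		/* dropped */
		return NULL;
	}
	if (replacement != NULL) {
		replacement->len = m->len;
		m->freed = 1;		/* old buffer handed back */
		return replacement;
	}
	return m;
}

/* Pattern to avoid: keeps using the pointer it had before the filter ran. */
static struct pkt *
caller_stale(struct pkt *m0, struct pkt *replacement)
{
	struct pkt *m = filter(m0, replacement);

	(void)m;
	return m0;
}

/* Pattern sashan asks for: continue only with what the filter returned. */
static struct pkt *
caller_fixed(struct pkt *m0, struct pkt *replacement)
{
	return filter(m0, replacement);
}
```

With a replacement buffer supplied, `caller_fixed()` returns the replacement, while `caller_stale()` returns a buffer whose `freed` flag is already set, which is the userland analogue of a use-after-free.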
Re: ip6 forwarding with pf and pfsync over veb/vport
On 23.5.2022. 10:41, Hrvoje Popovski wrote:
> On 23.5.2022. 8:34, Alexandr Nedvedicky wrote:
>> looks like kind of memory corruption. my bet is use-after-free.
>> will try to get to it later today.
>>
>> does it mean there is no such panic, when we handle IPv4 traffic only?
>
> Hi,
>
> yes, it seems that i can't trigger panic with ip4 only traffic, at least
> the same way i can with ip6 traffic
>

I've been trying all day to trigger the panic with ip4 and I just can't.
Re: ip6 forwarding with pf and pfsync over veb/vport
On 23.5.2022. 10:41, Hrvoje Popovski wrote: > On 23.5.2022. 8:34, Alexandr Nedvedicky wrote: >> looks like kind of memory corruption. my bet is use-after-free. >> will try to get to it later today. >> >> does it mean there is no such panic, when we handle IPv4 traffic only? > > Hi, > > yes, it seems that i can't trigger panic with ip4 only traffic, at least > the same way i can with ip6 traffic > Here's another one but this time i've tcpdump outgoing ix interface. I've tried same stuff with ip4 traffic and couldn't trigger panic. 10:53:59.682513 a192:a168:a100::111.9 > b192:b168:b111::bfbf.9: udp puvamn_icf:au l t p(o0 oxflf_cfafcffhfe_fi82t2emf_62m6a8gi, c _ ch e c k : m b uf p l c p uf r 0exe1 l7i, s t m o d if i e d : i t e m a d d r 0 x ff f f f d8 0 a 37 f d a 0 0+ 1 60 xf f f ff d 8 0a 3 7 fd a f 2! = 0x c 0f1,8 9 2b)ec d f -5>9 b0 0 b Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 32710 85256 0 0x14000 0x2004K softnet 97437 83157 0 0x14000 0x2001 softnet 212200 25091 0 0x14000 0x2003 softnet 510395 50985 0 0x14000 0x2005 softnet 417502 88838 0 0x14000 0x2000 systq db_enter() at db_enter+0x10 panic(81f34fe0) at panic+0xbf pool_cache_get(82474c48) at pool_cache_get+0x25b pool_get(82474c48,2) at pool_get+0x61 m_clget(0,2,802) at m_clget+0xdd ixgbe_get_buf(800973a0,b2) at ixgbe_get_buf+0xa3 ixgbe_rxfill(800973a0) at ixgbe_rxfill+0xaa ixgbe_queue_intr(80024d00) at ixgbe_queue_intr+0x4f intr_handler(800022c89380,80081e00) at intr_handler+0x6e Xintr_ioapic_edge0_untramp() at Xintr_ioapic_edge0_untramp+0x18f acpicpu_idle() at acpicpu_idle+0x203 sched_idle(800022412ff0) at sched_idle+0x280 end trace frame: 0x0, count: 3 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{2}> ddb{2}> show panic cpu4: uvm_fault(0x822f6268, 0x17, 0, 2) -> e *cpu2: pool_cache_item_magic_check: mbufpl cpu free list modified: item addr 0x fd80a37fda00+16 0xfd80a37fdaf2!=0xcf189becdf59b00b ddb{2}> ddb{2}> show reg rdi0 rsi 0x14 rbp 0x800022c88ff0 rbx 0xfd842f835c00 rdx 0xc800 rcx0x206 rax 0x8a r8 0x101010101010101 r9 0 r10 0xe6540fc793a8e615 r11 0x4860824aa7540a0c r12 0x800022413a60 r130 r140 r15 0x81f34fe0cmd0646_9_tim_udma+0x2acb1 rip 0x817b4d90db_enter+0x10 cs 0x8 rflags 0x206 rsp 0x800022c88ff0 ss 0x10 db_enter+0x10: popq%rbp ddb{2}> show mbuf mbuf 0x817b4d90 m_type: -13108 m_flags: c3cc m_next: 0x1d3b4c241c334c5d m_nextpkt: 0x117400ae525c m_data: 0x m_len: 3435973836 m_dat: 0x817b4db0 m_pktdat: 0x817b4e00 ddb{2}> show all locks Process 85256 (softnet) thread 0x8000e7e0 (32710) shared rwlock netlock r = 0 (0x822e9990) shared rwlock softnet r = 0 (0x80031370) Process 83157 (softnet) thread 0x8000ea80 (97437) shared rwlock netlock r = 0 (0x822e9990) shared rwlock softnet r = 0 (0x80031270) Process 25091 (softnet) thread 0x8000ed20 (212200) shared rwlock netlock r = 0 (0x822e9990) shared rwlock softnet r = 0 (0x80031170) Process 50985 (softnet) thread 0x8000efc0 (510395) shared rwlock softnet r = 0 (0x80031070) Process 88838 (systq) thread 0x8000f500 (417502) shared rwlock systq r = 0 (0x822eaf08) Process 59744 (softclock) thread 0x8000f7a0 (200127) exclusive kernel_lock _lock r = 0 (0x824b03c0) shared rwlock timeout r = 0 (0x822b2fe8) ddb{2}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 81137 105065 65725 76 30x100093 netio tcpdump 65725 227707 17816 76 3 0x1100093 ttyouttcpdump 17816 349982 1 0 30x10008b sigsusp ksh 96985 429538 1 0 30x100098 kqreadcron 95498 144368 28860 95 3 0x1100092 kqreadsmtpd 43714 295842 28860103 3 0x1100092 kqreadsmtpd 80683 116687 28860 95 3 0x1100092 kqreadsmtpd 35950 130878 28860 95 30x100092 kqreadsmtpd 27765 48615 28860 95 3 0x1100092 kqreadsmtpd 554
Re: ip6 forwarding with pf and pfsync over veb/vport
On 23.5.2022. 8:34, Alexandr Nedvedicky wrote:
> looks like kind of memory corruption. my bet is use-after-free.
> will try to get to it later today.
>
> does it mean there is no such panic, when we handle IPv4 traffic only?

Hi,

yes, it seems that i can't trigger the panic with ip4 only traffic, at least
not the same way i can with ip6 traffic
ip6 forwarding with pf and pfsync over veb/vport
Hi all, I can reproduce panic when sending ip6 traffic over vport and destroying pfsync interface. It is reproducible with veb and vport but i couldn't trigger panic when forwarding ip6 over physical interfaces. I've compiled kernel with source fetched half an hour ago just to enable WITNESS. r620-1# ifconfig pfsync0 destroy panicu:v m_ f a u lt ( 0 x f ff f f ff f 8 2 3 ba 6 1 8 , 0 x 17 ,0, 2 ) - >e pkoeronle_cla: c he _ i t em _ m a gi c _ ch e c k : m b u fp l c pu f r ee l i s t m o d if i e d : i t ema d dr 0 xpfagfef f fd 8 0 a 4 1 c3 f 0 0 +2 40 x a f5 5 1 e 6f 8 f 9 0 25 5 f != 0 x a f 55 1 e 6 f8 f 35 f d 5 f fStopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 317552 39553 0 0x14000 0x2002K softnet 504828 12606 0 0x14000 0x2004 softnet *283345 81494 0 0x14000 0x2003 softnet db_enter() at db_enter+0x10 panic(81f39222) at panic+0xbf pool_cache_get(82323228) at pool_cache_get+0x25b pool_get(82323228,2) at pool_get+0x61 m_gethdr(2,1) at m_gethdr+0x3f pfsync_sendout() at pfsync_sendout+0xe9 pfsync_update_state(fd839f1f8950) at pfsync_update_state+0x15b pf_test(18,1,80095048,800022c6ae30) at pf_test+0xd53 veb_pf(80095048,1,fd80a3594900) at veb_pf+0xbf veb_port_input(80095048,fd80a3594900,ecf4bbdaf7f8,80747300) at veb_port_input+0x2ce ether_input(80095048,fd80a3594900) at ether_input+0x100 if_input_process(80095048,800022c6afc8) at if_input_process+0x6f ifiq_process(80099800) at ifiq_process+0x69 taskq_thread(80031100) at taskq_thread+0x11a end trace frame: 0x0, count: 1 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{3}> show panic *cpu3: pool_cache_item_magic_check: mbufpl cpu free list modified: item addr 0xfd80a41c3f00+24 0xaf551e6f8f90255f!=0xaf551e6f8f35fd5f cpu2: uvm_fault(0x823ba618, 0x17, 0, 2) -> e ddb{3}> ddb{3}> show reg rdi0 rsi 0x14 rbp 0x800022c6a960 rbx 0xfd842f835c00 rdx 0xc800 rcx0x282 rax 0x8a r8 0x101010101010101 r9 0 r10 0xedcd3183c339b665 r110xb0f0eb58b1d2563 r12 0x80002241ca60 r130 r140 r15 0x81f39222cmd0646_9_tim_udma+0x314d8 rip 0x8118e200db_enter+0x10 cs 0x8 rflags 0x206 rsp 0x800022c6a960 ss 0x10 db_enter+0x10: popq%rbp ddb{3}> show all locks Process 39553 (softnet) thread 0x8000ed20 (317552) shared rwlock netlock r = 0 (0x822c6550) shared rwlock softnet r = 0 (0x80031370) Process 12606 (softnet) thread 0x8000e000 (504828) shared rwlock netlock r = 0 (0x822c6550) shared rwlock softnet r = 0 (0x80031270) Process 81494 (softnet) thread 0x8000e2a0 (283345) shared rwlock netlock r = 0 (0x822c6550) shared rwlock softnet r = 0 (0x80031170) Process 96881 (softnet) thread 0x8000e540 (159803) shared rwlock softnet r = 0 (0x80031070) Process 26865 (systq) thread 0x8000ea80 (449324) shared rwlock systq r = 0 (0x822dd728) Process 93339 (softclock) thread 0x8000efc0 (160018) shared rwlock timeout r = 0 (0x822b6000) ddb{3}> ddb{3}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 39455 263128 42512 0 3 0x3 netlock ifconfig 42512 258140 1 0 30x10008b sigsusp ksh 34696 282706 1 0 30x100098 kqreadcron 86943 298932 81565 95 3 0x1100092 kqreadsmtpd 34037 448643 81565103 3 0x1100092 kqreadsmtpd 17802 340759 81565 95 3 0x1100092 kqreadsmtpd 54979 438478 81565 95 30x100092 kqreadsmtpd 29724 438684 81565 95 3 0x1100092 kqreadsmtpd 3110 313509 81565 95 3 0x1100092 kqreadsmtpd 81565 137591 1 0 30x100080 kqreadsmtpd 81008 204817 1 0 30x88 kqreadsshd 72442 275002 1 0 30x100080 kqreadntpd 97406 453489 91190 83 30x100092 kqreadntpd 91190 488051 1 83 3 0x1100012 netlock ntpd 31521 42595 4468 73 3 0x1100090 kqreadsyslogd 4468 43476 1 0 30x100082 netio syslogd 66713 499933 0 0 3 0x14200 
bored smr
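A side note on reading the pool_cache_item_magic_check line in this report: XORing the two 64-bit words from the message shows which bits differ between them. The sketch below is a hypothetical helper written for this note (the names `diff_bits` and `diff_bytes` are made up); it makes no assumption about which of the two words is the expected value and which is the one found in memory.

```c
#include <assert.h>
#include <stdint.h>

/* XOR leaves a 1 in every bit position where the two words disagree. */
static uint64_t
diff_bits(uint64_t a, uint64_t b)
{
	return a ^ b;
}

/* Count how many of the eight bytes differ between the two words. */
static int
diff_bytes(uint64_t a, uint64_t b)
{
	uint64_t x = a ^ b;
	int n = 0;

	for (int i = 0; i < 8; i++) {
		if ((x >> (8 * i)) & 0xff)
			n++;
	}
	return n;
}
```

For the values in this report, `diff_bits(0xaf551e6f8f90255fULL, 0xaf551e6f8f35fd5fULL)` is `0xa5d800`, so only two adjacent bytes of the word differ and the rest is intact.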
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 13.5.2022. 4:19, David Gwynne wrote:
> sorry i'm late to the party. can you try this diff?
>
> this diff replaces the list of ports with an array/map of ports.
> the map takes references to all the ports, so the forwarding paths
> just have to hold a reference to the map to be able to use all the
> ports. the forwarding path uses smr to get hold of a map, takes a map
> ref, and then leaves the smr crit section before iterating over the map
> and pushing packets.
>
> this means we should only take and release a single refcnt when
> we're pushing packets out any number of ports.
>
> if no span ports are configured, then there's no span port map and
> we don't try and take a ref, we can just return early.
>
> we also only take and release a single refcnt when we forward the
> actual packet. forwarding to a single port provided by an etherbridge
> lookup already takes/releases the single port ref. if it falls
> through that for unknown unicast or broadcast/multicast, then it's
> a single refcnt for the current map of all ports.

Hi,

and with this diff i can't trigger the panic ...
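The "one reference on a map of ports" idea from the quoted description can be sketched in userland C11. This is a minimal, single-threaded illustration under assumptions, not the diff itself: `struct port`, `struct port_map`, `map_new()`, `map_publish()`, `map_acquire()`, `map_release()` and `broadcast()` are hypothetical names, the map does not take per-port references here, and the SMR read section that makes the load-then-increment safe in the kernel is only marked by a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXPORTS 8

/* Hypothetical stand-ins; these are not the structures from if_veb.c. */
struct port {
	int id;
	int sent;			/* packets pushed out of this port */
};

struct port_map {
	atomic_int refs;		/* one count covers every port in the map */
	size_t nports;
	struct port *ports[MAXPORTS];
};

/* Currently published map; NULL means "no ports configured". */
static _Atomic(struct port_map *) current_map;

/* Build a map of the given ports; the caller owns the first reference. */
static struct port_map *
map_new(struct port **ports, size_t n)
{
	struct port_map *pm;

	assert(n <= MAXPORTS);
	pm = calloc(1, sizeof(*pm));
	if (pm == NULL)
		return NULL;
	atomic_init(&pm->refs, 1);
	pm->nports = n;
	for (size_t i = 0; i < n; i++)
		pm->ports[i] = ports[i];
	return pm;
}

static void
map_release(struct port_map *pm)
{
	/* atomic_fetch_sub returns the old value: 1 means we were last. */
	if (atomic_fetch_sub(&pm->refs, 1) == 1)
		free(pm);
}

/* Swap in a new map and drop the reference held by the old publication. */
static void
map_publish(struct port_map *pm)
{
	struct port_map *old = atomic_exchange(&current_map, pm);

	if (old != NULL)
		map_release(old);
}

/*
 * Grab the current map. In the kernel this load and increment would sit
 * inside an SMR read section; this single-threaded sketch has none.
 */
static struct port_map *
map_acquire(void)
{
	struct port_map *pm = atomic_load(&current_map);

	if (pm != NULL)
		atomic_fetch_add(&pm->refs, 1);
	return pm;
}

/* Push one packet to every port; returns how many ports were visited. */
static int
broadcast(void)
{
	struct port_map *pm = map_acquire();
	int n = 0;

	if (pm == NULL)
		return 0;		/* nothing configured: return early */
	for (size_t i = 0; i < pm->nports; i++) {
		pm->ports[i]->sent++;
		n++;
	}
	map_release(pm);		/* single release, however many ports */
	return n;
}
```

The point of the design, as described in the quote, is that `broadcast()` performs exactly one acquire and one release regardless of the number of ports, and returns early when no map is published.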
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 20:04, Hrvoje Popovski wrote:
> On 12.5.2022. 16:22, Hrvoje Popovski wrote:
>> On 12.5.2022. 14:48, Claudio Jeker wrote:
>>> I think the diff below may be enough to fix this issue. It drops the SMR
>>> critical section around the enqueue operation but uses a reference on the
>>> port instead to ensure that the device can't be removed during the
>>> enqueue. Once the enqueue is finished we enter the SMR critical section
>>> again and drop the reference.
>>>
>>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>>> not strictly needed since the next pointer remains valid up until the
>>> smr_barrier() call but I find this a bit easier to understand.
>>> First make sure nobody else holds a reference then remove the port from
>>> the list.
>>>
>>> I currently have no test setup to verify this but I hope someone else can
>>> give this a spin.
>> Hi,
>>
>> for now veb seems stable and i can't panic box although it should, but
>> please give me few more hours to torture it properly.
>
>
> I can trigger panic in veb with this diff.
>
> Thank you ..
>
>

I can't trigger ... :))) sorry ..
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 16:22, Hrvoje Popovski wrote:
> On 12.5.2022. 14:48, Claudio Jeker wrote:
>> I think the diff below may be enough to fix this issue. It drops the SMR
>> critical section around the enqueue operation but uses a reference on the
>> port instead to ensure that the device can't be removed during the
>> enqueue. Once the enqueue is finished we enter the SMR critical section
>> again and drop the reference.
>>
>> To make it clear that the SMR_TAILQ remains intact while a refcount is
>> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
>> not strictly needed since the next pointer remains valid up until the
>> smr_barrier() call but I find this a bit easier to understand.
>> First make sure nobody else holds a reference then remove the port from
>> the list.
>>
>> I currently have no test setup to verify this but I hope someone else can
>> give this a spin.
> Hi,
>
> for now veb seems stable and i can't panic box although it should, but
> please give me few more hours to torture it properly.

I can trigger panic in veb with this diff.

Thank you ..
Re: [External] : Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 12.5.2022. 14:48, Claudio Jeker wrote:
> I think the diff below may be enough to fix this issue. It drops the SMR
> critical section around the enqueue operation but uses a reference on the
> port instead to ensure that the device can't be removed during the
> enqueue. Once the enqueue is finished we enter the SMR critical section
> again and drop the reference.
>
> To make it clear that the SMR_TAILQ remains intact while a refcount is
> held I moved refcnt_finalize() above SMR_TAILQ_REMOVE_LOCKED(). This is
> not strictly needed since the next pointer remains valid up until the
> smr_barrier() call but I find this a bit easier to understand.
> First make sure nobody else holds a reference then remove the port from
> the list.
>
> I currently have no test setup to verify this but I hope someone else can
> give this a spin.

Hi,

for now veb seems stable and i can't panic the box although it should, but
please give me a few more hours to torture it properly.

I'm doing this in a loop:
ifconfig veb1 destroy
sh /etc/netstart
ifconfig veb0 destroy
sh /etc/netstart
ifconfig vport1 destroy
sh /etc/netstart
ifconfig vport0 destroy
sh /etc/netstart

my config:
veb1: flags=a843
	index 25 llprio 3
	groups: veb
	ix1 flags=3
		port 2 ifpriority 0 ifcost 0
	vport1 flags=3
		port 27 ifpriority 0 ifcost 0
veb0: flags=a843
	index 26 llprio 3
	groups: veb
	ix0 flags=3
		port 1 ifpriority 0 ifcost 0
	vport0 flags=3
		port 28 ifpriority 0 ifcost 0
	ix2 flags=100
vport1: flags=8943 mtu 1500
	lladdr ec:f4:bb:da:f7:fa
	index 27 priority 0 llprio 3
	groups: vport
	inet 192.168.111.11 netmask 0xff00 broadcast 192.168.111.255
vport0: flags=8943 mtu 1500
	lladdr ec:f4:bb:da:f7:f8
	index 28 priority 0 llprio 3
	groups: vport
	inet 192.168.100.11 netmask 0xff00 broadcast 192.168.100.255
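The approach in the quoted description (pin the port with a reference, leave the SMR critical section for the enqueue, then re-enter and drop the reference) can be modelled in a few lines of userland C. This is a sketch under assumptions, not the diff: `smr_depth` stands in for the `spc_smrdepth` counter named in the assertion elsewhere in this thread, and `struct port`, `read_enter()`, `read_leave()`, `enqueue()`, `forward_inside_section()` and `forward_with_ref()` are hypothetical names.

```c
#include <assert.h>

/* Stand-in for the per-CPU SMR nesting depth checked before sleeping. */
static int smr_depth;

/* Hypothetical port with a plain reference count. */
struct port {
	int refs;
	int enqueued;
	int slept_in_smr;	/* enqueues that ran where sleeping is forbidden */
};

static void
read_enter(void)
{
	smr_depth++;
}

static void
read_leave(void)
{
	assert(smr_depth > 0);
	smr_depth--;
}

/*
 * Stand-in for an enqueue path that may need a sleeping lock. The
 * kernel would trip its "smrdepth == 0" assertion instead of counting.
 */
static void
enqueue(struct port *p)
{
	if (smr_depth != 0)
		p->slept_in_smr++;
	p->enqueued++;
}

/* Old shape: the enqueue happens inside the critical section. */
static void
forward_inside_section(struct port *p)
{
	read_enter();
	enqueue(p);
	read_leave();
}

/* Shape from the quote: hold a reference across the enqueue instead. */
static void
forward_with_ref(struct port *p)
{
	read_enter();
	p->refs++;		/* port cannot go away while this is held */
	read_leave();

	enqueue(p);		/* free to sleep here */

	read_enter();
	p->refs--;
	read_leave();
}
```

After one call to each function on the same port, `slept_in_smr` is 1 (only the first shape enqueued inside the section), `enqueued` is 2, and both `refs` and `smr_depth` are back to 0.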
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 10.5.2022. 22:55, Alexander Bluhm wrote: > Yes. It is similar. > > I have read the whole mail thread and the final fix got commited. > But it looks incomplete, pf is still sleeping. > > Hrvoje, can you run the tests again that triggered the panics a > year ago? Hi, year ago panics was triggered when veb or tpmr bridged traffic. I've tried that right now and I couldn't trigger that panics. If I put vport and route traffic over veb than I can trigger panic with or without vlans as child-iface for veb. For panic i need to have pf enabled and to run ifconfig veb destroy or ifconfig vlan destroy and sh netstart in loop. this is panic without vlans panic: kernel diagnostic assertion "curcpu()->ci_schedstate.spc_smrdepth == 0" failed: file "/sys/kern/subr_xxx.c", line 163 Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 57981 52408 0 0x14000 0x2003 softnet 18952 62179 0 0x14000 0x2005 softnet db_enter() at db_enter+0x10 panic(81f36a34) at panic+0xbf __assert(81faa7fa,81fd47a7,a3,81fe7c9d) at __assert+0x25 assertwaitok() at assertwaitok+0xcc mi_switch() at mi_switch+0x40 sleep_finish(800022c707a0,1) at sleep_finish+0x10b rw_enter(822b3ad8,2) at rw_enter+0x232 pf_test(2,3,800c6048,800022c70a58) at pf_test+0xcf0 ip_output(fd80a32f1f00,0,800022c70be8,1,0,0,e8e0f1a7c10273fe) at ip_output+0x6b7 ip_forward(fd80a32f1f00,814ee000,fd83a8657078,0) at ip_forward+0x2da ip_input_if(800022c70d28,800022c70d34,4,0,814ee000) at ip_input_if+0x353 ipv4_input(814ee000,fd80a32f1f00) at ipv4_input+0x39 ether_input(814ee000,fd80a32f1f00) at ether_input+0x3ad vport_if_enqueue(814ee000,fd80a32f1f00) at vport_if_enqueue+0x19 end trace frame: 0x800022c70e70, count: 0 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{4}> ddb{4}> show reg rdi0 rsi 0x14 rbp 0x800022c705f0 rbx 0x800022424ff0 rdx 0x8000 rcx0x286 rax 0x7d r8 0x101010101010101 r9 0 r10 0x5b4ef42a9c796b43 r11 0xada7e964a691819f r12 0x800022425a60 r13 0x800022c450a0 r140 r15 0x81f36a34cmd0646_9_tim_udma+0x2d9d2 rip 0x81c01c30db_enter+0x10 cs 0x8 rflags 0x286 rsp 0x800022c705f0 ss 0x10 db_enter+0x10: popq%rbp ddb{4}> ddb{4}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 14129 480066 92457 0 3 0x3 netlock ifconfig 92457 504149 2002 0 30x10008b sigsusp sh 2002 26492 1517 0 30x10008b sigsusp sh 1517 131574 1 0 30x10008b sigsusp ksh 26252 251094 1 0 30x100098 kqreadcron 20251 457205 97875 95 3 0x1100092 kqreadsmtpd 62139 255853 97875103 3 0x1100092 kqreadsmtpd 29505 64154 97875 95 3 0x1100092 kqreadsmtpd 20035 471489 97875 95 30x100092 kqreadsmtpd 91114 73268 97875 95 3 0x1100092 kqreadsmtpd 78396 414422 97875 95 3 0x1100092 kqreadsmtpd 97875 113010 1 0 30x100080 kqreadsmtpd 21916 226987 1 0 30x88 kqreadsshd 90174247 1 0 30x100080 kqreadntpd 72358 391459 38133 83 30x100092 kqreadntpd 38133 355054 1 83 3 0x1100012 netlock ntpd 91824 285625 60194 73 3 0x1100090 kqreadsyslogd 60194 367623 1 0 30x100082 netio syslogd 73270 113983 0 0 3 0x14200 bored smr 51379 478537 0 0 3 0x14200 pgzerozerothread 85386 54454 0 0 3 0x14200 aiodoned aiodoned 10937 491268 0 0 3 0x14200 syncerupdate 85008 360847 0 0 3 0x14200 cleaner cleaner 76642 501363 0 0 3 0x14200 reaperreaper 32934 257878 0 0 3 0x14200 pgdaemon pagedaemon 48583 371156 0 0 3 0x14200 usbtskusbtask 53660 310701 0 0 3 0x14200 usbatsk usbatsk 19211 31258 0 0 3 0x40014200 acpi0 acpi0 11856 305318 0 0 3 0x40014200idle5 9933 290633 0 0 3 0x40014200idle4 41570 94891 0 0 3 0x40014200
Re: 7.1-Current crash with NET_TASKQ 4 and veb interface
On 9.5.2022. 22:04, Alexander Bluhm wrote:
> Can some veb or smr hacker explain how this is supposed to work?
>
> Sleeping in pf is also not ideal as it is in the hot path and slows
> down packets. But that is not easy to fix as we have to refactor
> the memory allocations before converting pf lock to a mutex. sashan@
> is working on that.

Hi,

isn't that similar to, or the same as, the panic that was discussed in the
"parallel forwarding vs. bridges" mail thread on tech@ started by sashan@?

https://www.mail-archive.com/tech@openbsd.org/msg64040.html
Re: bnxt panic
On 18.3.2022. 18:59, Alexander Bluhm wrote: > On Fri, Mar 18, 2022 at 11:21:15AM +0100, Hrvoje Popovski wrote: >> On 17.3.2022. 21:31, Alexander Bluhm wrote: >>> I don't have the device and don't know the code. But other drivers >>> don't process rx and tx interrupts when the interface is not running. >>> >>> Maybe this helps. dlg@ and jmatthew@ should know better than me. >>> >>> bluhm >> >> Hi, >> >> i didn't manage to trigger same panic by hand, maybe i didn't try hard >> enough, so i've put "ifconfig bnxt0 down && sleep 2 && ifconfig bnxt0 up >> && sleep 2" in loop and i've got panic below ... > Hi, sorry for hijack your diff with some other panic ... I thought that panics are similar. I've tried everything one more time and now I can answer your questions :) > Does my diff make things better? Yes, I can't panic box with this diff as i can without it. > Does the diff just replace one panic with another? No. > Do you need more effort to trigger crashes now? I've tried all day to reproduce panic with your diff but i can't ... > Does the panic below also happend without my diff? Yes, panic below is easy to trigger with or without your diff .. > bluhm > >> >> with this loop, box panic quite fast even if box doesn't have any >> interface configured and it's totally idle .. >> >> >> bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error. 
>> bnxt0: failed to set up tx ring >> uvm_fault(0xfd904e3ac880, 0xff0, 0, 1) -> e >> kernel: page fault trap, code=0 >> Stopped at bnxt_queue_down+0x61: movq0(%r13,%rax,1),%rsi >> TIDPIDUID PRFLAGS PFLAGS CPU COMMAND >> *186511 37451 0 0x3 02K ifconfig >> bnxt_queue_down(802ba000,802bb080) at bnxt_queue_down+0x61 >> bnxt_up(802ba000) at bnxt_up+0x3fb >> bnxt_ioctl(802ba048,80206910,800027d1bc90) at bnxt_ioctl+0x15b >> ifioctl(fd8e5b1a98e8,80206910,800027d1bc90,800027d22d28) at >> ifioctl+0x92b >> soo_ioctl(fd8e5b6e3cb8,80206910,800027d1bc90,800027d22d28) >> at soo_ioctl+0x161 >> sys_ioctl(800027d22d28,800027d1bda0,800027d1bdf0) at >> sys_ioctl+0x2c4 >> syscall(800027d1be60) at syscall+0x374 >> Xsyscall() at Xsyscall+0x128 >> end of kernel >> end trace frame: 0x7f7d3130, count: 7 >> https://www.openbsd.org/ddb.html describes the minimum info required in >> bug reports. Insufficient info makes it difficult to find and fix bugs. >> ddb{2}> >> >> >> ddb{2}> show reg >> rdi 0x8226e7b8pci_bus_dma_tag >> rsi 0x802bb080 >> rbp 0x800027d1ba80 >> rbx 0xff >> rdx 0xc800 >> rcx0x202 >> rax0xff0 >> r8 0x3f >> r90x800027d1b9b8 >> r10 0x8f1758fa1ca4c280 >> r11 0x814e96f0_bus_dmamap_destroy >> r120x100 >> r130 >> r140x101 >> r15 0x802ba000 >> rip 0x815385c1bnxt_queue_down+0x61 >> cs 0x8 >> rflags 0x10216__ALIGN_SIZE+0xf216 >> rsp 0x800027d1ba20 >> ss 0x10 >> bnxt_queue_down+0x61: movq0(%r13,%rax,1),%rsi >> >> >> ddb{2}> ps >>PID TID PPIDUID S FLAGS WAIT COMMAND >> *37451 186511 30450 0 7 0x3ifconfig >> 30450 100025 92193 0 30x10008b sigsusp sh >> 92193 391163 1 0 30x10008b sigsusp ksh >> 29430 255996 1 0 30x100098 kqreadcron >> 83509 36913 52718 95 3 0x1100092 kqreadsmtpd >> 85854 120678 52718103 3 0x1100092 kqreadsmtpd >> 10057 425077 52718 95 3 0x1100092 kqreadsmtpd >> 8903 266150 52718 95 30x100092 kqreadsmtpd >> 10952 25497 52718 95 3 0x1100092 kqreadsmtpd >> 10277 273205 52718 95 3 0x1100092 kqreadsmtpd >> 52718 225011 1 0 30x100080 kqreadsmtpd >> 10965 74402 1 0 30x88 
kqreadsshd >> 92646 92606 1 0 30x100080 kqread
Re: bnxt panic
On 17.3.2022. 21:31, Alexander Bluhm wrote: > I don't have the device and don't know the code. But other drivers > don't process rx and tx interrupts when the interface is not running. > > Maybe this helps. dlg@ and jmatthew@ should know better than me. > > bluhm Hi, i didn't manage to trigger same panic by hand, maybe i didn't try hard enough, so i've put "ifconfig bnxt0 down && sleep 2 && ifconfig bnxt0 up && sleep 2" in loop and i've got panic below ... with this loop, box panic quite fast even if box doesn't have any interface configured and it's totally idle .. bnxt0: HWRM_RING_ALLOC command returned RESOURCE_ALLOC_ERROR error. bnxt0: failed to set up tx ring uvm_fault(0xfd904e3ac880, 0xff0, 0, 1) -> e kernel: page fault trap, code=0 Stopped at bnxt_queue_down+0x61: movq0(%r13,%rax,1),%rsi TIDPIDUID PRFLAGS PFLAGS CPU COMMAND *186511 37451 0 0x3 02K ifconfig bnxt_queue_down(802ba000,802bb080) at bnxt_queue_down+0x61 bnxt_up(802ba000) at bnxt_up+0x3fb bnxt_ioctl(802ba048,80206910,800027d1bc90) at bnxt_ioctl+0x15b ifioctl(fd8e5b1a98e8,80206910,800027d1bc90,800027d22d28) at ifioctl+0x92b soo_ioctl(fd8e5b6e3cb8,80206910,800027d1bc90,800027d22d28) at soo_ioctl+0x161 sys_ioctl(800027d22d28,800027d1bda0,800027d1bdf0) at sys_ioctl+0x2c4 syscall(800027d1be60) at syscall+0x374 Xsyscall() at Xsyscall+0x128 end of kernel end trace frame: 0x7f7d3130, count: 7 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. 
ddb{2}> ddb{2}> show reg rdi 0x8226e7b8pci_bus_dma_tag rsi 0x802bb080 rbp 0x800027d1ba80 rbx 0xff rdx 0xc800 rcx0x202 rax0xff0 r8 0x3f r90x800027d1b9b8 r10 0x8f1758fa1ca4c280 r11 0x814e96f0_bus_dmamap_destroy r120x100 r130 r140x101 r15 0x802ba000 rip 0x815385c1bnxt_queue_down+0x61 cs 0x8 rflags 0x10216__ALIGN_SIZE+0xf216 rsp 0x800027d1ba20 ss 0x10 bnxt_queue_down+0x61: movq0(%r13,%rax,1),%rsi ddb{2}> ps PID TID PPIDUID S FLAGS WAIT COMMAND *37451 186511 30450 0 7 0x3ifconfig 30450 100025 92193 0 30x10008b sigsusp sh 92193 391163 1 0 30x10008b sigsusp ksh 29430 255996 1 0 30x100098 kqreadcron 83509 36913 52718 95 3 0x1100092 kqreadsmtpd 85854 120678 52718103 3 0x1100092 kqreadsmtpd 10057 425077 52718 95 3 0x1100092 kqreadsmtpd 8903 266150 52718 95 30x100092 kqreadsmtpd 10952 25497 52718 95 3 0x1100092 kqreadsmtpd 10277 273205 52718 95 3 0x1100092 kqreadsmtpd 52718 225011 1 0 30x100080 kqreadsmtpd 10965 74402 1 0 30x88 kqreadsshd 92646 92606 1 0 30x100080 kqreadntpd 69044 66045 48912 83 30x100092 kqreadntpd 48912 346342 1 83 3 0x1100092 kqreadntpd 94844 373900 21297 73 3 0x1100090 kqreadsyslogd 21297 85879 1 0 30x100082 netio syslogd 35205 481579 0 0 3 0x14200 bored smr 12575 275960 0 0 3 0x14200 pgzerozerothread 24927 447870 0 0 3 0x14200 aiodoned aiodoned 80994 339930 0 0 3 0x14200 syncerupdate 44042 109239 0 0 3 0x14200 cleaner cleaner 96179 166822 0 0 3 0x14200 reaperreaper 15452 36252 0 0 3 0x14200 pgdaemon pagedaemon 25413 304120 0 0 3 0x14200 usbtskusbtask 98205 375271 0 0 3 0x14200 usbatsk usbatsk 69469 523327 0 0 3 0x14200 bored sensors 88463 437912 0 0 3 0x40014200 acpi0 acpi0 22995 343696 0 0 7 0x40014200idle23 34936 338429 0 0 7 0x40014200idle22 55150 49423 0 0 7 0x40014200idle21 17888 189165 0 0 7 0x40014200idle20 96685 487854 0 0 7 0x40014200idle19 13313 506501 0 0 7 0x40014200idle18 88567 261311 0 0 7 0x40014200idle17 56026 316512 0 0 7
Re: bnxt panic
On 16.3.2022. 20:00, Hrvoje Popovski wrote:
> Hi all,
>
> While opensbd box is under pressure and in that moment i run ifconfig
> bnxt0 down i get panic... it's not every time and it's that easy to
> trigger panic
>
> I'm sending traffic over ix interfaces and bnxt is for ssh and nothing
> else.
>
> I've compiled kernel with "option BNXT_DEBUG" and put debug in
> hostname.bnxt0 but i didn't saw any log regarding bnxt interfaces.
>
> I will try to trigger panic few more times and will post them here..

This is the same panic, but with a snapshot kernel without debug options:

uvm_fault(0xfd904e3a9440, 0x0, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bnxt_intr+0x195:        movq    0(%r14,%r12,1),%rbx
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*465591  26407      0         0x3          0   0K  ifconfig
bnxt_intr(802bc7c0) at bnxt_intr+0x195
intr_handler(800027d3d7b0,80269880) at intr_handler+0x6e
Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f
Xspllower() at Xspllower+0x19
softintr_dispatch(0) at softintr_dispatch+0xdc
Xsoftclock() at Xsoftclock+0x1f
bnxt_ioctl(802bc048,80206910,800027d3dae0) at bnxt_ioctl+0x165
ifioctl(fd8e5a4f13a8,80206910,800027d3dae0,800027d8da50) at ifioctl+0x92b
soo_ioctl(fd904cc0b2e8,80206910,800027d3dae0,800027d8da50) at soo_ioctl+0x161
sys_ioctl(800027d8da50,800027d3dbf0,800027d3dc40) at sys_ioctl+0x2c4
syscall(800027d3dcb0) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7faf30, count: 3
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
bnxt panic
Hi all,

While the OpenBSD box is under pressure, if I run ifconfig bnxt0 down at that
moment I get a panic. It doesn't happen every time, and it's not that easy to
trigger.

I'm sending traffic over ix interfaces; bnxt is used for ssh and nothing
else.

I've compiled the kernel with "option BNXT_DEBUG" and put debug in
hostname.bnxt0, but I didn't see any log messages regarding bnxt interfaces.

I will try to trigger the panic a few more times and will post the results
here.

smc24# ifconfig bnxt0 down
uvm_fault(0xfd8e5b9b0aa8, 0xc80, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at      bnxt_intr+0x195:        movq    0(%r14,%r12,1),%rbx
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*112449  43646      0         0x3          0   0K  ifconfig
bnxt_intr(802ba7c0) at bnxt_intr+0x195
intr_handler(800027d819d0,80262200) at intr_handler+0x6e
Xintr_ioapic_edge28_untramp() at Xintr_ioapic_edge28_untramp+0x18f
Xspllower() at Xspllower+0x19
softintr_dispatch(0) at softintr_dispatch+0xdc
Xsoftclock() at Xsoftclock+0x1f
bnxt_ioctl(802ba048,80206910,800027d81d00) at bnxt_ioctl+0x165
ifioctl(fd8e587008f0,80206910,800027d81d00,800027d23ce8) at ifioctl+0x92b
soo_ioctl(fd8e5ba589e8,80206910,800027d81d00,800027d23ce8) at soo_ioctl+0x161
sys_ioctl(800027d23ce8,800027d81e10,800027d81e60) at sys_ioctl+0x2c4
syscall(800027d81ed0) at syscall+0x374
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d2b40, count: 3
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{0}>
ddb{0}> show reg
rdi               0xfd8008d639b0
rsi               0xfd8008d639b0
rbp               0x800027d81970
rbx               0
rdx               0
rcx               0x802ba7c0
rax               0xc8
r8                0
r9                0x400
r10               0xa08d02d9a679a6aa
r11               0xa5908789619e5597
r12               0xc80
r13               0
r14               0
r15               0x1e
rip               0x81891255    bnxt_intr+0x195
cs                0x8
rflags            0x10212       __ALIGN_SIZE+0xf212
rsp               0x800027d818c0
ss                0x10
bnxt_intr+0x195:        movq    0(%r14,%r12,1),%rbx
ddb{0}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
*43646  112449   2064      0  7         0x3                ifconfig
  3020  522978  71359      0  3    0x100083  ttyin         ksh
 71359   83904  70315   1000  3    0x10008b  sigsusp       ksh
 70315   76327  33688   1000  3        0x98  kqread        sshd
 33688  239932  19353      0  3        0x82  kqread        sshd
  2064   31814      1      0  3    0x10008b  sigsusp       ksh
 15593  505847      1      0  3    0x100098  kqread        cron
 53746   29436  19867     95  3   0x1100092  kqread        smtpd
 70674  201429  19867    103  3   0x1100092  kqread        smtpd
 13668  378996  19867     95  3   0x1100092  kqread        smtpd
  7348  364416  19867     95  3    0x100092  kqread        smtpd
 28469  342601  19867     95  3   0x1100092  kqread        smtpd
 148556717      19867     95  3   0x1100092  kqread        smtpd
 19867  370251      1      0  3    0x100080  kqread        smtpd
 19353  418536      1      0  3        0x88  kqread        sshd
 78442  277045      1      0  3    0x100080  kqread        ntpd
 59273  418917  71217     83  3    0x100092  kqread        ntpd
 71217  402451      1     83  3   0x1100092  kqread        ntpd
  2942   84526  89760     73  3   0x1100090  kqread        syslogd
 89760   90249      1      0  3    0x100082  netio         syslogd
 92039   93123      0      0  3     0x14200  bored         smr
 36298  398102      0      0  3     0x14200  pgzero        zerothread
 71408   18325      0      0  3     0x14200  aiodoned      aiodoned
 96775  209072      0      0  3     0x14200  syncer        update
 99374   29019      0      0  3     0x14200  cleaner       cleaner
 93530  452277      0      0  3     0x14200  reaper        reaper
 92427  520070      0      0  3     0x14200  pgdaemon      pagedaemon
 11949  151046      0      0  3     0x14200  usbtsk        usbtask
 14625  109693      0      0  3     0x14200  usbatsk       usbatsk
   884  415337      0      0  3     0x14200  bored         sensors
 27132  374599      0      0  3  0x40014200  acpi0         acpi0
 44604  508777      0      0  7  0x40014200                idle23
 50524  223281      0      0  7  0x40014200                idle22
   382  384761      0      0  7  0x40014200                idle21
  3240  505687      0      0  7  0x40014200                idle20
 45727  351159      0      0  7  0x40014200                idle19
  3827   15512      0      0  7  0x40014200
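
The fault address in the uvm_fault line can be cross-checked against the register dump. The faulting instruction uses the AT&T operand form disp(base,index,scale), whose effective address is base + index * scale + disp. The sketch below just does that arithmetic; the function name effective_addr is a hypothetical helper, and the remark about what the result suggests is an inference from the logs, not something confirmed in the thread.

```shell
#!/bin/sh
# Effective address of an AT&T-syntax memory operand disp(base,index,scale):
#   address = base + index * scale + disp
# Arguments: base index scale disp (decimal or 0x-prefixed hex).
effective_addr() {
    printf '0x%x\n' $(( $1 + $2 * $3 + $4 ))
}

# movq 0(%r14,%r12,1),%rbx with r14 = 0 and r12 = 0xc80 (register dump above)
effective_addr 0 0xc80 1 0

# movq 0(%r13,%rax,1),%rsi with r13 = 0 and rax = 0xff0 (bnxt_queue_down trace)
effective_addr 0 0xff0 1 0
```

Both results match the second argument of the corresponding uvm_fault() line (0xc80 and 0xff0). In both cases the base register is 0, which is consistent with the driver indexing through a pointer that is NULL at that moment.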
Re: ix - failed to allocate interrupt slot for PIC msix pin -2143223537
On 21.8.2021. 21:50, Hrvoje Popovski wrote:
> Hi all,
>
> on new AMD 24 core server and i've put 2 dual port ix interfaces and
> while booting ix3 interface throws this error
>
> ix3 at pci21 dev 0 function 1 "Intel 82599" rev 0x01failed to allocate
> interrupt slot for PIC msix pin -2143223537
> ixgbe_allocate_msix: pci_intr_establish vec 15 failed
>
> and i can't see ix3 with ifconfig..
> With option IX_DEBUG i thought that i would see something more, but
> there's nothing ...

Hi all,

now, after a few reboots, the ix problem is gone, but I see similar errors on
xhci and ahci:

smc24# dmesg | grep failed
xhci3 at pci23 dev 0 function 3 vendor "AMD", unknown product 0x148c rev 0x00failed to allocate interrupt slot for PIC msi pin -2143091968
ahci2 at pci24 dev 0 function 0 "AMD FCH AHCI" rev 0x51: msi,failed to allocate interrupt slot for PIC msi pin -2143027200
ahci3 at pci25 dev 0 function 0 "AMD FCH AHCI" rev 0x51: msi,failed to allocate interrupt slot for PIC msi pin -2142961664

smc24# vmstat -iz
interrupt                       total     rate
irq0/clock                    1859139     4754
irq0/ipi                       320887      820
irq96/amdgpio0                      0        0
irq97/acpi0                         0        0
irq98/ppb0                          0        0
irq99/xhci0                         0        0
irq100/ppb1                         0        0
irq101/xhci1                        0        0
irq102/ppb2                         0        0
irq103/ppb4                         0        0
irq104/ahci0                        0        0
irq105/ppb5                         0        0
irq114/bnxt0                        1        0
irq115/bnxt0:0                   1524        3
irq116/bnxt0:1                      0        0
irq117/bnxt0:2                      2        0
irq118/bnxt0:3                     17        0
irq119/bnxt0:4                      0        0
irq120/bnxt0:5                     36        0
irq121/bnxt0:6                     23        0
irq122/bnxt0:7                    703        1
irq123/bnxt1                        0        0
irq124/bnxt1:0                      0        0
irq125/bnxt1:1                      0        0
irq126/bnxt1:2                      0        0
irq127/bnxt1:3                      0        0
irq128/bnxt1:4                      0        0
irq129/bnxt1:5                      0        0
irq130/bnxt1:6                      0        0
irq131/bnxt1:7                      0        0
irq106/ppb8                         0        0
irq132/ix0:0                       28        0
irq133/ix0:1                        0        0
irq134/ix0:2                        0        0
irq135/ix0:3                        0        0
irq136/ix0:4                        0        0
irq137/ix0:5                        0        0
irq138/ix0:6                        0        0
irq139/ix0:7                        0        0
irq140/ix0:8                        0        0
irq141/ix0:9                        0        0
irq142/ix0:10                       0        0
irq143/ix0:11                       0        0
irq144/ix0:12                       0        0
irq145/ix0:13                       0        0
irq146/ix0:14                       0        0
irq147/ix0:15                       0        0
irq148/ix0                          1        0
irq149/ix1:0                       28        0
irq150/ix1:1                        0        0
irq151/ix1:2                        0        0
irq152/ix1:3                        0        0
irq153/ix1:4                        0        0
irq154/ix1:5                        0        0
irq155/ix1:6                        0        0
irq156/ix1:7                        0        0
irq157/ix1:8                        0        0
irq158/ix1:9                        0        0
irq159/ix1:10                       0        0
irq160/ix1:11                       0        0
irq161/ix1:12                       0        0
irq162/ix1:13                       0        0
irq163/ix1:14                       0        0
irq164/ix1:15                       0        0
irq165/ix1                          1        0
irq107/ppb9                         0        0
irq108/ppb10                        0        0
irq109/ahci1                    28532       72
irq110/ppb14                        0        0
irq166/mcx0                       125        0
irq167/mcx0:0                      28        0
irq168/mcx0:1                       0        0
irq169/mcx0:2                       0        0
irq170/mcx0:3                       0        0
irq171/mcx0:4                       0        0
irq172/mcx0:5                       0        0
irq173/mcx0:6                       0        0
irq174/mcx0:7                       0        0
irq175/mcx0:8                       0        0
irq176/mcx0:9                       0        0
irq177/mcx0:10                      0        0
irq178/mcx0:11
Re: failed to allocate interrupt slot for PIC msix
On 27.5.2021. 17:04, w...@null0.nl wrote:
> ixl13 at pci15 dev 0 function 3 "Intel X710 SFP+" rev 0x02: port 1, FW
> 8.3.64775 API 1.13, msix, 8 queues, address 40:a6:b7:51:bf:13
> failed to allocate interrupt slot for PIC msix pin -2136014080
> ixl13: unable to establish interrupt handler
> ppb9 at pci14 dev 2 function 0 "Intel Xeon Scalable PCIE" rev 0x07: msi
> pci16 at ppb9 bus 176
> ixl14 at pci16 dev 0 function 0 "Intel X710 SFP+" rev 0x02: port 3, FW
> 8.3.64775 API 1.13, msix, 8 queues, address 40:a6:b7:51:b7:10
> failed to allocate interrupt slot for PIC msix pin -2135949312
> ixl14: unable to establish interrupt handler
> ixl15 at pci16 dev 0 function 1 "Intel X710 SFP+" rev 0x02: port 2, FW
> 8.3.64775 API 1.13, msix, 8 queues, address 40:a6:b7:51:b7:11
> failed to allocate interrupt slot for PIC msix pin -2135949056
> ixl15: unable to establish interrupt handler
> ixl16 at pci16 dev 0 function 2 "Intel X710 SFP+" rev 0x02: port 0, FW
> 8.3.64775 API 1.13, msix, 8 queues, address 40:a6:b7:51:b7:12
> failed to allocate interrupt slot for PIC msix pin -2135948800
> ixl16: unable to establish interrupt handler
> ixl17 at pci16 dev 0 function 3 "Intel X710 SFP+" rev 0x02: port 1, FW
> 8.3.64775 API 1.13, msix, 8 queues, address 40:a6:b7:51:b7:13
> failed to allocate interrupt slot for PIC msix pin -2135948544
> ixl17: unable to establish interrupt handler

Hi,

I think that I have the same problem, but with ix interfaces:

ix3 at pci21 dev 0 function 1 "Intel 82599" rev 0x01failed to allocate interrupt slot for PIC msix pin -2143223537
ixgbe_allocate_msix: pci_intr_establish vec 15 failed
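
The negative "pin" numbers in these messages look like a 32-bit value with the top bit set, printed as a signed integer. A minimal sketch for reading them as unsigned hex follows. The field split (bus in bits 16-23, device in bits 11-15, function in bits 8-10, a low byte) is an assumption inferred from the logs in this thread, not taken from the kernel source, and decode_pin is a hypothetical helper name.

```shell
#!/bin/sh
# Reinterpret a pin value printed as a signed 32-bit integer as unsigned
# and split it into fields.
# ASSUMPTION: the bus/dev/func/low-byte layout is inferred from the dmesg
# lines in this thread; it is not confirmed against the kernel source.
decode_pin() {
    raw=$(( $1 & 0xffffffff ))
    printf '0x%08x bus=%d dev=%d func=%d low=%d\n' \
        "$raw" \
        $(( (raw >> 16) & 0xff )) \
        $(( (raw >> 11) & 0x1f )) \
        $(( (raw >> 8) & 0x7 )) \
        $(( raw & 0xff ))
}

decode_pin -2135949312   # ixl14 at pci16 dev 0 function 0
decode_pin -2143223537   # ix3 at pci21 dev 0 function 1
```

Under that assumed layout, the ixl14 value decodes to bus 176, which agrees with "pci16 at ppb9 bus 176", and the ix3 value decodes to function 1 with a low byte of 15, which agrees with "function 1" and "vec 15 failed".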
Re: ipsec - panic: malloc: out of space in kmem_map
On 19.7.2021. 0:08, Hrvoje Popovski wrote:
> On other hand, isakmpd is stable at 150Kpps even if sending 12Mpps
> through tunnel ...

And of course I forgot to stop the generator, and isakmpd is still forwarding
traffic .. :)
Re: ipsec - panic: malloc: out of space in kmem_map
On 18.7.2021. 20:39, Hrvoje Popovski wrote:
> On 18.7.2021. 20:11, Alexander Bluhm wrote:
>> On Sat, Jul 17, 2021 at 06:32:59PM +0200, Hrvoje Popovski wrote:
>>> with this diff i'm getting very stable traffic over tunnel and it's
>>> little faster.
>>
>> This is expected. Too much queueing creates oscilating behavior
>> and suboptimal throughput.
>>
>>> Even with your last diff on tech@
>>> https://marc.info/?l=openbsd-tech&m=162645141414262&w=2
>>> i'm seeing traffic drops, less frequent, but i'm seeing it...
>>
>> There is another reason for traffic drops. iked(8) is not clever
>> when rekeying. The idea is to have SAs with old key and new key
>> simultaneously. After both machines have new SA, the old should
>> be removed. But currently we have a window when sender uses new
>> SA, but receiver only has old SA and cannot decrypt the packets.
>> This is a temproray problem, I see drops for a short time. tobhe@
>> wants to fix this.
>>
>> I think you use isakmpd(8), I don't know how rekeying works there.
>
> Yes, I'm using isakmpd, but I can test iked and isakmpd, no problem ...
>
>
>>> Do you want me to test this diff combined with your ipsec diff
>>> on tech@ ?
>>
>> I have commited the replay diff. This fixes permanent packet drop.
>> Do you see permanent traffic stalls with current?
>
> With isakmpd yes, iked haven't tested, but i will now ..
> But with your diff from bugs@ everything seems smooth and stable without
> drops and panics even with isakmpd :)

I've set up isakmpd and iked tunnels and tested both daemons with and without
the replay diff from tech@, and they behave very similarly.

I have 2 identical boxes with 6 x E5-2643 v2 @ 3.50GHz, 3600.44 MHz, 06-3e-04,
and I'm sending traffic (1000 byte UDP) through the tunnel between them.

ipsec.conf
ike active esp from 192.168.232.0/24 to 192.168.123.0/24 \
        local 192.168.42.1 peer 192.168.42.111 \
        main auth hmac-sha1 enc aes group modp1024 \
        quick enc aes-128-gcm group modp1024 \
        psk "123"

iked.conf
ikev2 active esp from 192.168.232.0/24 to 192.168.123.0/24 \
        local 192.168.42.1 peer 192.168.42.111 \
        ikesa enc aes-128-gcm group modp1024 prf hmac-sha1 \
        childsa enc aes-128-gcm group modp1024 \
        psk "123"

In this config with these boxes, if I send up to 150Kpps through the iked or
ipsec tunnel, the tunnel is stable with or without the diff.
If I send 200Kpps through the ipsec tunnel, traffic drops and won't come
back; the iked tunnel comes back, but after a few minutes it stops forwarding
traffic completely.
If I send 250Kpps or more, with or without the replay diff, the ipsec or iked
tunnel stops forwarding traffic after a few seconds.

So I think that the replay diff only delays the permanent traffic stalls.

Interestingly, when traffic completely stops I need to do ifconfig ix1 down &&
ifconfig ix1 up (ix1 is the interface where my generator is connected) to make
traffic flow through the tunnel again. Stopping the generator and starting it
again didn't help, only down/up of the ix1 interface.

When traffic stops, the mcl2k2 Fail counter increases:

r620-1# vmstat -m | egrep "Fail|mcl"
Name Size Requests Fail InUse Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
mcl2k 2048 1225 39570 3 0 3 3 0 80
mcl2k2 2112 285844285 198 183 58658 1 58657 58657 0 80
mcl4k 4096 112 13270 1 0 1 1 0 80
mcl8k 8192 406 3020 1 0 1 1 0 80

With the queuing diff that you sent here on bugs@, iked behaves the same as
before; I just need to send about 600Kpps or more through the tunnel, and I'm
not seeing any mcl Fails when traffic stops.
On the other hand, isakmpd is stable at 150Kpps even if sending 12Mpps through
the tunnel ...

I'm sorry if this mail is not clear enough; I'm testing, repeating tests, and
writing here what I'm seeing.
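
To put the packet rates above into bandwidth terms, a rough conversion can help. This is a back-of-the-envelope sketch only: pps_to_mbit is a hypothetical helper, it assumes the stated 1000 bytes is the per-packet size being counted (the mail does not say whether that is UDP payload or the full frame), and it ignores IP, ESP and Ethernet overhead.

```shell
#!/bin/sh
# Convert a packet rate to an approximate bit rate in Mbit/s.
#   $1 = packets per second
#   $2 = bytes per packet (ASSUMPTION: 1000, as stated in the mail)
# Header and encapsulation overhead is deliberately ignored.
pps_to_mbit() {
    echo $(( $1 * $2 * 8 / 1000000 ))
}

pps_to_mbit 150000 1000   # stable rate reported above
pps_to_mbit 200000 1000   # rate where the isakmpd tunnel stalled
pps_to_mbit 250000 1000   # rate where both tunnels stalled
```

Under those assumptions, the three thresholds work out to roughly 1200, 1600 and 2000 Mbit/s of cleartext traffic offered to the tunnel.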
Re: ipsec - panic: malloc: out of space in kmem_map
On 18.7.2021. 20:11, Alexander Bluhm wrote:
> On Sat, Jul 17, 2021 at 06:32:59PM +0200, Hrvoje Popovski wrote:
>> with this diff i'm getting very stable traffic over tunnel and it's
>> little faster.
>
> This is expected. Too much queueing creates oscilating behavior
> and suboptimal throughput.
>
>> Even with your last diff on tech@
>> https://marc.info/?l=openbsd-tech&m=162645141414262&w=2
>> i'm seeing traffic drops, less frequent, but i'm seeing it...
>
> There is another reason for traffic drops. iked(8) is not clever
> when rekeying. The idea is to have SAs with old key and new key
> simultaneously. After both machines have new SA, the old should
> be removed. But currently we have a window when sender uses new
> SA, but receiver only has old SA and cannot decrypt the packets.
> This is a temproray problem, I see drops for a short time. tobhe@
> wants to fix this.
>
> I think you use isakmpd(8), I don't know how rekeying works there.

Yes, I'm using isakmpd, but I can test iked and isakmpd, no problem ...

>> Do you want me to test this diff combined with your ipsec diff
>> on tech@ ?
>
> I have commited the replay diff. This fixes permanent packet drop.
> Do you see permanent traffic stalls with current?

With isakmpd yes; I haven't tested iked, but I will now ..
But with your diff from bugs@ everything seems smooth and stable, without
drops and panics, even with isakmpd :)

> Temporary drops are still possible. The rekey problem is known.
> The crypto queuing problem is known. You could disable iked lifetime
> bytes rekeying and try my no crypto queue diff.
> Do you see traffic drops with that?
>
>> And this diff with parallel forwarding?
>
> Parallel forwarding still crashes with IPsec. We must commit fixes
> step by step until we get it stable. Of course you can try it, but
> currently I can reproduce problems myself.

Ok, great, now I will concentrate on testing iked and isakmpd ..
Re: ipsec - panic: malloc: out of space in kmem_map
On 17.7.2021. 1:02, Alexander Bluhm wrote:
> On Fri, Jul 16, 2021 at 10:57:24PM +0200, Alexander Bluhm wrote:
>> All I said is more or less theory, I did not test it yet.
> I should not send untested diffs. New version one does not crash
> immediately. I removed a netlock that is already taken due to not
> queuing. This also fixes the tdb->tdb_odrops++ spotted by mvs@.
>
> Note that avoiding queues is the fastest way do IPsec.
> http://bluhm.genua.de/perform/results/2021-07-15T23:54:11Z/perform.html
>
> This diff is the middle column.
> http://bluhm.genua.de/perform/results/2021-07-15T23:54:11Z/gnuplot/ipsec.html
>
> bluhm

Hi,

with this diff I'm getting very stable traffic over the tunnel, and it's a
little faster.

Even with your last diff on tech@
https://marc.info/?l=openbsd-tech&m=162645141414262&w=2
I'm seeing traffic drops; less frequent, but I'm still seeing them...

Do you want me to test this diff combined with your ipsec diff on tech@ ?
And this diff with parallel forwarding?

tnx for stable ipsec :)
Re: ipsec - panic: malloc: out of space in kmem_map
On 16.7.2021. 20:02, Hrvoje Popovski wrote:
> Hi all,
>
> with source fetched few minutes ago i wanted to test bluhm@ diff
> https://marc.info/?l=openbsd-tech&m=162645141414262&w=2
> I've found out that with or without bluhm@ diff i'm getting panic. panic
> in attachment.
>
> I'm sending traffic over ipsec tunnel from r620-1 box to x3550m4 box. If
> at one point while sending traffic over tunnel i hit "enter" over serial
> console on x3550m4, that box panic with this log:

with WITNESS ..

x3550m4# panic: malloc: out of space in kmem_map
Stopped at      db_enter+0x10:  popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*289105  63905      0        0x13          0   1K  ksh
db_enter() at db_enter+0x10
panic(81ea2c88) at panic+0xbf
malloc(1,7f,1) at malloc+0x7b4
ufs_readdir(800026e84c10) at ufs_readdir+0xf8
VOP_READDIR(fd887f03d460,800026e84c78,fd887f7d7c60,800026e84cbc) at VOP_READDIR+0x50
sys_getdents(800026f2b508,800026e84d30,800026e84d90) at sys_getdents+0x161
syscall(800026e84e00) at syscall+0x3a9
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d7000, count: 7
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{1}>
ddb{1}> show locks
exclusive rrwlock inode r = 0 (0xfd887ec1dc48)
#0  witness_lock+0x333
#1  rw_enter+0x27f
#2  rrw_enter+0x56
#3  VOP_LOCK+0x5b
#4  vn_lock+0xad
#5  sys_getdents+0xfe
#6  syscall+0x3a9
#7  Xsyscall+0x128
exclusive kernel_lock _lock r = 0 (0x82306b40)
#0  witness_lock+0x333
#1  syscall+0x29e
#2  Xsyscall+0x128
ddb{1}> show reg
rdi               0
rsi               0x14
rbp               0x800026e848d0
rbx               0x1           __ALIGN_SIZE+0xf000
rdx               0xfe03
rcx               0x282
rax               0x28
r8                0x101010101010101
r9                0
r10               0
r11               0xe254fbd981f43f47
r12               0x8000219baa00
r13               0x1           __ALIGN_SIZE+0xf000
r14               0
r15               0x81ea2c88    apollo_udma100_tim+0xc745
rip               0x8139d200    db_enter+0x10
cs                0x8
rflags            0x286
rsp               0x800026e848d0
ss                0x10
db_enter+0x10:  popq    %rbp
ddb{1}> ps
   PID     TID   PPID    UID  S       FLAGS  WAIT          COMMAND
 38747  389002  68287     68  3        0x90  select        isakmpd
 68287  379577      1      0  3        0x80  netio         isakmpd
*63905  289105      1      0  7        0x13                ksh
 51457  163250      1      0  3    0x100098  poll          cron
 77909  402925  35691     95  3    0x100092  kqread        smtpd
 78088   37152  35691    103  3    0x100092  kqread        smtpd
 62475  202424  35691     95  3    0x100092  kqread        smtpd
 33104  393361  35691     95  3    0x100092  kqread        smtpd
 99817  462713  35691     95  3    0x100092  kqread        smtpd
 49139  379482  35691     95  3    0x100092  kqread        smtpd
 35691  127563      1      0  3    0x100080  kqread        smtpd
 60575  345518      1      0  3        0x88  select        sshd
 42089  129781      1      0  3    0x100080  poll          ntpd
 13806  436163  95467     83  3    0x100092  poll          ntpd
 95467  117285      1     83  3    0x100092  poll          ntpd
 68954  146234  99233     74  3    0x100092  bpf           pflogd
 99233   26243      1      0  3        0x80  netio         pflogd
 46731   44450  83208     73  3    0x100090  kqread        syslogd
 83208  298754      1      0  3    0x100082  netio         syslogd
 24108  326233      1      0  3    0x100080  kqread        resolvd
 86229  481448      0      0  3     0x14200  bored         smr
 90881   31129      0      0  3     0x14200  pgzero        zerothread
  3369  196147      0      0  3     0x14200  aiodoned      aiodoned
 91637   24402      0      0  3     0x14200  syncer        update
 10524  225039      0      0  3     0x14200  cleaner       cleaner
 98418  474258      0      0  3     0x14200  reaper        reaper
  4887  253001      0      0  3     0x14200  pgdaemon      pagedaemon
 72437  269608      0      0  3     0x14200  bored         crynlk
 21023  435965      0      0  3     0x14200  bored         crypto
  3751  454766      0      0  3     0x14200  usbtsk        usbtask
 10715  377531      0      0  3     0x14200  usbatsk       usbatsk
 60934   18965      0      0  3  0x40014200  acpi0         acpi0
 43187  352600      0      0  7  0x40014200                idle11
 21339   40638      0      0  7  0x40014200                idle10
 97154  350372      0      0  7  0x40014200                idle9
 10084  1122
ipsec - panic: malloc: out of space in kmem_map
Hi all,

with source fetched a few minutes ago I wanted to test bluhm@'s diff
https://marc.info/?l=openbsd-tech&m=162645141414262&w=2
I've found that with or without bluhm@'s diff I'm getting a panic. The panic
is in the attachment.

I'm sending traffic over an ipsec tunnel from the r620-1 box to the x3550m4
box. If at some point while sending traffic over the tunnel I hit "enter" on
the serial console of x3550m4, that box panics with this log:

x3550m4# panic: malloc: out of space in kmem_map
Stopped at      db_enter+0x10:  popq    %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*153853  51636      0        0x13          0    3  ksh
 194352  78666      0     0x14000      0x200    4  crynlk
  43942  98148      0     0x14000      0x200    1  softnet
db_enter() at db_enter+0x10
panic(81ea2c7a) at panic+0xbf
malloc(1,7f,1) at malloc+0x7b4
ufs_readdir(800026ff6a50) at ufs_readdir+0xf8
VOP_READDIR(fd887f03e380,800026ff6ab8,fd887f7d7600,800026ff6afc) at VOP_READDIR+0x50
sys_getdents(800026ed57a8,800026ff6b70,800026ff6bd0) at sys_getdents+0x161
syscall(800026ff6c40) at syscall+0x3a9
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7cc780, count: 7
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports. Insufficient info makes it difficult to find and fix bugs.
ddb{3}>
ddb{3}> show reg
rdi               0
rsi               0x14
rbp               0x800026ff6710
rbx               0x1           __ALIGN_SIZE+0xf000
rdx               0xfe03
rcx               0x206
rax               0x28
r8                0x101010101010101
r9                0
r10               0
r11               0x49e3f089d74cd5c1
r12               0x8000219dca00
r13               0x1           __ALIGN_SIZE+0xf000
r14               0
r15               0x81ea2c7a    apollo_udma100_tim+0xd4a7
rip               0x8117d3a0    db_enter+0x10
cs                0x8
rflags            0x202
rsp               0x800026ff6710
ss                0x10
db_enter+0x10:  popq    %rbp
ddb{3}> show malloc
           Type  InUse  MemUse  HighUse   Limit  Requests  Type Lim
         devbuf  98112  50978K   50979K  78643K     99556         0
            pcb     13      8K       8K  78643K        13         0
         rtable    155      5K       5K  78643K       257         0
         ifaddr    177     22K      22K  78643K       185         0
       counters    322    212K     212K  78643K       322         0
       ioctlops      0      0K       4K  78643K      1567         0
            iov      0      0K       0K  78643K        19         0
          mount      9      9K       9K  78643K         9         0
            log      0      0K      13K  78643K     49397         0
         vnodes   1194     75K      75K  78643K      1204         0
      UFS quota      1     32K      32K  78643K         1         0
      UFS mount     37     87K      87K  78643K        37         0
            shm      2      1K       1K  78643K         2         0
         VM map      2      1K       1K  78643K         2         0
            sem      2      0K       0K  78643K         2         0
        dirhash     27      5K       5K  78643K        54         0
           ACPI   3991    479K     519K  78643K     15335         0
      file desc      3      1K       1K  78643K         4         0
           proc     69     87K     100K  78643K       669         0
    NFS srvsock      1      0K       0K  78643K         1         0
     NFS daemon      1     16K      16K  78643K         1         0
       in_multi     28      1K       1K  78643K        28         0
    ether_multi      7      0K       0K  78643K         7         0
    ISOFS mount      1     32K      32K  78643K         1         0
  MSDOSFS mount      1     16K      16K  78643K         1         0
           ttys     37    175K     175K  78643K        37         0
switch splassert
Hi all,

I'm not much of a switch user, I just tried to play with it. With the most
recent snapshot I'm getting this splassert:

r620-1# ifconfig switch0 create && ifconfig switch0 add ix0 && ifconfig switch0 destroy
splassert: ifpromisc: want 2 have 0
Starting stack trace...
ifpromisc(80082048,0) at ifpromisc+0x5b
switch_port_detach(80082048) at switch_port_detach+0x4f
switch_clone_destroy(8169b000) at switch_clone_destroy+0x10d
if_clone_destroy(80002490b310) at if_clone_destroy+0xd8
ifioctl(fd839bc04dc8,80206979,80002490b310,8000248ed260) at ifioctl+0x1d2
soo_ioctl(fd84209fbac8,80206979,80002490b310,8000248ed260) at soo_ioctl+0x171
sys_ioctl(8000248ed260,80002490b420,80002490b480) at sys_ioctl+0x2d4
syscall(80002490b4f0) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d9c80, count: 248
End of stack trace.

dmesg

OpenBSD 6.9-beta (GENERIC.MP) #335: Sun Feb 14 21:12:08 MST 2021
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17115840512 (16322MB)
avail mem = 16581775360 (15813MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xcf42c000 (99 entries)
bios0: vendor Dell Inc. version "2.9.0" date 12/06/2019
bios0: Dell Inc.
PowerEdge R620 acpi0 at bios0: ACPI 3.0 acpi0: sleep states S0 S4 S5 acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WDAT SLIC ERST HEST BERT EINJ TCPA PC__ SRAT SSDT acpi0: wakeup devices PCI0(S5) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 4 (boot processor) cpu0: Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz, 3600.50 MHz, 06-3e-04 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 2, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 100MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE cpu1 at mainbus0: apid 6 (application processor) cpu1: Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz, 3600.00 MHz, 06-3e-04 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 0, core 3, package 0 cpu2 at mainbus0: apid 8 (application processor) cpu2: Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz, 3600.00 MHz, 06-3e-04 cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 4, package 0 cpu3 at mainbus0: apid 16 (application processor) cpu3: Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz, 3600.00 MHz, 06-3e-04 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 0, core 8, package 0 cpu4 at mainbus0: apid 18 (application processor) cpu4: Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz, 3600.00 MHz, 06-3e-04 cpu4: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,FSGSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu4: 256KB 64b/line 8-way L2 cache cpu4: smt 0, core 9, package 0 cpu5 at mainbus0: apid 20 (application processor) cpu5: Intel(R) Xeon(R) CPU E5-2643 v2 @ 3.50GHz, 3600.00 MHz, 06-3e-04 cpu5:
Re: splassert w/ add/del vlan on bridge
On 11.4.2020. 5:26, Visa Hankala wrote:
> On Fri, Apr 10, 2020 at 11:39:59PM +0200, Hrvoje Popovski wrote:
>> hostname.tpmr20
>> trunkport vxlan20
>> trunkport vlan20
>> up
>>
>>
>> x3550m4# ifconfig tpmr20 destroy
>>
>> splassert: vlan_ioctl: want 2 have 0
>> Starting stack trace...
>> vlan_ioctl(8129d800,80206910,800021d048a8) at vlan_ioctl+0x65
>> ifpromisc(8129d800,0) at ifpromisc+0xbb
>> tpmr_p_dtor(80b0e800,81288100,5ea751037d06af69) at
>> tpmr_p_dtor+0xa0
>> tpmr_clone_destroy(80b0e800) at tpmr_clone_destroy+0xba
>> ifioctl(fd8784ae41c8,80206979,800021d04ab0,800021c0cd90) at
>> ifioctl+0x1c2
>> soo_ioctl(fd877da53e10,80206979,800021d04ab0,800021c0cd90)
>> at soo_ioctl+0x171
>> sys_ioctl(800021c0cd90,800021d04bc0,800021d04c20) at
>> sys_ioctl+0x2df
>> syscall(800021d04c90) at syscall+0x389
>> Xsyscall() at Xsyscall+0x128
>> end of kernel
>> end trace frame: 0x7f7c1250, count: 248
>> End of stack trace.
> The diff below should fix that.

Hi,

with the bridge and tpmr diffs I can't reproduce the splassert.

tnx ..
Re: splassert w/ add/del vlan on bridge
On 10.4.2020. 21:30, Theo de Raadt wrote:
> Why did it take almost a year to find this?
>
> Or is this bug due to ioctl(2) becoming UNLOCKED on 2020/02/22?

Hi guys,

I think that this splassert is not related only to bridge:

hostname.bridge1242
add vxlan1242
add vlan1242
up

x3550m4# ifconfig bridge1242 destroy
splassert: vlan_ioctl: want 2 have 0
Starting stack trace...
vlan_ioctl(80b21000,80206910,800021d04818) at vlan_ioctl+0x65
ifpromisc(80b21000,0) at ifpromisc+0xbb
bridge_ifremove(80b23e00) at bridge_ifremove+0xa4
bridge_clone_destroy(80b1c000) at bridge_clone_destroy+0xa5
ifioctl(fd8784ae41c8,80206979,800021d04a20,800021bad010) at ifioctl+0x1c2
soo_ioctl(fd8784bb0f18,80206979,800021d04a20,800021bad010) at soo_ioctl+0x171
sys_ioctl(800021bad010,800021d04b30,800021d04b90) at sys_ioctl+0x2df
syscall(800021d04c00) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7d7900, count: 248
End of stack trace.

hostname.tpmr20
trunkport vxlan20
trunkport vlan20
up

x3550m4# ifconfig tpmr20 destroy
splassert: vlan_ioctl: want 2 have 0
Starting stack trace...
vlan_ioctl(8129d800,80206910,800021d048a8) at vlan_ioctl+0x65
ifpromisc(8129d800,0) at ifpromisc+0xbb
tpmr_p_dtor(80b0e800,81288100,5ea751037d06af69) at tpmr_p_dtor+0xa0
tpmr_clone_destroy(80b0e800) at tpmr_clone_destroy+0xba
ifioctl(fd8784ae41c8,80206979,800021d04ab0,800021c0cd90) at ifioctl+0x1c2
soo_ioctl(fd877da53e10,80206979,800021d04ab0,800021c0cd90) at soo_ioctl+0x171
sys_ioctl(800021c0cd90,800021d04bc0,800021d04c20) at sys_ioctl+0x2df
syscall(800021d04c90) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7c1250, count: 248
End of stack trace.
Re: Kernel crash in OpenBSD 6.5
On 30.7.2019. 13:34, illya.me...@wiesan.de wrote:
> On 30.07.19 at 13:17, Hrvoje Popovski wrote:
>> 2) - download install.iso, burn it on cd or usb disk
>> 3) - boot from cd or usb
>
> That's not so easy, I haven't a monitor at this machine.
> I try the „manual update process“ via ssh and if I crash the machine,
> it's crashed and I have to walk through ;-)
>
> [10 minutes later]
>
> The machine is up in -current (at least, I hope) and "ifconfig -A"
> doesn't crash it.
>
> So, how can I update our other "normal" 6.5 machines? Is it possible to
> provide a patch for the problem on the errata-page?
>
> Thank you very much and kind regards
> Illya

Sorry, I forgot to put bugs@ in the mail ..

Try to update both boxes to at least the latest snapshot, because in
snapshots you have an excellent tool called sysupgrade ... you will love it :)
With this tool you can upgrade the OS to the latest snapshot over ssh without
any problem :)
Re: ifconfig bridge crashes host
On 23.7.2019. 17:03, obs...@high5.nl wrote:
>> Synopsis:	ifconfig bridge crashes host
>> Category:
>> Environment:
> System : OpenBSD 6.5
> Details : OpenBSD 6.5 (GENERIC.MP) #1: Mon May 27 18:27:59 CEST 2019
>
> r...@syspatch-65-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
>> Description:
> After running the command "ifconfig bridge" twice on the host, the host
> became unresponsive. I was able to capture the trace from the console.
>> How-To-Repeat:
> The host was running for some time so I am uncertain if it's related to
> time,
> but I have seen this happening a couple of times now, and it seems
> running the
> "ifconfig bridge" command multiple times triggers this.

Hi,

can you update your box to the latest snapshot? There were some problems
with the "ifconfig bridge" command a few months ago..
Re: Fujitsu RX 2530 M4 - Segmentation fault - ttyflags.core
On 4.6.2019. 17:47, Otto Moerbeek wrote: > Any idea how you ended up with this malformed file? No, i simply don't know how that happened ...
Re: Fujitsu RX 2530 M4 - Segmentation fault - ttyflags.core
On 4.6.2019. 16:54, Otto Moerbeek wrote:
> On Tue, Jun 04, 2019 at 04:50:28PM +0200, Hrvoje Popovski wrote:
>
>> On 4.6.2019. 16:38, Otto Moerbeek wrote:
>>> On Tue, Jun 04, 2019 at 03:41:02PM +0200, Hrvoje Popovski wrote:
>>>
>>>> Hi all,
>>>>
>>>> after upgrading Fujitsu RX 2530 M4 from 6.4 to 6.5 in dmesg i saw
>>>> "Segmentation fault (core dumped)" instead of "setting tty flags".
>>>> Installation and upgrade are very standard, but they are done over
>>>> Fujitsu AVR, their remote console like Dell iDRAC.
>>>> Machine is working normally, i just wanted to report this.
>>>>
>>>> core dump:
>>>> http://kosjenka.srce.hr/~hrvoje/openbsd/ttyflags.core
>>>>
>>>> sendbug is in attachment
>>>>
>>>> dmesg -s
>>>> Automatic boot in progress: starting file system checks.
>>>> /dev/sd0a (208b7bf77955eee9.a): file system is clean; not checking
>>>> /dev/sd0k (208b7bf77955eee9.k): file system is clean; not checking
>>>> /dev/sd0d (208b7bf77955eee9.d): file system is clean; not checking
>>>> /dev/sd0f (208b7bf77955eee9.f): file system is clean; not checking
>>>> /dev/sd0g (208b7bf77955eee9.g): file system is clean; not checking
>>>> /dev/sd0h (208b7bf77955eee9.h): file system is clean; not checking
>>>> /dev/sd0j (208b7bf77955eee9.j): file system is clean; not checking
>>>> /dev/sd0i (208b7bf77955eee9.i): file system is clean; not checking
>>>> /dev/sd0e (208b7bf77955eee9.e): file system is clean; not checking
>>>> Segmentation fault (core dumped)
>>>> starting network
>>>> reordering libraries: done.
>>>> starting early daemons: syslogd ntpd.
>>>> starting RPC daemons:.
>>>> savecore: no core dump
>>>> checking quotas: done.
>>>> clearing /tmp
>>>> kern.securelevel: 0 -> 1
>>>> creating runtime link editor directory cache.
>>>> preserving editor files.
>>>> starting network daemons: sshd snmpd bgpd smtpd.
>>>> starting package daemons: zabbix_agentd.
>>>> starting local daemons: cron.
>>>> Tue Jun 4 06:44:04 CEST 2019
>>>
>>> Does this also happen when you run ttyflags -a by hand?
>>>
>>
>> yes it happens,
>>
>> rs2# ttyflags -a
>> Segmentation fault (core dumped)
>>
>> http://kosjenka.srce.hr/~hrvoje/openbsd/ttyflags2.core
>>
>>
>>> Do you have a malformed entry in /etc/ttys ?
>>
>> i don't think that i have. i haven't touched /etc/ttys
>>
>>
>>> Please share your /etc/ttys file.
>>
>> http://kosjenka.srce.hr/~hrvoje/openbsd/ttys
>>
>
> there's a stray t at the end.

ohh .. for god sake .. i'm sorry for taking your time .. thank you ..
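A stray character like the `t` found here is easy to miss by eye. The function below is a minimal sketch of a sanity check, not an OpenBSD tool: `check_ttys` is a hypothetical helper name, and its rule (every non-comment, non-blank line needs at least two whitespace-separated fields) is an assumption chosen to catch this particular kind of damage. It does not implement the full ttys(5) grammar.

```shell
# check_ttys [file]
# Print every suspicious line of a ttys-style file and return 1 if any
# was found, 0 otherwise. Defaults to /etc/ttys.
check_ttys() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF < 2 {                   # a lone token cannot be a full entry
            printf "%s:%d: suspicious entry: %s\n", FILENAME, FNR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "${1:-/etc/ttys}"
}
```

Running `check_ttys` after an upgrade and before the next reboot would point at the offending line number directly.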
Re: Fujitsu RX 2530 M4 - Segmentation fault - ttyflags.core
On 4.6.2019. 16:38, Otto Moerbeek wrote:
> On Tue, Jun 04, 2019 at 03:41:02PM +0200, Hrvoje Popovski wrote:
>
>> Hi all,
>>
>> after upgrading Fujitsu RX 2530 M4 from 6.4 to 6.5 in dmesg i saw
>> "Segmentation fault (core dumped)" instead of "setting tty flags".
>> Installation and upgrade are very standard, but they are done over
>> Fujitsu AVR, their remote console like Dell iDRAC.
>> Machine is working normally, i just wanted to report this.
>>
>> core dump:
>> http://kosjenka.srce.hr/~hrvoje/openbsd/ttyflags.core
>>
>> sendbug is in attachment
>>
>> dmesg -s
>> Automatic boot in progress: starting file system checks.
>> /dev/sd0a (208b7bf77955eee9.a): file system is clean; not checking
>> /dev/sd0k (208b7bf77955eee9.k): file system is clean; not checking
>> /dev/sd0d (208b7bf77955eee9.d): file system is clean; not checking
>> /dev/sd0f (208b7bf77955eee9.f): file system is clean; not checking
>> /dev/sd0g (208b7bf77955eee9.g): file system is clean; not checking
>> /dev/sd0h (208b7bf77955eee9.h): file system is clean; not checking
>> /dev/sd0j (208b7bf77955eee9.j): file system is clean; not checking
>> /dev/sd0i (208b7bf77955eee9.i): file system is clean; not checking
>> /dev/sd0e (208b7bf77955eee9.e): file system is clean; not checking
>> Segmentation fault (core dumped)
>> starting network
>> reordering libraries: done.
>> starting early daemons: syslogd ntpd.
>> starting RPC daemons:.
>> savecore: no core dump
>> checking quotas: done.
>> clearing /tmp
>> kern.securelevel: 0 -> 1
>> creating runtime link editor directory cache.
>> preserving editor files.
>> starting network daemons: sshd snmpd bgpd smtpd.
>> starting package daemons: zabbix_agentd.
>> starting local daemons: cron.
>> Tue Jun 4 06:44:04 CEST 2019
>
> Does this also happen when you run ttyflags -a by hand?
>

yes it happens,

rs2# ttyflags -a
Segmentation fault (core dumped)

http://kosjenka.srce.hr/~hrvoje/openbsd/ttyflags2.core

> Do you have a malformed entry in /etc/ttys ?

i don't think that i have. i haven't touched /etc/ttys

> Please share your /etc/ttys file.

http://kosjenka.srce.hr/~hrvoje/openbsd/ttys
Re: witness report
On 3.6.2019. 18:32, Philip Guenther wrote:
> On Mon, 3 Jun 2019, Hrvoje Popovski wrote:
>> i'm having samba server, transmission client and gnome desktop on one
>> box. from time to time i'm getting witness log below. source is clean
>> and fetched few hours ago and compiled with WITNESS. userland and
>> packages are up to date ..
>> i put kern.witness.watch=3 in sysctl.conf so now i'm in ddb and will
>> leave it like this if something is needed
>
> From pguent...@proofpoint.com Sat Jun 1 13:25:04 2019
> Date: Sat, 1 Jun 2019 13:25:00 -0700
> From: Philip Guenther
> To: Antoine Jacoutot
> Cc: hack...@openbsd.org
> Subject: Re: witness and unveil
>
> On Sat, 1 Jun 2019, Antoine Jacoutot wrote:
>> Running a WITNESS kernel, mpi@ told me to send this here.
>>
>> kern.version=OpenBSD 6.5-current (GENERIC.MP) #0: Sat Jun 1 18:29:16 CEST 2019
>>
>> witness: acquiring duplicate lock of same type: ">uv_lock"
>> 1st unveil
>> 2nd unveil
>
> Give this diff a try.

Hi,

with this diff i can't reproduce witness log. if something comes up i
will report it back ..
witness report
Hi all,

i'm having samba server, transmission client and gnome desktop on one
box. from time to time i'm getting witness log below. source is clean
and fetched few hours ago and compiled with WITNESS. userland and
packages are up to date ..
i put kern.witness.watch=3 in sysctl.conf so now i'm in ddb and will
leave it like this if something is needed

witness log from console:

witness: acquiring duplicate lock of same type: ">uv_lock"
1st unveil
2nd unveil
Starting stack trace...
witness_checkorder(80af9078,9,0) at witness_checkorder+0x826
rw_enter_write(80af9068) at rw_enter_write+0x43
unveil_copy() at unveil_copy+0x183
process_new(8000331b8278,800032f94d50,1) at process_new+0xde
fork1() at fork1+0x2d7
syscall(80003300b1f0) at syscall+0x389
Xsyscall(6,2,c781ea04749,2,7f7cd478,c7a81850500) at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7cd040, count: 250
End of stack trace.

ddb output with kern.witness.watch=3

witness: acquiring duplicate lock of same type: ">uv_lock"
1st unveil
2nd unveil
Starting stack trace...
witness_checkorder(80af9078,9,0) at witness_checkorder+0x826
rw_enter_write(80af9068) at rw_enter_write+0x43
unveil_copy() at unveil_copy+0x183
process_new(8000331b8278,800032f94d50,1) at process_new+0xde
fork1() at fork1+0x2d7
syscall(80003300b1f0) at syscall+0x389
Xsyscall(6,2,c781ea04749,2,7f7cd478,c7a81850500) at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7cd040, count: 250
End of stack trace.
Stopped at db_enter+0x10: popq %rbp
ddb{1}> trace
db_enter() at db_enter+0x10
witness_checkorder(80af9078,9,0) at witness_checkorder+0x82b
rw_enter_write(80af9068) at rw_enter_write+0x43
unveil_copy() at unveil_copy+0x183
process_new(8000331b8278,800032f94d50,1) at process_new+0xde
fork1() at fork1+0x2d7
syscall(80003300b1f0) at syscall+0x389
Xsyscall(6,2,c781ea04749,2,7f7cd478,c7a81850500) at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7cd040, count: -8
ddb{1}>

ddb{1}> mach ddbcpu 0
Stopped at x86_ipi_db+0x12: leave
ddb{0}> trace
x86_ipi_db(81d0fff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi(6,81d0fff0,800c9b00,0,0,81e7da68) at Xresume_lapic_ipi+0x23
__mp_lock(81e7da68) at __mp_lock+0xae
intr_handler(800022287240,800c9b00) at intr_handler+0x44
Xintr_ioapic_edge23_untramp(0,81d0fff0,fd81eef2edc0,0,0,81e7da68) at Xintr_ioapic_edge23_untramp+0x19f
__mp_lock(81e7da68) at __mp_lock+0xa9
sowakeup(fd81eef2edc0,fd81eef2ee48) at sowakeup+0x8f
sorwakeup(fd81eef2edc0) at sorwakeup+0x78
udp_sbappend(fd81eeafa7b0,fd800e7b4700,fd80131a7490,0,14,fd80131a74a4) at udp_sbappend+0x1c8
udp_input(800022287588,800022287594,11,2) at udp_input+0xd21
ip_deliver(800022287588,800022287594,11,2) at ip_deliver+0x223
ipintr() at ipintr+0x5f
if_netisr(0) at if_netisr+0x4e
taskq_thread(80026080) at taskq_thread+0x67
end trace frame: 0x0, count: -15
ddb{0}>

ddb{0}> mach ddbcpu 2
Stopped at x86_ipi_db+0x12: leave
ddb{2}> trace
x86_ipi_db(80002201aff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi(0,0,1388,0,800cca80,80002201b6f8) at Xresume_lapic_ipi+0x23
acpicpu_idle() at acpicpu_idle+0x271
sched_idle(80002201aff0) at sched_idle+0x225
end trace frame: 0x0, count: -5
ddb{2}>

ddb{3}> trace
x86_ipi_db(800022023ff0) at x86_ipi_db+0x12
x86_ipi_handler() at x86_ipi_handler+0x80
Xresume_lapic_ipi(c,800022023ff0,800022023ff0,0,3,81e7da68) at Xresume_lapic_ipi+0x23
__mp_lock(81e7da68) at __mp_lock+0xa9
__mp_acquire_count(81e7da68,1) at __mp_acquire_count+0x38
mi_switch() at mi_switch+0x243
sleep_finish(80003345ce48,1) at sleep_finish+0x84
tsleep(fd80b7e19f28,118,81aefeef,485) at tsleep+0xc7
kqueue_scan(fd80b7e19f28,40,66acc36f000,80003345d208,800032f3e2a8,ffff80003345d248) at kqueue_scan+0x4ec
sys_kevent(800032f3e2a8,80003345d2b0,80003345d310) at sys_kevent+0x28f
syscall(80003345d380) at syscall+0x389
Xsyscall(0,48,7f7bd0b0,48,0,66acc36f000) at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7bd070, count: -12
ddb{3}>

ddb{1}> show locks
shared rwlock unveil r = 0 (0x80966078)
exclusive kernel_lock _lock r = 0 (0x81e7dc70)

ddb{1}> show all locks
Process 8789 (transmission-dae) thread 0x800032fb73b0 (223699)
exclusive rrwlock inode r = 0 (0xfd81ec938818)
Process 33527 (ntpd) thread 0x800032f6c500 (150820)
shared rwlock unveil r = 0 (0x80966078)
exclusive kernel_lock _lock r = 0 (0x81e7dc70)
Process 32311 (softnet) thread 0x800022260750 (481883)
exclusive rwlock netlock r = 0 (0x81d2d728)
shared rwlock softnet r = 0 (0x800260d8)
ddb{1}>

ddb{1}> show uvm
Current UVM status:
pagesize=4096
Re: bridge - kernel: protection fault trap
On 3.5.2019. 13:32, Alexander Bluhm wrote:
> On Fri, May 03, 2019 at 12:15:44PM +0200, Alexander Bluhm wrote:
>> 0 3082 39335 10 10 0 304 232 ifidxrm D+p00:00.00
>> /sbin/ifconfig bridge12 destroy
> Looks like a missing if_put().

Hi,

with this diff i can't reproduce panic with ifconfig bridge0 destroy
after removing stp from interfaces in bridge ..

Thank you guys ...
Re: bridge - kernel: protection fault trap
On 30.4.2019. 23:40, Martin Pieuchot wrote:
> On 30/04/19(Tue) 14:45, Hrvoje Popovski wrote:
>> Hi all,
>>
>> if i have bridge with rstp on interfaces and rstp on switch and i want
>> to disable rstp on openbsd interfaces i'm getting fault trap. I can
>> reproduce it on 6.4 and on -current.
>> i can't reproduce it if i don't have rstp on switch.
>
> Seems that `bs_root_port' isn't reset. Does the diff below help?
>

Hi,

yes, it helps. i can't reproduce trap with ifconfig bridge0 after
removing stp from interfaces in bridge.

But now if i destroy bridge0 after removing stp from interfaces box
freeze and if in second terminal i execute reboot i'm getting same or
similar trap. i didn't try ifconfig bridge0 destroy without this diff ..

bridge0: flags=41 index 18 llprio 3
groups: bridge
priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
ix1 flags=eb port 6 ifpriority 128 ifcost 2000 learning role root
ix0 flags=eb port 5 ifpriority 128 ifcost 2000 discarding role alternate

x3550m4# ifconfig bridge0 -stp ix0
x3550m4# ifconfig bridge0 -stp ix1
x3550m4# ifconfig bridge0
bridge0: flags=41 index 18 llprio 3
groups: bridge
priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
designated: id a0:36:9f:2e:96:a1 priority 32768
ix1 flags=e3 port 6 ifpriority 0 ifcost 0
ix0 flags=e3 port 5 ifpriority 0 ifcost 0
Addresses (max cache: 100, timeout: 240):
00:01:e8:8a:ea:53 ix1 1 flags=0<>

x3550m4# ifconfig bridge0 destroy

after this box freeze and when trying to reboot in other terminal i'm
getting this:

uvm_fault(0xfd87845eae78, 0x50, 0, 1) -> e
kernel: page fault trap, code=0
Stopped at bridge_ioctl+0x25d: movq 0x10(%rax),%rax
ddb{5}> trace
bridge_ioctl(80aa1000,c0406958,800025c803c0) at bridge_ioctl+0x25d
ifioctl(fd8784f154a8,c0406958,800025c803c0,8000fffef790) at ifioctl+0x2e1
sys_ioctl(8000fffef790,800025c804e0,800025c80550) at sys_ioctl+0x3c4
syscall(800025c805c0) at syscall+0x2d5
Xsyscall(6,36,7f7bdd60,36,7f7bd7e0,1120dda0c53f) at Xsyscall+0x128
end of kernel end trace frame: 0x7f7bd840, count: -5 ddb{5}> ddb{5}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 8881 355476 58607 0 30x100080 piperdsh *61948 478931 58607 0 7 0x2ifconfig 58607 500143 60114 0 30x10008a pause sh 60114 118807 0 30x83 wait reboot 475759 36533 0 30x10008b pause ksh 36533 245986 54582 1000 30x10008b pause ksh 54582 88066 51228 1000 30x90 selectsshd 51228 249575 10714 0 30x82 poll sshd 89349 12679 78679 0 3 0x3 ifidxrm ifconfig 78679 494362 1 0 30x10008b pause ksh 23063 361819 1 0 30x100083 ttyin getty 5688 521523 1 0 30x100083 ttyin getty 10811 485927 1 0 30x100083 ttyin getty 53603 187259 1 0 30x100083 ttyin getty 76136 329246 1 0 30x100083 ttyin getty 37428 18304 1 0 30x100098 poll cron 35480 87192 93615 95 30x100092 kqreadsmtpd 361385975 93615103 30x100092 kqreadsmtpd 30067 12755 93615 95 30x100092 kqreadsmtpd 93539 274871 93615 95 30x100092 kqreadsmtpd 22439 508287 93615 95 30x100092 kqreadsmtpd 72080 200916 93615 95 30x100092 kqreadsmtpd 93615 356738 1 0 30x100080 kqreadsmtpd 10714 370355 1 0 30x80 selectsshd 28735 407225 44481 83 30x100092 poll ntpd 44481 159120 76739 83 30x100092 poll ntpd 76739 329789 1 0 30x100080 poll ntpd 65485 248241 32912 73 70x100090syslogd 32912 421250 1 0 30x100082 netio syslogd 96126 242404 0 0 3 0x14200 pgzerozerothread 73505 492214 0 0 3 0x14200 aiodoned aiodoned 89628 391838 0 0 3 0x14200 syncerupdate 33653 327764 0 0 3 0x14200 cleaner cleaner 7953 391928 0 0 3 0x14200 reaperreaper 166285698 0 0 3 0x14200 pgdaemon pagedaemon 29741 351482 0 0 3 0x14200 bored crynlk 44734 415647 0 0 3 0x14200 bored crypto 36512 227354 0 0 3 0x14200 usbtskusbtask 38180 58616 0 0 3 0x14200 usbats
Re: bridge - kernel: protection fault trap
On 30.4.2019. 14:45, Hrvoje Popovski wrote:
> Hi all,
>
> if i have bridge with rstp on interfaces and rstp on switch and i want
> to disable rstp on openbsd interfaces i'm getting fault trap.

i suck so much at describing the problem

problem is executing "ifconfig bridge0" after removing stp from
interfaces in bridge ...
Re: Strange (mis)behaviour of pf ruleset in combination with dhcpd
On 10.4.2019. 11:19, illya.me...@wiesan.de wrote:
> On 10.04.19 at 10:58, Otto Moerbeek wrote:
>> On Wed, Apr 10, 2019 at 10:08:51AM +0200, illya.me...@wiesan.de wrote:
>>
>>> On 10.04.19 at 07:34, Bruno Flückiger wrote:
>>>> On 09.04., illya.me...@wiesan.de wrote:
>>>>> Dear all,
>>>>>
>>>>> I discovered a strange problem with OpenBSD 6.4 AMD64
>>>>> (stable(?)-release with all 16 patches).
>>>>>
>>>>> When running dhcpd, some pf rules seem not to work.
>>>>> I'm pretty sure this behaviour is different than in 6.3.
>>>>>
>>>>> Setup:
>>>>> +--------+  +--------+      +----------------------+
>>>>> | Client |--| Switch |--em0-| OpenBSD (with dhcpd) |
>>>>> +--------+  +--------+      +----------------------+
>>>>>
>>>>> I try to get an ip address for „Client“ via dhcp from „OpenBSD“,
>>>>> but in pf.conf I block traffic on port 67+68 (see below).
>>>>>
>>>>> When dhcpd is NOT running, I got from „tcpdump -nettti pflog0“ as
>>>>> expected:
>>>>> Snip 8<
>>>>> Apr 09 16:29:05.165687 rule 3/(match) block in on em0: 0.0.0.0.68 >
>>>>> 255.255.255.255.67: xid:0x3f51206f secs:5 [|bootp] [tos 0x10]
>>>>> Snap 8<
>>>>>
>>>>> When dhcpd („dhcpd em0“) is running, I got an entry in
>>>>> /var/log/daemon.log:
>>>>> Snip 8<
>>>>> Apr 9 16:30:40 feuerwand dhcpd[50668]: DHCPDISCOVER from
>>>>> 00:96:69:96:69:96 via em0
>>>>> Apr 9 16:30:41 feuerwand dhcpd[50668]: DHCPOFFER on 10.69.250.1 to
>>>>> 00:96:69:96:69:96 via em0
>>>>> Apr 9 16:30:41 feuerwand dhcpd[50668]: DHCPREQUEST for 10.69.250.1
>>>>> from 00:96:69:96:69:96 via em0
>>>>> Apr 9 16:30:41 feuerwand dhcpd[50668]: DHCPACK on 10.69.250.1 to
>>>>> 00:96:69:96:69:96 via em0
>>>>> Snap 8<
>>>>>
>>>>> .. and this entry via tcpdump:
>>>>> Snip 8<
>>>>> Apr 09 16:30:40.450863 rule 5/(match) pass out on em0: 10.69.228.156 >
>>>>> 10.69.250.1: icmp: echo request
>>>>> Snap 8<
>>>>>
>>>>> .. and „Client“ got an ip address!
>>>>>
>>>>> If you need further information don't hesitate to contact me.
>>>>>
>>>>> Please tell me also if I'm too stupid to understand what happened ;-)
>>>>>
>>>>> If you want to know why I'm running dhcpd and want to block the
>>>>> traffic: We use OpenBSD as bridge and dhcpd should only offer
>>>>> ip-addresses to one side.
>>>>> But this strange behaviour is also present without the
>>>>> bridge-configuration.
>>>>>
>>>>> Thank you for your help and support
>>>>> Illya Meyer
>>>>>
>>>>
>>>> Hi Illya
>>>>
>>>> DHCP operates on layer 2 using bpf(4) to receive and send packets.
>>>> Packet filtering takes place on layers 3 and 4. This means that
>>>> dhcpd(8) has done its work before the packets get to pf(4).
>>>>
>>>> If you want to make sure that dhcpd(8) hands out leases only on
>>>> interface em0 you can tell it to operate only on this interface:
>>>>
>>>> # rcctl set dhcpd flags em0
>>>>
>>>> Cheers,
>>>> Bruno
>>>
>>> Hi Bruno,
>>>
>>> thank you for the information.
>>>
>>> It's strange that a packet first reaches a daemon and then the packet
>>> filter (whose job it is to protect the machine from unwanted packets!)
>>>
>>> Maybe it's a good idea to build a bpf-Filter for layer 2 :-)
>>>
>>> Thank you and kind regards,
>>> Illya
>>>
>>
>> What do you think dhcpd uses?
>>
>> -Otto
>>
>
> Hm, sorry. What do you mean exactly?
>
> In my opinion, it should be possible for a packet filter to block ALL
> packets that arrive from a network, before a daemon (in this case
> dhcpd) does its work.
>
> But as Bruno said, dhcpd listens on layer 2 and answers first, before pf
> gets the packet on layer 3. So was my understanding. Please see my tests
> above, pf doesn't block the dhcp requests when dhcpd runs.
>
> In my scenario, I have a firewall, which works as bridge (so more a
> firebridge ;-)) with a dhcpd for „Good net“ and blocking the most things
> from „Bad net“ (especially dhcp requests).
>
> +---------+       +----------------+       +----------+
> | Bad net |---em0-| OpenBSD-Bridge |-em1---| Good net |
> +---------+       +----------------+       +----------+
>
> Only em0 has had an ip address and so dhcpd had to listen on em0. But
> some PCs from „Bad net“ got ip addresses from the BSD-Box.
> My solution was now to give the BSD-Box a second ip address on em1 and
> let dhcpd listen on em1 only. This works with the pf-rules (see above).
>
> When I interpret this article in the right way
> (https://www.linuxtopia.org/Linux_Firewall_iptables/c479.html) iptables
> on Linux works on layer 2, so it should be possible to block dhcp
> requests. Other articles said the same (e.g.
> https://serverfault.com/questions/873839/block-dhcp-traffic-for-one-device-mac-address)
>
> But it seems this is not possible with pf, which works on layer 3.
>
> Kind regards,
> Illya
>

maybe you could use tcpdump -B fildrop feature, but you need -current to
do this ..
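The workaround Illya settled on (dhcpd bound to the trusted side only) can be written down as follows. This is a sketch under the thread's own assumptions: em1 is the interface facing „Good net“ and already has an address in a subnet that dhcpd.conf serves. The interface name comes from the diagram above, nothing else about the setup is known.

```shell
# Make dhcpd(8) open its bpf(4) listener on em1 only, so requests
# arriving on em0 ("Bad net") are never seen by the daemon.
rcctl set dhcpd flags em1
rcctl restart dhcpd

# Confirm which flags the daemon will be started with.
rcctl get dhcpd flags
```

Because dhcpd reads frames through bpf(4) before pf(4) is consulted, choosing the listening interface is what keeps the untrusted side out here, not a `block` rule for ports 67 and 68.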
Re: splassert: bstp_notify_rtage
On 29.3.2019. 15:32, Martin Pieuchot wrote:
> Hello,
>
> On 24/03/19(Sun) 01:00, Hrvoje Popovski wrote:
>> Hi all,
>>
>> while playing around with stp and pair interfaces and using exactly the
>> same example as in man (4) pair
>>
>> ifconfig pair0 up
>> ifconfig pair1 rdomain 1 patch pair0 up
>> ifconfig pair2 up
>> ifconfig pair3 rdomain 1 patch pair2 up
>> ifconfig bridge0 add pair0 add pair2 stp pair0 stp pair2 up
>> ifconfig bridge1 add pair1 add pair3 stp pair1 stp pair3 up
>>
>> and while destroying/creating stp root pair interfaces with
>> kern.pool_debug=1 and kern.splassert=2 i'm getting this traces
>>
>> splassert: bstp_notify_rtage: want 2 have 0
>> Starting stack trace...
>> bstp_update_tc(804e0c00) at bstp_update_tc+0x338
>> bstp_tick(80159700) at bstp_tick+0x357
>> softclock(0) at softclock+0x123
>> softintr_dispatch(0) at softintr_dispatch+0x11e
>> Xsoftclock(0,0,1388,0,80021800,81d0b6d0) at Xsoftclock+0x1f
>> acpicpu_idle() at acpicpu_idle+0x281
>> sched_idle(81d0aff0) at sched_idle+0x235
>> end trace frame: 0x0, count: 250
>> End of stack trace.
>
> It's an incorrect assert. What's currently protecting all the bridge
> data structures is the KERNEL_LOCK(). Does the diff below help?

Yes it helps. With this diff i can't reproduce traces ..

Thanks ..
splassert: bstp_notify_rtage
Hi all,

while playing around with stp and pair interfaces and using exactly the
same example as in the pair(4) man page

ifconfig pair0 up
ifconfig pair1 rdomain 1 patch pair0 up
ifconfig pair2 up
ifconfig pair3 rdomain 1 patch pair2 up
ifconfig bridge0 add pair0 add pair2 stp pair0 stp pair2 up
ifconfig bridge1 add pair1 add pair3 stp pair1 stp pair3 up

and while destroying/creating stp root pair interfaces with
kern.pool_debug=1 and kern.splassert=2 i'm getting these traces

splassert: bstp_notify_rtage: want 2 have 0
Starting stack trace...
bstp_update_tc(804e0c00) at bstp_update_tc+0x338
bstp_tick(80159700) at bstp_tick+0x357
softclock(0) at softclock+0x123
softintr_dispatch(0) at softintr_dispatch+0x11e
Xsoftclock(0,0,1388,0,80021800,81d0b6d0) at Xsoftclock+0x1f
acpicpu_idle() at acpicpu_idle+0x281
sched_idle(81d0aff0) at sched_idle+0x235
end trace frame: 0x0, count: 250
End of stack trace.

splassert: bstp_notify_rtage: want 2 have 256
Starting stack trace...
bstp_set_port_tc(804e0600,5) at bstp_set_port_tc+0x1a6
bstp_update_tc(804e0600) at bstp_update_tc+0xfd
bstp_tick(80159000) at bstp_tick+0x357
softclock(0) at softclock+0x123
softintr_dispatch(0) at softintr_dispatch+0x11e
Xsoftclock(0,0,1388,0,80021800,81d0b6d0) at Xsoftclock+0x1f
acpicpu_idle() at acpicpu_idle+0x281
sched_idle(81d0aff0) at sched_idle+0x235
end trace frame: 0x0, count: 249
End of stack trace.

i don't know whether this is serious or not, i'm just sending the report
here for the record.
from misc@ https://www.mail-archive.com/misc@openbsd.org/msg165596.html log with "option BRIDGESTP_DEBUG" : bstp: state changed to DISCARDING on pair0 bstp: pair0 -> TC_INACTIVE bstp: state changed to DISCARDING on pair2 bstp: pair2 -> TC_INACTIVE bstp: state changed to DISCARDING on pair1 bstp: pair1 -> TC_INACTIVE bstp: state changed to DISCARDING on pair3 bstp: pair3 -> TC_INACTIVE bstp: pair0 role -> DESIGNATED bstp: pair0 -> DESIGNATED_SYNCED bstp: pair0 -> DESIGNATED_PROPOSE bstp: pair1 role -> DESIGNATED bstp: pair1 -> DESIGNATED_SYNCED bstp: pair1 -> DESIGNATED_PROPOSE bstp: pair2 role -> DESIGNATED bstp: pair2 -> DESIGNATED_SYNCED bstp: pair2 -> DESIGNATED_PROPOSE bstp: pair3 role -> DESIGNATED bstp: pair3 -> DESIGNATED_SYNCED bstp: pair3 -> DESIGNATED_PROPOSE bstp: pair2 role -> ALT/BACK/DISABLED bstp: pair3 role -> ALT/BACK/DISABLED bstp: pair1 role -> ROOT bstp: pair1 -> ROOT_REROOT bstp: pair1 -> ROOT_AGREED bstp: state changed to LEARNING on pair1 bstp: pair1 -> TC_LEARNING bstp: pair1 -> ROOT_AGREED bstp: state changed to FORWARDING on pair1 bstp: pair1 -> ROOT_REROOTED bstp: pair1 -> TC_DETECTED bstp: pair1 -> ROOT_AGREED bstp: pair1 -> ROOT_AGREED bstp: pair1 -> ROOT_AGREED bstp: pair1 -> ROOT_AGREED bstp: state changed to FORWARDING on pair0 bstp: pair0 -> TC_LEARNING bstp: state changed to DISCARDING on pair2 bstp: pair2 -> TC_INACTIVE bstp: pair3 role -> DESIGNATED bstp: pair3 -> DESIGNATED_RETIRED bstp: pair3 -> DESIGNATED_SYNCED bstp: pair3 -> DESIGNATED_PROPOSE bstp: pair2 role -> DESIGNATED bstp: pair2 -> DESIGNATED_SYNCED bstp: pair2 -> DESIGNATED_PROPOSE bstp: pair0 -> TC_DETECTED bstp: pair3 role -> ALT/BACK/DISABLED bstp: pair3 -> ALTERNATE_AGREED bstp: pair1 -> TC_TC bstp: pair3 -> ALTERNATE_AGREED bstp: pair1 -> TC_TC bstp: pair3 -> ALTERNATE_AGREED bstp: pair3 -> ALTERNATE_AGREED bstp: pair3 -> ALTERNATE_AGREED bstp: pair3 -> ALTERNATE_AGREED bstp: state changed to FORWARDING on pair2 bstp: pair2 -> TC_LEARNING bstp: pair0 -> 
TC_LEARNING bstp: pair3 role -> ROOT bstp: pair1 role -> ALT/BACK/DISABLED bstp: state changed to DISCARDING on pair1 bstp: pair1 -> TC_LEARNING bstp: pair3 -> ROOT_REROOT bstp: state changed to LEARNING on pair3 bstp: pair3 -> TC_LEARNING bstp: pair1 -> ALTERNATE_PORT bstp: pair1 -> TC_INACTIVE bstp: state changed to FORWARDING on pair3 bstp: pair3 -> ROOT_REROOTED bstp: pair3 -> TC_DETECTED bstp: state changed to DISCARDING on pair3 bstp: pair3 role -> ALT/BACK/DISABLED bstp: pair3 -> TC_LEARNING bstp: pair1 role -> ROOT bstp: pair3 -> TC_INACTIVE bstp: state changed to DISCARDING on pair2 bstp: pair2 role -> ALT/BACK/DISABLED bstp: pair2 -> TC_INACTIVE bstp: pair1 -> ROOT_REROOT bstp: state changed to LEARNING on pair1 bstp: pair1 -> TC_LEARNING bstp: state changed to FORWARDING on pair1 bstp: pair1 -> ROOT_REROOTED bstp: pair1 -> TC_DETECTED bstp: pair0 -> TC_DETECTED bstp: pair1 -> TC_TC bstp: pair0 -> TC_TC bstp: pair1 -> TC_TC bstp: pair0 -> TC_LEARNING bstp: state changed to DISCARDING on pair3 bstp: pair3 -> TC_INACTIVE bstp: pair2 role -> DESIGNATED bstp: pair2 -> DESIGNATED_SYNCED bstp: state changed to FORWARDING on pair2 bstp: pair2 -> TC_LEARNING bstp: pair3 role -> DESIGNATED bstp: pair3 -> DESIGNATED_SYNCED bstp: pair3 -> DESIGNATED_PROPOSE bstp: pair2 -> TC_DETECTED bstp: pair0 -> TC_LEARNING bstp: pair0 -> TC_DETECTED bstp: pair3 role -> ALT/BACK/DISABLED bstp: pair3 -> ALTERNATE_AGREED bstp: pair3 role
Re: witness report
On 3.6.2018. 13:38, Visa Hankala wrote:
> On Sat, Jun 02, 2018 at 03:08:14PM -0700, Philip Guenther wrote:
>> On Sat, 2 Jun 2018, Christophe Prévotaux wrote:
>>> This is a witness report I got on boot with snapshot Jun 1st amd64
>>>
>>> root on sd0a (9b49e3196b9bfae8.a) swap on sd0b dump on sd0b
>>> lock order reversal:
>>> 1st 0xff021cdac180 vmmaplk (>lock) @ /usr/src/sys/uvm/uvm_map.c:4433
>>> 2nd 0xff01dc5f71a8 inode (>i_lock)
>> I believe uvm and the vnode layer handle this correctly, with lock tries
>> that fall back to releasing the other lock and retrying so progress is
>> made. The fix for WITNESS complaining is to mark vmmaplk as a vnode lock.
> I think there is an actual issue because the locking calls are
> unconditional. FreeBSD appears to work around the problem by unlocking
> the vm_map when calling the pager. The diff below adapts that logic
> to OpenBSD.
>
> Because the temporary unlocking may allow another thread to change the
> vm_map, the code has to check if the map has been altered since the
> unlocking, and if so, handle the case somehow. The patch uses a best
> effort approach where the code proceeds from the vm_map entry indicated
> by the end address of the current vm_map entry. The sanity checks that
> are done at the start of uvm_map_clean() are not rerun.
>
> The system call that triggers the reversal is msync(2), and the
> reversal can be reproduced with the sys/kern/mmap regression test.
> sys/kern/mmap3 shows that there is another similar reversal with
> mlock(2) which is not covered by the patch.

Hi all,

with WITNESS I'm getting a similar log, and with visa@'s diff it's gone.
WITNESS log:

lock order reversal:
1st 0xff01ef4eb2f0 vmmaplk (>lock) @ /usr/src/sys/uvm/uvm_map.c:4435
2nd 0xff020f58f700 inode (>i_lock) @ /usr/src/sys/ufs/ufs/ufs_vnops.c:1544
lock order ">i_lock"(rrwlock) -> ">lock"(rwlock) first seen at:
#0 witness_checkorder+0x4c0
#1 _rw_enter+0x68
#2 vm_map_lock_ln+0xbc
#3 uvm_map+0x1a1
#4 km_alloc+0x16a
#5 pool_multi_alloc_ni+0xbb
#6 pool_p_alloc+0x56
#7 pool_do_get+0xe4
#8 pool_get+0xaf
#9 ufsdirhash_build+0x31e
#10 ufs_lookup+0x19d
#11 VOP_LOOKUP+0x4f
#12 vfs_lookup+0x2cf
#13 namei+0x2e3
#14 start_init+0xb2
lock order ">lock"(rwlock) -> ">i_lock"(rrwlock) first seen at:
#0 witness_checkorder+0x4c0
#1 _rw_enter+0x68
#2 _rrw_enter+0x3e
#3 VOP_LOCK+0x3d
#4 vn_lock+0x34
#5 uvn_io+0x1b8
#6 uvm_pager_put+0x109
#7 uvn_flush+0x424
#8 uvm_map_clean+0x3e7
#9 syscall+0x32a
#10 Xsyscall+0x128

OpenBSD 6.4-current (GENERIC.MP) #7: Mon Nov 5 22:09:07 CET 2018
r...@asd.srce.hr:/sys/arch/amd64/compile/GENERIC.MP
real mem = 8456089600 (8064MB)
avail mem = 8121876480 (7745MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xe87b1 (86 entries)
bios0: vendor Hewlett-Packard version "J01 v02.29" date 04/04/2016
bios0: Hewlett-Packard HP Compaq 8200 Elite CMT PC
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC SSDT MCFG HPET SSDT SLIC TCPA
acpi0: wakeup devices PS2K(S3) PS2M(S3) BR20(S4) EUSB(S3) USBE(S3) PEX0(S4) PEX1(S4) PEX2(S4) PEX3(S4) PEX4(S4) PEX5(S4) PEX6(S4) PEX7(S4) P0P1(S4) P0P2(S4) P0P3(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 3293.52 MHz, 06-2a-07
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 3292.56 MHz, 06-2a-07
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 3292.56 MHz, 06-2a-07
cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2
Re: IPv6 NDP Timeout
On 23.10.2018. 21:41, Florian Obser wrote:
> I'm currently on vacation and can't look into this soon.
>
> One thing that comes to mind: do these machines keep proper time or are they
> having issues with timer interrupts stopping because of too new KVM version
> and missing hypervisor flag (someone with access to a real computer please
> chip in with a link to a thread where this has been discussed before and the
> name of the KVM flag).
>

This link?
https://marc.info/?l=openbsd-misc&m=151575775607633&w=2
Re: kernel page fault - uvm_fault - softclock_thread
On 1.10.2018. 14:09, Visa Hankala wrote:
> On Mon, Oct 01, 2018 at 01:50:59PM +0200, Hrvoje Popovski wrote:
>> Hi all,
>>
>> while testing sasha's "pfsync: avoid a recursion on PF_LOCK" diff i
>> manage to get panic. first i thought that this panic have something to
>> do with sasha@ work but i can easily reproduce it on clean -current.
>>
>> while firewall is under stress and forwarding traffic and i'm doing this
>> in loop
>>
>> ifconfig pfsync0 destroy && sleep 2 && sh netstart pfsync0 && sleep 2
>>
>> i'm getting this panic:
>>
>>
>> uvm_fault(0x81d51fe8, 0x8, 0, 2) -> e
>> kernel: page fault trap, code=0
>> Stopped at softclock_thread+0xef: movq %rdx,0x8(%rcx)
>> ddb{0}>
>
> pfsync_clone_destroy() lacks proper locking and its timeout cancellation
> is not robust. Please try the patch below.

i can't reproduce panic with this diff

thank you ..
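The reproduction loop quoted above can be wrapped in a small function for anyone who wants to retest after applying a fix. This is only a sketch and should be run as root on a test machine: `cycle_iface`, `DRYRUN` and `PAUSE` are hypothetical names introduced here, and the `/etc/netstart` path is an assumption (the report runs `sh netstart` from an unstated directory).

```shell
# cycle_iface ifname count
# Destroy and re-create an interface `count` times, pausing between steps.
# DRYRUN=1 prints the commands instead of running them.
# PAUSE sets the sleep in seconds (default 2, as in the report).
cycle_iface() {
    ifname=$1
    count=$2
    run=${DRYRUN:+echo}
    pause=${PAUSE:-2}
    i=0
    while [ "$i" -lt "$count" ]; do
        $run ifconfig "$ifname" destroy && sleep "$pause" &&
            $run sh /etc/netstart "$ifname" && sleep "$pause"
        i=$((i + 1))
    done
}
```

For example, `DRYRUN=1 PAUSE=0 cycle_iface pfsync0 2` shows what would be executed without touching the interface. The panic in the report only appeared while the firewall was forwarding traffic under load, so the loop on its own may not trigger anything.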
kernel page fault - uvm_fault - softclock_thread
Hi all,

while testing sasha's "pfsync: avoid a recursion on PF_LOCK" diff i
manage to get panic. first i thought that this panic have something to
do with sasha@ work but i can easily reproduce it on clean -current.

while firewall is under stress and forwarding traffic and i'm doing this
in loop

ifconfig pfsync0 destroy && sleep 2 && sh netstart pfsync0 && sleep 2

i'm getting this panic:

uvm_fault(0x81d51fe8, 0x8, 0, 2) -> e
kernel: page fault trap, code=0
Stopped at softclock_thread+0xef: movq %rdx,0x8(%rcx)
ddb{0}>

ddb{0}> show panic
kernel page fault
uvm_fault(0x81d51fe8, 0x8, 0, 2) -> e
softclock_thread(0) at softclock_thread+0xef
end trace frame: 0x0, count: 1
ddb{0}>

ddb{0}> trace
softclock_thread(0) at softclock_thread+0xef
end trace frame: 0x0, count: -1
ddb{0}>

ddb{0}> ps
   PID     TID   PPID   UID  S       FLAGS  WAIT      COMMAND
 28648   44673  80585     0  3    0x10008b  pause     sh
 80585   91436  54828     0  3    0x10008b  pause     sh
 96033   29102  64150     0  3    0x100083  ttyin     ksh
 64150  347245  16720  1000  3    0x10008b  pause     ksh
 16720  469432  43690  1000  3        0x90  select    sshd
 43690  138411   7279     0  3        0x82  poll      sshd
 58828   27527  83398     0  3    0x100083  ttyin     ksh
 83398   15426  72476  1000  3    0x10008b  pause     ksh
 72476  428551  71468  1000  3        0x90  select    sshd
 71468  140851   7279     0  3        0x82  poll      sshd
 22446   93134      1     0  3    0x100083  ttyin     getty
  8308  230314      1     0  3    0x100083  ttyin     getty
 54828  192367      1     0  3    0x10008b  pause     ksh
 23569  400623      1     0  3    0x100083  ttyin     getty
 74599  179405      1     0  3    0x100083  ttyin     getty
 37720  157979      1     0  3    0x100083  ttyin     getty
 58655  119359      1     0  3    0x100098  poll      cron
 93238  265393  25253    95  3    0x100092  kqread    smtpd
 63459  395509  25253   103  3    0x100092  kqread    smtpd
 77659  387006  25253    95  3    0x100092  kqread    smtpd
 41450  383102  25253    95  3    0x100092  kqread    smtpd
 69474  212171  25253    95  3    0x100092  kqread    smtpd
  7791  518306  25253    95  3    0x100092  kqread    smtpd
 25253  181430      1     0  3    0x100080  kqread    smtpd
 51069  231291      1     0  3    0x100080  kqread    snmpd
 93117  241876      1    91  3    0x100092  kqread    snmpd
  8062  431841      1    91  3        0x92  kqread    snmpd
  7279   80889      1     0  3        0x80  select    sshd
  3849  318379  82126    83  3    0x100092  poll      ntpd
 82126   62754  70140    83  3    0x100092  poll      ntpd
 70140  349305      1     0  3    0x100080  poll      ntpd
  3780  405249   7687    74  3    0x100092  bpf       pflogd
  7687   19138      1     0  3        0x80  netio     pflogd
 11361  410273   7613    73  7    0x100090            syslogd
  7613    5709      1     0  3    0x100082  netio     syslogd
 83045  500797      0     0  3     0x14200  pgzero    zerothread
 96104  237415      0     0  3     0x14200  aiodoned  aiodoned
 95478  105584      0     0  3     0x14200  syncer    update
  7491  247419      0     0  3     0x14200  cleaner   cleaner
 14850  510159      0     0  3     0x14200  reaper    reaper
 90803  319870      0     0  3     0x14200  pgdaemon  pagedaemon
 92554  485066      0     0  3     0x14200  bored     crynlk
 19382  238999      0     0  3     0x14200  bored     crypto
 55351  397450      0     0  3     0x14200  usbtsk    usbtask
 94701  370298      0     0  3     0x14200  usbatsk   usbatsk
 25462   61976      0     0  3  0x40014200  acpi0     acpi0
 40756  454275      0     0  7  0x40014200            idle5
 62029  393821      0     0  3  0x40014200            idle4
 49221  298303      0     0  7  0x40014200            idle3
 35942  140312      0     0  3  0x40014200            idle2
 25913  384078      0     0  3  0x40014200            idle1
 67573  244162      0     0  3     0x14200  bored     sensors
 27303  343674      0     0  7     0x14200            softnet
 80804  305643      0     0  3     0x14200  bored     systqmp
 44322  381381      0     0  3     0x14200  bored     systq
*90433  384560      0     0  7  0x40014200            softclock
 73749  204855      0     0  3  0x40014200            idle0
     1  182696      0     0  3        0x82  wait      init
     0       0     -1     0  3     0x10200  scheduler swapper
ddb{0}> tr /p 0t384560
db_ktrap(75cac9a96611906e,8000227567b0,6) at db_ktrap+0xee
kerntrap(261d4113102b7957) at kerntrap+0xa0
alltraps_kern(6,804d6850,0,2,81785280,800022756860) at alltraps_kern+0x7b
softclock_thread(0) at
Re: Fujitsu RX2530 M4 16 cores null acpi panic
On 15.9.2018. 19:24, Mike Larkin wrote:
> On Fri, Sep 14, 2018 at 07:44:45PM +0200, Mark Kettenis wrote:
>>> Date: Fri, 14 Sep 2018 10:05:34 -0700
>>> From: Mike Larkin
>>>
>>> On Thu, Sep 13, 2018 at 11:17:15AM +0200, Hrvoje Popovski wrote:
>>>> Hi all,
>>>>
>>>> i'm having Fujitsu PRIMERGY RX2530 M4 server with Intel Gold 6134 cpu
>>>> with 8/16 cores.
>>>> When booting box up to 14 cores everything seems fine, but with 16 cores
>>>> i'm getting panic. In attachment you can find sendbug. Dmesg in sendbug
>>>> is with 14 cores.
>>>>
>>>> 8 cores (HT disabled) are more than enough for me but maybe this panic
>>>> is interesting to developers so i report it ...
>>>>
>>>>
>>>> root on sd0a (fa90dc9ea66a7e54.a) swap on sd0b dump on sd0b
>>>> panipanic: l anel pu
>>>> gnStopped at db_enter+0x12: popq %r11
>>>>     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
>>>>  380736  12442    0  0x14000   0x200    5  zerothread
>>>>  196414  98064    0  0x14000   0x200    7  aiodoned
>>>> db_enter() at db_enter+0x12
>>>> panic() at panic+0x120
>>>> acpicpu_idle() at acpicpu_idle+0x2e8
>>>> sched_idle(0) at sched_idle+0x245
>>>> end trace frame: 0x0, count: 11
>>>> https://www.openbsd.org/ddb.html describes the minimum info required in
>>>> bug reports. Insufficient info makes it difficult to find and fix bugs.
>>>>
>>>>
>>>> ddb{3}> trace
>>>> db_enter() at db_enter+0x12
>>>> panic() at panic+0x120
>>>> acpicpu_idle() at acpicpu_idle+0x2e8
>>>
>>> There are only 3 panics in acpicpu_idle. One at the very top:
>>> panic ("null acpicpu");
>>>
>>> and two much further down:
>>> panic ("idle with interrupts blocked");
>>>
>>> Based on the fact that it's called at acpicpu_idle+0x2e8, I'm
>>> inclined to believe it to be the latter, but the garbled
>>> panic string seems to more closely match the former.
>>>
>>> Can you put a plain printf before each panic and try to repro,
>>> to see which it is? Just printf the same panic string.
>>
>> Mike, look below:
>>
>>>> ddb{3}> show panic
>>>> null acpicpu
>>>> ddb{3}>
>>
>> So it's the first one.
>>
>> Hrvoje, can you boot this machine with 16 cores but acpicpu(4)
>> disabled and send us the acpidump output from /var/db/acpi?
>
> Ah, oops. Missed that.
>
> Based on later replies, I think you nailed it with _MAT.
>
> -ml
>

Thank you guys and sorry for noise ...
Re: Fujitsu RX2530 M4 16 cores null acpi panic
On 14.9.2018. 19:44, Mark Kettenis wrote:
>> Date: Fri, 14 Sep 2018 10:05:34 -0700
>> From: Mike Larkin
>>
>> On Thu, Sep 13, 2018 at 11:17:15AM +0200, Hrvoje Popovski wrote:
>>> Hi all,
>>>
>>> i'm having Fujitsu PRIMERGY RX2530 M4 server with Intel Gold 6134 cpu
>>> with 8/16 cores.
>>> When booting box up to 14 cores everything seems fine, but with 16 cores
>>> i'm getting panic. In attachment you can find sendbug. Dmesg in sendbug
>>> is with 14 cores.
>>>
>>> 8 cores (HT disabled) are more than enough for me but maybe this panic
>>> is interesting to developers so i report it ...
>>>
>>>
>>> root on sd0a (fa90dc9ea66a7e54.a) swap on sd0b dump on sd0b
>>> panipanic: l anel pu
>>> gnStopped at db_enter+0x12: popq %r11
>>>     TID    PID  UID  PRFLAGS  PFLAGS  CPU  COMMAND
>>>  380736  12442    0  0x14000   0x200    5  zerothread
>>>  196414  98064    0  0x14000   0x200    7  aiodoned
>>> db_enter() at db_enter+0x12
>>> panic() at panic+0x120
>>> acpicpu_idle() at acpicpu_idle+0x2e8
>>> sched_idle(0) at sched_idle+0x245
>>> end trace frame: 0x0, count: 11
>>> https://www.openbsd.org/ddb.html describes the minimum info required in
>>> bug reports. Insufficient info makes it difficult to find and fix bugs.
>>>
>>>
>>> ddb{3}> trace
>>> db_enter() at db_enter+0x12
>>> panic() at panic+0x120
>>> acpicpu_idle() at acpicpu_idle+0x2e8
>>
>> There are only 3 panics in acpicpu_idle. One at the very top:
>> panic ("null acpicpu");
>>
>> and two much further down:
>> panic ("idle with interrupts blocked");
>>
>> Based on the fact that it's called at acpicpu_idle+0x2e8, I'm
>> inclined to believe it to be the latter, but the garbled
>> panic string seems to more closely match the former.
>>
>> Can you put a plain printf before each panic and try to repro,
>> to see which it is? Just printf the same panic string.
>
> Mike, look below:
>
>>> ddb{3}> show panic
>>> null acpicpu
>>> ddb{3}>
>
> So it's the first one.
>
> Hrvoje, can you boot this machine with 16 cores but acpicpu(4)
> disabled and send us the acpidump output from /var/db/acpi?
>

Hi,

here it is:
http://kosjenka.srce.hr/~hrvoje/zaprocvat/noacpi.tgz

dmesg without acpicpu, just in case ..

OpenBSD 6.4-beta (GENERIC.MP) #293: Tue Sep 11 20:16:57 MDT 2018
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 33731620864 (32168MB)
avail mem = 32700067840 (31185MB)
User Kernel Config
UKC> dia\^H \^Hsable acpicpu
402 acpicpu* disabled
UKC> exit
Continuing...
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x6f119000 (84 entries)
bios0: vendor FUJITSU // American Megatrends Inc. version "V5.0.0.12 R1.22.0 for D3383-A1x" date 06/04/2018
bios0: FUJITSU PRIMERGY RX2530 M4
acpi0 at bios0: rev 2
acpi0: sleep states S0 S5
acpi0: tables DSDT FACP FPDT FIDT SPMI UEFI UEFI MCEJ MCFG HPET APIC MIGT MSCT NFIT PCAT PCCT RASF SLIT SRAT SVOS WDDT OEM4 OEM1 OEM2 SSDT SSDT SSDT DMAR HEST BERT ERST EINJ
acpi0: wakeup devices PWRB(S0) XHCI(S0) PXSX(S0) RP17(S0) PXSX(S0) RP18(S0) PXSX(S0) RP19(S0) PXSX(S0) RP20(S0) PXSX(S0) RP01(S0) PXSX(S0) RP02(S0) PXSX(S0) RP03(S0) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0x8000, bus 0-255
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz, 3193.22 MHz, 06-55-04
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,PQM,MPX,AVX512F,AVX512DQ,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,PT,AVX512CD,AVX512BW,AVX512VL,PKU,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.0.2, IBE
cpu1 at mainbus0: apid 8 (application processor)
cpu1: Intel(R) Xeon(R) Gold 6134 CPU @ 3.20GHz, 3192.53 MHz, 06-55-04
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,S
Re: Kernel Panic 6.3 and HP DL360 Gen9
On 19.6.2018. 15:14, Albert Martinez wrote:
> Dear OpenBSD Team,
>
> first of all, thanks for your time and effort in the OBSD project, we use it
> daily.

Hi,

please send a dmesg and, if you can, install the latest -current and report
back.
Re: lock order reversal
On 30.1.2018. 13:34, Martin Pieuchot wrote:
> On 30/01/18(Tue) 13:12, Hrvoje Popovski wrote:
>> Hi all,
>>
>> I've checked out cvs tree few minutes ago on desktop pc and enabled
>> WITNESS. While booting pc with new kernel i'm getting "lock order reversal"
>>
>> http://kosjenka.srce.hr/~hrvoje/zaprocvat/IMG_20180130_125928.jpg
>
> I'd say this is known. With a working keyboard you could print the
> traces and run "show witness /b".
>
>>
>> in ddb i can't do nothing with keyboard
>
> Putting machdep.forceukbd=1 into sysctl.conf(5) should help with that.

now without uefi output to serial port is working :)

login: lock order reversal:
 1st 0x81bf6aa8 _lock (_lock) @ /sys/kern/kern_synch.c:292
 2nd 0x80218240 _priv->irq_lock (_priv->irq_lock) @ /sys/dev/pci/drm/i915/intel_ringbuffer.c:1787
Stopped at db_enter+0x5: popq %rbp
ddb{2}>
ddb{2}> trace
db_enter() at db_enter+0x5
witness_checkorder(81886033,6fb,80218240,80218230,0) at witness_checkorder+0xaf2
_mtx_enter(80211000,80218230,8021b000) at _mtx_enter+0x30
gen6_ring_put_irq(800033083370) at gen6_ring_put_irq+0x36
__i915_wait_request(0,ff0219e0d008,80ade850,0,80211000) at __i915_wait_request+0x330
i915_wait_request(0) at i915_wait_request+0x87
i915_gem_object_wait_rendering(0,80ade740) at i915_gem_object_wait_rendering+0x18e
i915_gem_object_set_cache_level(8000330834e0,80ade740) at i915_gem_object_set_cache_level+0x20d
i915_gem_object_pin_to_display_plane(80ade740,80211000,8000330834e0,0,8130d56d) at i915_gem_object_pin_to_display_plane+0x6d
intel_pin_and_fence_fb_obj() at intel_pin_and_fence_fb_obj+0x1c0
intel_crtc_page_flip(80a80a00,8021c000,800033083ab0,80000021b000) at intel_crtc_page_flip+0x4dd
drm_mode_page_flip_ioctl(8021b000,8171b440,c01864b0) at drm_mode_page_flip_ioctl+0x39b
drm_do_ioctl(0,8021b0d8,8021b000,8021b108) at drm_do_ioctl+0x201
drmioctl(ff01dfe55ae0,800033084030,ff01ddaf4e88,c01864b0,ff01ddaf4e88) at drmioctl+0xe8
VOP_IOCTL(6e8804fd365ed36d,800033084030,ff021e5d3ba0,3,800033083ab0,c01864b0) at VOP_IOCTL+0x3e
vn_ioctl(800033084030,800033083ba0,ff01ddaf4e88,18) at vn_ioctl+0x5d
sys_ioctl(360,800033084030,0) at sys_ioctl+0x343
syscall() at syscall+0x279
--- syscall (number 54) ---
end of kernel
end trace frame: 0x7f7f2e80, count: -18
0x1dc9e137593a:
ddb{2}>
ddb{2}> show witness /b
Number of known direct relationships is 300

Lock order reversal between "_lock"(sched_lock) and "_priv->irq_lock"(mutex)!
Lock order "_lock"(sched_lock) -> "_priv->irq_lock"(mutex) first seen at:
#0 witness_checkorder+0x466
#1 _mtx_enter+0x30
#2 gen6_ring_put_irq+0x36
#3 __i915_wait_request+0x330
#4 i915_wait_request+0x87
#5 i915_gem_object_wait_rendering+0x18e
#6 i915_gem_object_set_cache_level+0x20d
#7 i915_gem_object_pin_to_display_plane+0x6d
#8 intel_pin_and_fence_fb_obj+0x1c0
#9 intel_crtc_page_flip+0x4dd
#10 drm_mode_page_flip_ioctl+0x39b
#11 drm_do_ioctl+0x201
#12 drmioctl+0xe8
#13 VOP_IOCTL+0x3e
#14 vn_ioctl+0x5d
#15 sys_ioctl+0x343
#16 syscall+0x279
Lock order "_priv->irq_lock"(mutex) -> "_lock"(sched_lock) first seen at:
#0 witness_checkorder+0x466
#1 ___mp_lock+0x6f
#2 wakeup_n+0x39
#3 task_add+0x85
#4 gen6_rps_boost+0x110
#5 __i915_wait_request+0x13c
#6 i915_gem_object_wait_rendering__nonblocking+0x1c6
#7 i915_gem_set_domain_ioctl+0xce
#8 drm_do_ioctl+0x201
#9 drmioctl+0xe8
#10 VOP_IOCTL+0x3e
#11 vn_ioctl+0x5d
#12 sys_ioctl+0x343
#13 syscall+0x279

Lock order reversal between ">mnt_lock"(rwlock) and ">i_lock"(rrwlock)!
Lock order ">mnt_lock"(rwlock) -> ">i_lock"(rrwlock) first seen at:
#0 witness_checkorder+0x466
#1 _rw_enter+0x56
#2 _rrw_enter+0x32
#3 VOP_LOCK+0x31
#4 vn_lock+0x36
#5 vget+0xba
#6 cache_lookup+0x173
#7 ufs_lookup+0x10e
#8 VOP_LOOKUP+0x33
#9 vfs_lookup+0x26e
#10 namei+0x1eb
#11 ffs_mount+0x111
#12 sys_mount+0x33c
#13 syscall+0x279
Lock order ">i_lock"(rrwlock) -> ">mnt_lock"(rwlock) first seen at:
#0 witness_checkorder+0x466
#1 _rw_enter+0x56
#2 vfs_busy+0x64
#3 vfs_lookup+0x38b
#4 namei+0x1eb
#5 doreadlinkat+0x6d
#6 syscall+0x279

ddb{2}>
ddb{2}> mach ddbcpu 0
Stopped at x86_ipi_db+0x5: popq %rbp
ddb{0}> trace
x86_ipi_db(802119e0) at x86_ipi_db+0x5
x86_ipi_handler() at x86_ipi_handler+0x6a
Xresume_lapic_ipi() at Xresume_lapic_ipi+0x1f
--- interrupt ---
end of kernel
end trace frame: 0x33a79405c75250cc, count: -3
0x41cb8c419c524153:
ddb{0}> mach ddbcpu 1
Stopped at x86_ipi_db+0x5: popq %rbp
ddb{1}> trace
x86_i
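
For readers unfamiliar with what a "lock order reversal" report means: one code path took lock A and then lock B, while another path took B and then A, which can deadlock if both run at once. The following is only a toy Python sketch of that bookkeeping idea, not how the kernel's WITNESS code is implemented; the class LockOrderChecker and its methods are hypothetical names made up for illustration, and the lock names are borrowed from the report above.

```python
class LockOrderChecker:
    """Toy order checker (hypothetical, NOT the kernel's WITNESS code)."""

    def __init__(self):
        self.held = []        # locks currently held, in acquisition order
        self.seen = set()     # (first, second) orderings observed so far
        self.reversals = []   # orderings that contradict an earlier one

    def acquire(self, name):
        for h in self.held:
            # Taking `name` while holding `h` establishes the order h -> name.
            # If name -> h was seen earlier, this is a reversal; report it
            # only the first time this ordering shows up.
            if (name, h) in self.seen and (h, name) not in self.seen:
                self.reversals.append((h, name))
            self.seen.add((h, name))
        self.held.append(name)

    def release(self, name):
        self.held.remove(name)


checker = LockOrderChecker()

# Path A: sched_lock first, then irq_lock.
checker.acquire("sched_lock")
checker.acquire("irq_lock")
checker.release("irq_lock")
checker.release("sched_lock")

# Path B: the same two locks in the opposite order.
checker.acquire("irq_lock")
checker.acquire("sched_lock")
checker.release("sched_lock")
checker.release("irq_lock")

print(checker.reversals)
```

Note that the checker flags the reversal even though nothing deadlocked in this single-threaded run; it only needs to observe both orderings once, which matches the "first seen at" wording in the report.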
Re: lock order reversal
On 30.1.2018. 14:18, Hrvoje Popovski wrote:
> On 30.1.2018. 13:34, Martin Pieuchot wrote:
>> On 30/01/18(Tue) 13:12, Hrvoje Popovski wrote:
>>> Hi all,
>>>
>>> I've checked out cvs tree few minutes ago on desktop pc and enabled
>>> WITNESS. While booting pc with new kernel i'm getting "lock order reversal"
>>>
>>> http://kosjenka.srce.hr/~hrvoje/zaprocvat/IMG_20180130_125928.jpg
>>
>> I'd say this is known. With a working keyboard you could print the
>> traces and run "show witness /b".
>>
>>>
>>> in ddb i can't do nothing with keyboard
>>
>> Putting machdep.forceukbd=1 into sysctl.conf(5) should help with that.
>
>
> thank you for info on forceukbd=1
>
> trace
> http://kosjenka.srce.hr/~hrvoje/zaprocvat/IMG_20180130_140812.jpg
>
> show witness /b - can i somehow scroll up/down in ddb? show witness /b
> doesn't print page by page like show witness
>
> this is so easier over serial console :)
>

i will reinstall pc without uefi boot and send proper ddb output ..
Re: lock order reversal
On 30.1.2018. 13:34, Martin Pieuchot wrote:
> On 30/01/18(Tue) 13:12, Hrvoje Popovski wrote:
>> Hi all,
>>
>> I've checked out cvs tree few minutes ago on desktop pc and enabled
>> WITNESS. While booting pc with new kernel i'm getting "lock order reversal"
>>
>> http://kosjenka.srce.hr/~hrvoje/zaprocvat/IMG_20180130_125928.jpg
>
> I'd say this is known. With a working keyboard you could print the
> traces and run "show witness /b".
>
>>
>> in ddb i can't do nothing with keyboard
>
> Putting machdep.forceukbd=1 into sysctl.conf(5) should help with that.

thank you for info on forceukbd=1

trace
http://kosjenka.srce.hr/~hrvoje/zaprocvat/IMG_20180130_140812.jpg

show witness /b - can i somehow scroll up/down in ddb? show witness /b
doesn't print page by page like show witness

this is so easier over serial console :)
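
For reference, the setting Martin suggests would be a single line in the sysctl configuration file. This is a minimal sketch, assuming the file is /etc/sysctl.conf as described in sysctl.conf(5) and that the machine is rebooted afterwards so the value is applied at boot:

```
# /etc/sysctl.conf
machdep.forceukbd=1
```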