On Thu, Dec 14, 2023 at 11:22:27PM +0000, John Clendenen wrote:
> >Synopsis: Panics occurring about once every 24 hours since 7.4 upgrade.
> Appears to be network related. Disabling LRO on ix devices helps.
> >Category: amd64
> >Environment:
> System      : OpenBSD 7.4
> Details     : OpenBSD 7.4 (GENERIC.MP) #2: Fri Dec  8 15:39:04 MST 2023
> r...@syspatch-74-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> Architecture: OpenBSD.amd64
> Machine     : amd64
> >Description:
> We have 2 Supermicro 5018D-FN8T systems serving as HA gateways at a
> colocation. They run a moderately complex network stack including veb,
> trunk, vlan, carp and wireguard interfaces across ix and em hardware. This
> configuration was deployed on 7.2 and upgraded to 7.3 soon after its
> release. The configuration has not changed much since and the systems have
> been very reliable. After upgrading to 7.4, we see kernel panics about once
> a day. We have re-imaged OpenBSD 7.3 on one of the units while we continue
> to test 7.4 on the 2nd. Note that the panics are not consistent. The most
> recent one involves memcpy, pf and wireguard, but previous ones have
> involved the ix driver. This is the first bug report I've had to file for
> OpenBSD, so please forgive my inexperience. I have screenshots of the most
> recent panic and trace. Happy to provide more info as needed.

What combination of em(4), ix(4), veb(4), trunk(4), vlan(4), carp(4) and
wg(4) do you use?

Could you provide your /etc/hostname.* files?

> I couldn't determine exactly where to run the objdump per the ddb
> documentation.
> 
> >How-To-Repeat:
> Not strictly repeatable but will occur roughly once a day while under
> moderate to heavy network load.
> 
> >Fix:
> The first panics involved the ix driver so our first idea was to disable
> the TSO sysctl and LRO on the ix interfaces since those were changed in 7.4
> (note that we had not manually enabled either). This did have a positive
> impact in that panics stopped, but there were still errant behaviors. In
> the first case, traffic routed through the ix trunk was unreliable (about
> 50% ping), however traffic outside of ix interfaces was fine and local
> traffic to/from hosts on the ix networks was also fine. In the second case,
> all traffic appeared normal, but the ix interfaces suddenly changed to no
> carrier (at about the same frequency as the panics). The no carrier status
> appeared to be in error as the switches still showed the interfaces as up.
> The no carrier status also persisted through reboots but would clear on
> power cycle. The 2 behaviors were seen in the 2 different systems (system 1
> exhibited the first behavior and system 2 exhibited the second). This was
> especially confusing because the hardware is identical, and the
> configuration is as similar as you'd expect in a CARP failover situation.
