Thanks a lot for checking, Itai!
As discussed offline, we made another attempt with the 5.19.0-28-generic
kernel and 22.35.2302 firmware on a different system, and did not run
into this issue there either.
I will set this to Incomplete until we regain access to the system where
this was first observed, so we can compare software and hardware
components.
** Changed in: linux (Ubuntu)
Status: Confirmed => Incomplete
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1999229
Title:
mlx5 VF LAG flapping
Status in linux package in Ubuntu:
Incomplete
Bug description:
# sudo lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04
Codename: jammy
# mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX6DX
Part Number: MCX623106AN-CDA_Ax
Description: ConnectX-6 Dx EN adapter card; 100GbE; Dual-port QSFP56;
PCIe 4.0/3.0 x16;
PSID: MT_0000000359
PCI Device Name: 0000:41:00.0
Base GUID: 08c0eb03006fb26e
Base MAC: 08c0eb6fb26e
Versions:    Current        Available
   FW        22.34.4000     N/A
   PXE       3.6.0700       N/A
   UEFI      14.27.0015     N/A
# uname -a
Linux ps6-ra1-n2 5.19.0-24-generic #25~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri
Nov 18 14:28:08 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
The kernel is from the linux-generic-hwe-22.04-edge package in
jammy-proposed; see https://wiki.ubuntu.com/Testing/EnableProposed for
documentation on how to enable and use -proposed.
Problem:
Severe packet loss on the high-speed NIC due to what appears to be VF LAG flapping:
[Fri Dec 9 07:27:19 2022] mlx5_core 0000:41:00.0: mlx5_cmd_out_err:778:(pid
3383): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad
resource(0x5), syndrome (0xf2ff71), err(-22)
[Fri Dec 9 07:27:19 2022] mlx5_core 0000:41:00.0: E-Switch: Failed to create
termination table rule, err -EINVAL
[Fri Dec 9 07:27:19 2022] mlx5_core 0000:41:00.0: E-Switch: Failed to get
termination table, err -EINVAL
[Fri Dec 9 07:27:19 2022] mlx5_core 0000:41:00.1: mlx5_cmd_out_err:778:(pid
3383): SET_FLOW_TABLE_ENTRY(0x936) op_mod(0x0) failed, status bad
resource(0x5), syndrome (0xf2ff71), err(-22)
[Fri Dec 9 07:27:19 2022] mlx5_core 0000:41:00.1: E-Switch: Failed to create
termination table rule, err -EINVAL
[Fri Dec 9 07:27:19 2022] mlx5_core 0000:41:00.1: E-Switch: Failed to get
termination table, err -EINVAL
[Fri Dec 9 07:27:20 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2
[Fri Dec 9 07:27:20 2022] mlx5_core 0000:41:00.0: lag map active ports: 2
[Fri Dec 9 07:27:20 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2
[Fri Dec 9 07:27:21 2022] mlx5_core 0000:41:00.0: lag map active ports: 2
[Fri Dec 9 07:27:21 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2
[Fri Dec 9 07:27:22 2022] mlx5_core 0000:41:00.0: lag map active ports: 2
[Fri Dec 9 07:27:23 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2
[Fri Dec 9 07:27:23 2022] mlx5_core 0000:41:00.0: lag map active ports: 2
This does not happen when using the Jammy 5.15 kernel, everything else
in the environment being equal.
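
To compare the affected system against the one where the issue did not
reproduce, the flapping in the log above can be quantified. The following
is only a rough sketch: the function name count_lag_flaps and the SAMPLE
list are made up for illustration, and the regular expression assumes the
"lag map active ports:" message format exactly as it appears in the log
excerpt above. It counts how many times the active-port set changes per
PCI device.

```python
import re

# Matches lines such as:
#   mlx5_core 0000:41:00.0: lag map active ports: 1, 2
# The message format is assumed from the log excerpt in this report.
LAG_RE = re.compile(r"mlx5_core (\S+): lag map active ports: ([\d, ]+)")


def count_lag_flaps(lines):
    """Return {pci_device: number of changes in the active-port set}."""
    last = {}   # most recent port set seen per device
    flaps = {}  # number of observed changes per device
    for line in lines:
        m = LAG_RE.search(line)
        if not m:
            continue
        dev = m.group(1)
        ports = tuple(int(p) for p in m.group(2).split(",") if p.strip())
        flaps.setdefault(dev, 0)
        if dev in last and last[dev] != ports:
            flaps[dev] += 1
        last[dev] = ports
    return flaps


# Hypothetical sample, copied from the kernel log in this report.
SAMPLE = [
    "[Fri Dec 9 07:27:20 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2",
    "[Fri Dec 9 07:27:20 2022] mlx5_core 0000:41:00.0: lag map active ports: 2",
    "[Fri Dec 9 07:27:20 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2",
    "[Fri Dec 9 07:27:21 2022] mlx5_core 0000:41:00.0: lag map active ports: 2",
    "[Fri Dec 9 07:27:21 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2",
    "[Fri Dec 9 07:27:22 2022] mlx5_core 0000:41:00.0: lag map active ports: 2",
    "[Fri Dec 9 07:27:23 2022] mlx5_core 0000:41:00.0: lag map active ports: 1, 2",
    "[Fri Dec 9 07:27:23 2022] mlx5_core 0000:41:00.0: lag map active ports: 2",
]

print(count_lag_flaps(SAMPLE))
```

On a live system the same function could be given the lines of
"dmesg -T" output (for example by passing sys.stdin as the lines
argument), which would allow a side-by-side count on the 5.15 and 5.19
kernels.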