1) Andre, after I switched the bond to active-backup mode, the issue is
gone (so far). But yes, we are also looking for a reproducer. It's hard
to narrow down an intermittent issue, and that is likely true for Intel too.
2) I also just received an email from an Intel developer suggesting a
driver change to narrow down the issue further. Quoting:
--- cut ---
Could you edit the file drivers/net/ethernet/intel/ice/ice_lag.c
(path relative to the kernel source tree root)?
Find the functions ice_init_lag() and ice_deinit_lag(), and add
"return 0;" and "return;", respectively, at the start of each function
body, right after the local declarations.
The patch would look something like this:
  * Memory will be freed in ice_deinit_lag
  */
 int ice_init_lag(struct ice_pf *pf)
 {
 	struct device *dev = ice_pf_to_dev(pf);
 	struct ice_lag *lag;
 	struct ice_vsi *vsi;
 	int err;
+	return 0;
 	pf->lag = kzalloc(sizeof(*lag), GFP_KERNEL);
 	if (!pf->lag)
 		return -ENOMEM;
 	lag = pf->lag;
………
  * This function is meant to only be called on driver remove/shutdown
  */
 void ice_deinit_lag(struct ice_pf *pf)
 {
 	struct ice_lag *lag;
+	return;
 	lag = pf->lag;
Could you then rebuild the driver and try to reproduce the problem?
--- cut ---
So in essence I believe this just skips offloading the bonding / LACP to the HW.
I will set this up on one or two of our machines to test. Would you please also
try this on your systems?
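To make the effect of the two early returns concrete, here is a small userspace mock (not kernel code). struct mock_pf, struct mock_lag and the mock_* functions are hypothetical stand-ins invented for illustration, and the real ice_init_lag() does more than allocate memory. The point it shows: with the debug returns in place, the LAG state is never set up, so pf->lag stays NULL and anything gated on it never runs. That is consistent with, but does not prove, the reading that the change disables the driver's LAG handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct ice_lag / struct ice_pf; the real
 * kernel structures carry far more state. */
struct mock_lag {
	int bonded;
};

struct mock_pf {
	struct mock_lag *lag;
};

/* Same shape as ice_init_lag() with the suggested early return:
 * when skip_lag is set, nothing is allocated and pf->lag stays NULL. */
int mock_init_lag(struct mock_pf *pf, int skip_lag)
{
	if (skip_lag)
		return 0; /* the debug "return 0;" */

	pf->lag = calloc(1, sizeof(*pf->lag));
	if (!pf->lag)
		return -ENOMEM;
	return 0;
}

/* Same shape as ice_deinit_lag() with the suggested early return. */
void mock_deinit_lag(struct mock_pf *pf, int skip_lag)
{
	if (skip_lag)
		return; /* the debug "return;" */

	free(pf->lag);
	pf->lag = NULL;
}

/* Stand-in for code that only runs when LAG state exists. */
int mock_lag_active(const struct mock_pf *pf)
{
	assert(pf != NULL);
	return pf->lag != NULL;
}
```

With skip_lag set, mock_init_lag() succeeds but mock_lag_active() reports 0; without it, the state is allocated and later released by mock_deinit_lag().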
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2036239
Title:
Intel E810-XXV - NETDEV WATCHDOG: (ice): transmit queue timed out
Status in linux package in Ubuntu:
Confirmed
Bug description:
I'm having issues with an Intel E810-XXV card on a Dell server under Ubuntu
Jammy.
Details:
- hardware --> a1:00.0 Ethernet controller: Intel Corporation Ethernet
Controller E810-XXV for SFP (rev 02)
- tested with both GA and HWE kernels (`5.15.0-83-generic #92` and
`6.2.0-32-generic #32~22.04.1-Ubuntu`) with the same results.
- using a bond over the two ports of the same card, at 25Gbps, to two
different switches; the bond uses LACP with the layer3+4 hash policy
and fast LACP timeout. I believe the bug is not directly related to
bonding, as the problem seems to be in the interface itself.
- machine installed by MAAS. No issues during installation, but the bond
is not formed yet at that stage; later, when Linux boots, the bond is
formed and works without issues for a while
- it works fine for about 2 to 3 hours, then the issue starts (this may
or may not be related to network load, but it seems to be triggered by
some tests that I run after OpenStack finishes installing)
- one of the legs of the bond freezes and all traffic that would go over
that leg is discarded, in and out; pings to random external hosts start
losing every second packet
- after some time, the kernel log shows messages such as
"NETDEV WATCHDOG: enp161s0f0 (ice): transmit queue 166 timed out"
followed by a stack trace
- the switch logs that the bond is flapping
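For anyone trying to build a reproducer, a small helper that spots the watchdog message in kernel log output makes it easier to tell when a test run has hit the problem. This is a hypothetical sketch: parse_tx_timeout() is not part of any existing tool, and the message format is assumed from the log line quoted above, so other kernel versions may word it differently. It searches for the marker as a substring, so a dmesg timestamp or driver prefix in front of it is tolerated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan one kernel log line for the transmit-timeout
 * message and extract the interface name and queue number. The format
 * "NETDEV WATCHDOG: <ifname> (<driver>): transmit queue <n> timed out" is
 * assumed from the log line in this report. Returns 1 on a match, 0 otherwise.
 */
int parse_tx_timeout(const char *line, char *ifname, size_t ifname_len,
		     unsigned int *queue)
{
	static const char marker[] = "NETDEV WATCHDOG: ";
	const char *p;
	char name[64];
	char driver[32];
	unsigned int q;
	int consumed = -1;

	assert(line != NULL && ifname != NULL && queue != NULL);

	/* Skip any timestamp or driver prefix before the marker. */
	p = strstr(line, marker);
	if (p == NULL)
		return 0;
	p += sizeof(marker) - 1;

	/* %n is only stored if the trailing " timed out" text also matched. */
	if (sscanf(p, "%63s (%31[^)]): transmit queue %u timed out%n",
		   name, driver, &q, &consumed) != 3 || consumed < 0)
		return 0;

	snprintf(ifname, ifname_len, "%s", name);
	*queue = q;
	return 1;
}
```

Fed the log line quoted in the bug description, the helper should report interface enp161s0f0 and queue 166; lines without the marker are ignored.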
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Sep 12 20:05 seq
crw-rw---- 1 root audio 116, 33 Sep 12 20:05 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq',
'/dev/snd/timer'] failed with exit code 1:
CRDA: N/A
CasperMD5CheckResult: pass
CloudArchitecture: x86_64
CloudID: none
CloudName: none
CloudPlatform: none
CloudSubPlatform: config
DistroRelease: Ubuntu 22.04
InstallationDate: Installed on 2023-08-22 (24 days ago)
InstallationMedia: Ubuntu-Server 22.04.3 LTS "Jammy Jellyfish" - Release
amd64 (20230810)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Dell Inc. PowerEdge R7515
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.15.0-83-generic
root=UUID=cfb5f171-77e6-4fcd-947b-52901f51b26a ro
ProcVersionSignature: Ubuntu 5.15.0-83.92-generic 5.15.116
RelatedPackageVersions:
linux-restricted-modules-5.15.0-83-generic N/A
linux-backports-modules-5.15.0-83-generic N/A
linux-firmware 20220329.git681281e4-0ubuntu3.18
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy uec-images
Uname: Linux 5.15.0-83-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 07/27/2023
dmi.bios.release: 2.12
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.12.4
dmi.board.name: 0J91V2
dmi.board.vendor: Dell Inc.
dmi.board.version: A01
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias:
dmi:bvnDellInc.:bvr2.12.4:bd07/27/2023:br2.12:svnDellInc.:pnPowerEdgeR7515:pvr:rvnDellInc.:rn0J91V2:rvrA01:cvnDellInc.:ct23:cvr:skuSKU=08FD;ModelName=PowerEdgeR7515:
dmi.product.family: PowerEdge
dmi.product.name: PowerEdge R7515
dmi.product.sku: SKU=08FD;ModelName=PowerEdge R7515
dmi.sys.vendor: Dell Inc.
---
ProblemType: Bug
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Sep 15 03:13 seq
crw-rw---- 1 root audio 116, 33 Sep 15 03:13 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu82.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse:
Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with
exit code 1: Cannot stat file /proc/215602/fd/10: Permission denied
Cannot stat file /proc/323635/fd/10: Permission denied
CRDA: N/A
CasperMD5CheckResult: unknown
CloudArchitecture: x86_64
CloudID: maas
CloudName: maas
CloudPlatform: maas
CloudSubPlatform: seed-dir (http://10.3.4.7:5248/MAAS/metadata/)
DistroRelease: Ubuntu 22.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
MachineType: Dell Inc. PowerEdge R7525
NonfreeKernelModules: zfs zunicode zavl icp zcommon znvpair
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 mgag200drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-6.2.0-32-generic
root=UUID=9b437790-e6e2-4a2e-af79-5b13fee932af ro
ProcVersionSignature: Ubuntu 6.2.0-32.32~22.04.1-generic 6.2.16
RebootRequiredPkgs: Error: path contained symlinks.
RelatedPackageVersions:
linux-restricted-modules-6.2.0-32-generic N/A
linux-backports-modules-6.2.0-32-generic N/A
linux-firmware 20220329.git681281e4-0ubuntu3.18
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: jammy uec-images
Uname: Linux 6.2.0-32-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: N/A
_MarkForUpload: True
dmi.bios.date: 07/26/2023
dmi.bios.release: 2.12
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.12.4
dmi.board.name: 03WYW4
dmi.board.vendor: Dell Inc.
dmi.board.version: A02
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias:
dmi:bvnDellInc.:bvr2.12.4:bd07/26/2023:br2.12:svnDellInc.:pnPowerEdgeR7525:pvr:rvnDellInc.:rn03WYW4:rvrA02:cvnDellInc.:ct23:cvr:skuSKU=08FF;ModelName=PowerEdgeR7525:
dmi.product.family: PowerEdge
dmi.product.name: PowerEdge R7525
dmi.product.sku: SKU=08FF;ModelName=PowerEdge R7525
dmi.sys.vendor: Dell Inc.
mtime.conffile..etc.logrotate.d.apport: 2023-09-15T13:17:01.203771