Edwin, Do you happen to notice any IPv6 or LLDP or other link-local traffic on the interfaces? (including backup interface).
The MTR loss % is purely a capture of their packets xmitted and responses received, so for that UDP MTR test, this is saying that UDP packets were lost, somewhere. The NIC does not have any drops showing via ethtool -S stats but I'm hunting down which are the right pair of before/afters. Other than the tpa_abort counts, there were no errors that I saw. I can't tell what the tpa_abort means for the frame - is it purely a failure only to coalesce, or does it end up dropping packets at some point in that functionality? I'm assuming not, as whatever the reason, those would be counted as drops, I hope, and printed in the interface stats. I'll attach all the stats here once I get them sorted out, I thought I had a clean diff of before and after from the tester, but after looking through, I don't think the file I have is from before/after the mtr test, as there was negligible UDP traffic. I'll try and get clarification from the reporter. Note that when the provision of primary= is used to configure which interface is primary, and when the primary port is used as the active interface for the bond, no problems are seen (and that works deterministically to set the correct active interface). -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1853638 Title: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data Status in linux package in Ubuntu: Confirmed Status in network-manager package in Ubuntu: Confirmed Bug description: The issue appears to be with the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data Basically, we are dropping data, as you can see from the benchmark tool as follows: tcdforge@x310a:/usr/local/lib/lib/uhd/examples$ ./benchmark_rate --rx_rate 10e6 --tx_rate 10e6 --duration 300 [INFO] [UHD] linux; GNU C++ version 5.4.0 20160609; Boost_105800; UHD_3.14.1.1-0-g98c7c986 [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam [00:00:00.000007] Creating the usrp device with: ... [INFO] [X300] X300 initialization sequence... [INFO] [X300] Maximum frame size: 1472 bytes. [INFO] [X300] Radio 1x clock: 200 MHz [INFO] [GPS] Found an internal GPSDO: LC_XO, Firmware Rev 0.929a [INFO] [0/DmaFIFO_0] Initializing block control (NOC ID: 0xF1F0D00000000000) [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1308 MB/s) [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1316 MB/s) [INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD100000000001) [INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD100000000001) [INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0000000000000) [INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0000000000000) [INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0000000000000) [INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0000000000000) Using Device: Single USRP: Device: X-Series Device Mboard 0: X310 RX Channel: 0 RX DSP: 0 RX Dboard: A RX Subdev: SBX-120 RX RX Channel: 1 RX DSP: 0 RX Dboard: B RX Subdev: SBX-120 RX TX Channel: 0 TX DSP: 0 TX Dboard: A TX Subdev: SBX-120 TX TX Channel: 1 TX DSP: 0 TX Dboard: B TX Subdev: SBX-120 TX [00:00:04.305374] Setting device timestamp to 0... [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam [00:00:04.310990] Testing receive rate 10.000000 Msps on 1 channels [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam [00:00:04.318356] Testing transmit rate 10.000000 Msps on 1 channels [00:00:06.693119] Detected Rx sequence error. D[00:00:09.402843] Detected Rx sequence error. DD[00:00:40.927978] Detected Rx sequence error. D[00:01:44.982243] Detected Rx sequence error. D[00:02:11.400692] Detected Rx sequence error. D[00:02:14.805292] Detected Rx sequence error. D[00:02:41.875596] Detected Rx sequence error. D[00:03:06.927743] Detected Rx sequence error. D[00:03:47.967891] Detected Rx sequence error. D[00:03:58.233659] Detected Rx sequence error. D[00:03:58.876588] Detected Rx sequence error. D[00:04:03.139770] Detected Rx sequence error. D[00:04:45.287465] Detected Rx sequence error. D[00:04:56.425845] Detected Rx sequence error. D[00:04:57.929209] Detected Rx sequence error. [00:05:04.529548] Benchmark complete. Benchmark rate summary: Num received samples: 2995435936 Num dropped samples: 4622800 Num overruns detected: 0 Num transmitted samples: 3008276544 Num sequence errors (Tx): 0 Num sequence errors (Rx): 15 Num underruns detected: 0 Num late commands: 0 Num timeouts (Tx): 0 Num timeouts (Rx): 0 Done! tcdforge@x310a:/usr/local/lib/lib/uhd/examples$ In this particular case description, the nodes are USRP x310s. However, we have the same issue with N210 nodes dropping samples connected to the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device. There is no problem with the USRPs themselves, as we have tested them with normal 1G network cards and have no dropped samples. Personally I think its something to do with the 10G network card, possibly on a ubuntu driver??? Note, Dell have said there is no hardware problem with the 10G interfaces I have followed the troubleshooting information on this link to try determine the problem: https://files.ettus.com/manual/page_usrp_x3x0_config.html - There is no firewall on that port (disabled). - I tried setting the cpu frequency power but got "no or unknown cpufreq driver is active on this CPU". - I also changed the cable to Cat6a connecting the USRPs to the 10G SRIOV port, and I get the same issue This is from the VM with connected USRP x310 tcdforge@x310a:~$ lspci -nn | grep -i ethernet 00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000] 00:05.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries NetXtreme-E Ethernet Virtual Function [14e4:16dc] tcdforge@x310a:~$ 5e:00.0 Ethernet controller [0200]: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller [14e4:16d8] (rev 01) Subsystem: Dell BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller [1028:1fea] Flags: bus master, fast devsel, latency 0, IRQ 50, NUMA node 0 Memory at b9a10000 (64-bit, prefetchable) [size=64K] Memory at b9100000 (64-bit, prefetchable) [size=1M] Memory at b9aa2000 (64-bit, prefetchable) [size=8K] Expansion ROM at b9c00000 [disabled] [size=512K] Capabilities: <access denied> Kernel driver in use: bnxt_en Kernel modules: bnxt_en We get this info from the server: scamallra@rack9:~$ cpupower frequency-info analyzing CPU 0: no or unknown cpufreq driver is active on this CPU CPUs which run at the same hardware frequency: Not Available CPUs which need to have their frequency coordinated by software: Not Available maximum transition latency: Cannot determine or is not supported. Not Available available cpufreq governors: Not Available Unable to determine current policy current CPU frequency: Unable to call hardware current CPU frequency: Unable to call to kernel boost state support: Supported: yes Active: yes lsb_release -rd Description: Ubuntu 18.04.3 LTS Release: 18.04 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: network-manager 1.10.6-2ubuntu1.1 ProcVersionSignature: Ubuntu 4.15.0-70.79-generic 4.15.18 Uname: Linux 4.15.0-70-generic x86_64 ApportVersion: 2.20.9-0ubuntu7.9 Architecture: amd64 Date: Fri Nov 22 17:39:21 2019 NetworkManager.state: [main] NetworkingEnabled=true WirelessEnabled=true WWANEnabled=true ProcEnviron: TERM=xterm-256color PATH=(custom, no user) XDG_RUNTIME_DIR=<set> LANG=en_US.UTF-8 SHELL=/bin/bash RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: network-manager UpgradeStatus: No upgrade log present (probably fresh install) nmcli-con: NAME UUID TYPE TIMESTAMP TIMESTAMP-REAL AUTOCONNECT AUTOCONNECT-PRIORITY READONLY DBUS-PATH ACTIVE DEVICE STATE ACTIVE-PATH SLAVE nmcli-nm: RUNNING VERSION STATE STARTUP CONNECTIVITY NETWORKING WIFI-HW WIFI WWAN-HW WWAN running 1.10.6 connected started unknown enabled enabled enabled enabled enabled To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp