Given that we could be changing reset behavior that the firmware might
expect, I wrote a quick set of kprobes to force the firmware to raise
MDD events and test the patched kernel from the PPA.

I tried to force faulty TX descriptors according to "Table 7-138. Tx
Descriptor Validity Checks" in the XL710 Datasheet, under section
"7.6.2.2.1 Interrupt on Misbehavior of VM (Malicious Driver Detection)".
This document is publicly available at Intel's Technical Library site
for this NIC.

The test setup is as follows:
- Create 2 VFs on primary NIC
- Pass VF 1 through to a Bionic VM
- Start iperf3 client on VM, going through i40evf interface
- Start another iperf3 client on host, going through i40e interface

The iperf3 servers in my testing were running on a separate host, so I
only had clients using the i40e NIC. This was primarily to verify what
the networking and connectivity impact would be if we ran into any MDDs.

After both iperf3 clients were running, I loaded the kprobe module
matching the specific TX check I wanted to validate. Raising MDDs on the
VF turned out to be pretty trivial, and most of the i40e probes also work
on i40evf. MDDs on the PF were a bit trickier to get, but I had good
results with corrupting the final TX descriptor's cmd_type_offset_bsz
field. As this happens right before the driver notifies the NIC about
the new data, it should force the firmware to raise the MDD event, as
opposed to us "manually" triggering it from the driver. This has the
benefit of keeping things consistent from the firmware's point of view,
as in the end it is the one responsible for detecting those events and
notifying the kernel about them.
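To make the descriptor corruption a bit more concrete, here is a minimal
userspace sketch of how the cmd_type_offset_bsz quad-word is packed and
how a single field in it can be clobbered. The shift values mirror the
I40E_TXD_QW1_* definitions in the driver's i40e_type.h. The struct and
helper names (tx_desc, txd_buf_size, corrupt_buf_size) are made up for
illustration, and zeroing the buffer size is only an assumed example of
a corruption; it is not necessarily what the attached module does, and
Table 7-138 remains the reference for which values the NIC rejects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field positions in the second quad-word of a TX data descriptor.
 * These mirror the I40E_TXD_QW1_* shifts in the driver's i40e_type.h. */
#define TXD_QW1_CMD_SHIFT        4
#define TXD_QW1_OFFSET_SHIFT     16
#define TXD_QW1_TX_BUF_SZ_SHIFT  30
#define TXD_QW1_L2TAG1_SHIFT     48
#define TXD_QW1_TX_BUF_SZ_MASK   (0x3FFFULL << TXD_QW1_TX_BUF_SZ_SHIFT)

#define TXD_DTYPE_DATA  0x0ULL  /* descriptor type: data */
#define TXD_CMD_EOP     0x1U    /* end of packet */
#define TXD_CMD_RS      0x2U    /* report status */

/* Userspace stand-in for the driver's struct i40e_tx_desc. The real
 * fields are little-endian; host byte order is used here for brevity. */
struct tx_desc {
	uint64_t buffer_addr;
	uint64_t cmd_type_offset_bsz;
};

/* Pack command, offset, buffer size and tag into one quad-word,
 * in the same way the driver's build_ctob() helper does. */
static uint64_t build_ctob(uint32_t cmd, uint32_t offset,
			   uint32_t size, uint32_t tag)
{
	return TXD_DTYPE_DATA |
	       ((uint64_t)cmd    << TXD_QW1_CMD_SHIFT) |
	       ((uint64_t)offset << TXD_QW1_OFFSET_SHIFT) |
	       ((uint64_t)size   << TXD_QW1_TX_BUF_SZ_SHIFT) |
	       ((uint64_t)tag    << TXD_QW1_L2TAG1_SHIFT);
}

/* Extract the 14-bit buffer size field from the quad-word. */
static uint32_t txd_buf_size(uint64_t qw1)
{
	return (uint32_t)((qw1 & TXD_QW1_TX_BUF_SZ_MASK) >>
			  TXD_QW1_TX_BUF_SZ_SHIFT);
}

/* Hypothetical corruption: clear the buffer size while leaving the
 * command bits (EOP/RS) and everything else untouched. */
static void corrupt_buf_size(struct tx_desc *desc)
{
	assert(desc != NULL);
	desc->cmd_type_offset_bsz &= ~TXD_QW1_TX_BUF_SZ_MASK;
}
```

In a real probe the same bit manipulation would be applied to the last
descriptor of a packet just before the tail register is written, so the
NIC is the first one to see the bad value.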

The main goal of this test was to verify whether we could leave the NIC
in an inconsistent state by avoiding or delaying the PF reset. The
results were promising, and should hopefully give some more data on
the value of the upstream patch.

When raising MDDs on the VF, the firmware correctly slaps the
appropriate queues and schedules any resets as required. This is the
same behavior as before. With the test kernel, however, we don't issue
any resets to the PF, so the iperf3 tests continue running uninterrupted,
as desired.

When raising MDDs on the PF, we no longer issue any resets, and
depending on which probe was used, connectivity will stop momentarily.
The netdev watchdog kicks in shortly afterwards and issues a PF reset
as appropriate, and network connectivity resumes. This confirms that
even with the upstream patch, any hung queues that aren't reset
immediately will recover afterwards, as the queue watchdogs will take
care of them. This is consistent with the upstream behavior, and the
kernel logs look similar to the ones below:

[  573.279608] NETDEV WATCHDOG: ens1f1 (i40e): transmit queue 1 timed out
[  573.279652] WARNING: CPU: 14 PID: 0 at /build/linux-lqvoqZ/linux-4.15.0/net/sched/sch_generic.c:323 dev_watchdog+0x221/0x230
[  573.279659] Modules linked in: vhost_net vhost tap vfio_pci vfio_virqfd vfio_iommu_type1 vfio i40evf xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ebtable_filter devlink ebtables nls_iso8859_1 intel_rapl sb_edac x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp intel_cstate intel_rapl_perf lpc_ich hpilo ipmi_si ipmi_devintf ipmi_msghandler shpchp ioatdma acpi_power_meter mac_hid sch_fq_codel kvm_intel kvm irqbypass iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov
[  573.279726]  async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear ses enclosure crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc mgag200 i2c_algo_bit ttm aesni_intel aes_x86_64 crypto_simd glue_helper cryptd drm_kms_helper syscopyarea ixgbe sysfillrect sysimgblt fb_sys_fops dca i40e drm tg3 ptp nvme hpsa pps_core nvme_core mdio scsi_transport_sas wmi [last unloaded: probe_tx_desc]
[  573.279756] CPU: 14 PID: 0 Comm: swapper/14 Tainted: G           OE    4.15.0-137-generic #141+TEST298651v20210225b1-Ubuntu
[  573.279757] Hardware name: HP ProLiant DL360 Gen9, BIOS P89 05/06/2015
[  573.279763] RIP: 0010:dev_watchdog+0x221/0x230
[  573.279764] RSP: 0018:ffff8f28bf183e58 EFLAGS: 00010286
[  573.279766] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000000000083f
[  573.279767] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f
[  573.279769] RBP: ffff8f28bf183e88 R08: 0000000000000694 R09: 0000000000000004
[  573.279770] R10: ffff8f28bf183ee0 R11: 0000000000000001 R12: 0000000000000040
[  573.279772] R13: ffff8f2827c69000 R14: ffff8f2827c69478 R15: ffff8f2827fa4f40
[  573.279774] FS:  0000000000000000(0000) GS:ffff8f28bf180000(0000) knlGS:0000000000000000
[  573.279775] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  573.279777] CR2: 000055d7e4f1b0e8 CR3: 0000000e9160a006 CR4: 00000000001626e0
[  573.279778] Call Trace:
[  573.279781]  <IRQ>
[  573.279790]  ? dev_deactivate_queue.constprop.33+0x60/0x60
[  573.279795]  call_timer_fn+0x30/0x130
[  573.279799]  run_timer_softirq+0x3f3/0x430
[  573.279805]  ? ktime_get+0x43/0xb0
[  573.279813]  ? lapic_next_deadline+0x26/0x30
[  573.279820]  __do_softirq+0xe4/0x2d4
[  573.279827]  irq_exit+0xc5/0xd0
[  573.279831]  smp_apic_timer_interrupt+0x79/0x140
[  573.279835]  apic_timer_interrupt+0x90/0xa0
[  573.279838]  </IRQ>
[  573.279847] RIP: 0010:mwait_idle+0x9f/0x190
[  573.279849] RSP: 0018:ffff9d4d86347e90 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[  573.279852] RAX: 0000000000000000 RBX: 000000000000000e RCX: 0000000000000000
[  573.279854] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  573.279857] RBP: ffff9d4d86347ea8 R08: 0000008566d3c328 R09: ffff8f282a5a4e00
[  573.279859] R10: 0000000000000000 R11: 00000152a21daba2 R12: 000000000000000e
[  573.279861] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[  573.279870]  arch_cpu_idle+0x15/0x20
[  573.279874]  default_idle_call+0x23/0x30
[  573.279878]  do_idle+0x172/0x1f0
[  573.279882]  cpu_startup_entry+0x73/0x80
[  573.279885]  start_secondary+0x1ab/0x200
[  573.279890]  secondary_startup_64+0xa5/0xb0
[  573.279892] Code: 36 00 49 63 4e e8 eb 92 4c 89 ef c6 05 08 1d d7 00 01 e8 f3 1d fd ff 89 d9 48 89 c2 4c 89 ee 48 c7 c7 60 ed 39 85 e8 3f e2 7e ff <0f> 0b eb c0 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55
[  573.279922] ---[ end trace bc176e8d4716bac2 ]---
[  573.279942] i40e 0000:08:00.1 ens1f1: tx_timeout: VSI_seid: 391, Q 1, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[  573.279955] i40e 0000:08:00.1 ens1f1: tx_timeout recovery level 1, hung_queue 1
[  573.282420] i40e 0000:08:00.1: VSI seid 391 Tx ring 0 disable timeout
[  573.338312] i40e 0000:08:00.1: VSI seid 393 Tx ring 64 disable timeout
[  579.167611] i40e 0000:08:00.1 ens1f1: tx_timeout: VSI_seid: 391, Q 10, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[  579.167650] i40e 0000:08:00.1 ens1f1: tx_timeout recovery level 2, hung_queue 10
[  579.168257] i40e 0000:08:00.1: VSI seid 391 Tx ring 0 disable timeout
[  579.169227] i40evf 0000:08:02.1: PF reset warning received
[  579.169231] i40evf 0000:08:02.1: Scheduling reset task
[  579.224464] i40e 0000:08:00.1: VSI seid 393 Tx ring 64 disable timeout
[  579.279847] i40e 0000:08:00.0: VSI seid 390 Tx ring 0 disable timeout
[  579.335352] i40e 0000:08:00.0: VSI seid 392 Tx ring 64 disable timeout
[  582.377042] i40e 0000:08:00.0: DCBX offload is not supported or is disabled for this PF.

My first test run was to validate the patches on a Bionic host; I'll
move on to testing Xenial next. I've attached the kprobe module source,
in case anyone wants to try breaking i40e as well. The offsets in the
current version were calculated for kernel 4.15.0-136 and are the same
for test kernel 4.15.0-137.


** Attachment added: "probe_tx_desc.c"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1772675/+attachment/5475181/+files/probe_tx_desc.c

https://bugs.launchpad.net/bugs/1772675

Title:
  i40e PF reset due to incorrect MDD event

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  In Progress
Status in linux source package in Bionic:
  In Progress
Status in linux source package in Cosmic:
  Won't Fix

Bug description:
  [Impact]
  The i40e driver sometimes triggers a "malicious driver detection" (MDD)
  event in the firmware, which causes the firmware to reset the NIC. The
  reset causes at least a temporary interruption in network traffic, and
  can cause further problems, e.g. if the interface is in a bond.

  [Fix]
  MDD events issued for the PF are usually the result of a misconfigured
  TX descriptor and not of "bad" actions in the VFs. We don't need to
  issue a reset to the whole NIC; TX hang checks should handle those if
  necessary.

  [Test Case]
  The bug is unfortunately difficult to reproduce, as there's no detailed
  documentation on how the i40e firmware detects and raises MDDs. We have
  seen reports of this happening in Xenial and Bionic, for workloads
  stressing i40e bonds in LACP mode.
  A successful reproduction is easy to detect, as the network traffic
  will be interrupted and the system logs will contain a message like:
  i40e 0000:02:00.1: TX driver issue detected, PF reset issued

  [Regression Potential]
  Since we're removing resets for the NIC, regressions could show up as
  issues in connectivity after the MDD events are raised. If the firmware
  expects the whole NIC to reset, we could see TX/RX hangs and general
  unresponsiveness in networking. The potential for this should however
  be fairly low, as this patch has been present since kernel 5.2 and
  hasn't seen any fixes or regressions upstream. Basic smoke tests also
  showed that the driver continues working as expected.

  ==
  [original description]

  This is a continuation from bug 1713553 and then bug 1723127; a patch
  was added in the first bug and then the second bug, to attempt to fix
  this, and it may have helped reduce the issue but appears not to have
  fixed it, based on more reports.

  See bug 1713553 and bug 1723127 for more details.
