Note that there is one additional upstream commit that improves performance by allowing up to 12K of data per Tx descriptor instead of 8K (the limit in the current Xenial 4.4 kernel code). Its changes are related to the fixes for this issue. However, from my reading of the code, I don't think that commit is actually required to fix this problem, so I am not including it in this bug (yet).
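
As a rough illustration of why the per-descriptor limit matters, the sketch below counts how many Tx descriptors a single fragment would need under an 8K and a 12K limit. It assumes plain ceiling division; the function name and the 20 KiB fragment size are hypothetical, and the real driver may apply additional alignment rules that this sketch ignores.

```python
KIB = 1024


def descriptors_for_frag(frag_bytes: int, max_bytes_per_desc: int) -> int:
    """Return how many descriptors a fragment needs, by ceiling division.

    Hypothetical helper for illustration only; not taken from the i40e driver.
    """
    if frag_bytes < 0 or max_bytes_per_desc <= 0:
        raise ValueError("sizes must be non-negative and the limit positive")
    # -(-a // b) is integer ceiling division for non-negative a, positive b.
    return -(-frag_bytes // max_bytes_per_desc)


if __name__ == "__main__":
    frag = 20 * KIB  # hypothetical fragment size chosen for the example
    for limit in (8 * KIB, 12 * KIB):
        count = descriptors_for_frag(frag, limit)
        print(f"{frag} byte frag, {limit} byte limit: {count} descriptor(s)")
```

Under these assumptions, a larger per-descriptor limit means fewer descriptors for the same fragment, which is consistent with the commit being described as a performance improvement rather than a required fix.
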

commit 5c4654daf2e2f25dfbd7fa572c59937ea6d4198b
Author: Alexander Duyck <[email protected]>
Date:   Fri Feb 19 12:17:08 2016 -0800

    i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1713553

Title:
  Intel i40e PF reset due to incorrect MDD detection

Status in linux package in Ubuntu:
  In Progress

Bug description:
  [Impact]

  Using an Intel i40e network device, under heavy traffic load with TSO
  enabled, the device will spontaneously reset itself and issue errors
  similar to the following:

  Jun 14 14:09:51 hostname kernel: [4253913.851053] i40e 0000:05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:53 hostname kernel: [4253915.476283] i40e 0000:05:00.1: TX driver issue detected, PF reset issued
  Jun 14 14:09:54 hostname kernel: [4253917.411264] i40e 0000:05:00.1: TX driver issue detected, PF reset issued

  This causes a full reset of the PF, which interrupts traffic flow.

  This was partially fixed by Xenial commit
  12f8cc59d5886b86372f45290166deca57a60d7a; however, there is one
  additional upstream commit required to fully fix the issue:

  commit 841493a3f64395b60554afbcaa17f4350f90e764
  Author: Alexander Duyck <[email protected]>
  Date:   Tue Sep 6 18:05:04 2016 -0700

      i40e: Limit TX descriptor count in cases where frag size is greater than 16K

  This fix was never backported into the Xenial 4.4 kernel series, but
  is already present in the Xenial HWE (and Zesty) 4.10 kernel.

  [Testcase]

  In this case, the issue occurs at a customer site using i40e based
  Intel network cards with SR-IOV enabled. Under heavy load, the card
  will reset itself as described.

  [Regression Potential]

  As with any change to a network card driver, this may cause
  regressions with network I/O through i40e card(s). However, this
  specific change only increases the likelihood that any specific large
  TSO tx will need to be linearized, which will avoid the PF reset.
  Linearizing a TSO tx that did not need to be linearized will not cause
  any failures; it may only decrease performance slightly. Moreover, this
  patch should only cause linearization when required to avoid the MDD
  detection and PF reset.

  [Other Info]

  The previous bug for this issue is bug 1700834.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1713553/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp

