Hey everyone,

This is a bit of a long shot, but we are trying to find the cause of a weird issue with i40e.
From time to time, when we start a VM with SR-IOV ports, libvirtd ends up as a hung task and we have to reboot the entire compute node to recover. We see the following in the compute node's kernel log:

[Tue Jun 22 18:34:04 2021] INFO: task libvirtd:21535 blocked for more than 600 seconds.
[Tue Jun 22 18:34:04 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 22 18:34:04 2021] libvirtd D ffff9a6b1fb0d140 0 21535 20450 0x00000080
[Tue Jun 22 18:34:04 2021] Call Trace:
[Tue Jun 22 18:34:04 2021] [<ffffffffa3168dc9>] schedule+0x29/0x70
[Tue Jun 22 18:34:04 2021] [<ffffffffa31668d1>] schedule_timeout+0x221/0x2d0
[Tue Jun 22 18:34:04 2021] [<ffffffffa2a61d13>] ? x2apic_send_IPI_mask+0x13/0x20
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ad68a0>] ? try_to_wake_up+0x190/0x390
[Tue Jun 22 18:34:04 2021] [<ffffffffa316917d>] wait_for_completion+0xfd/0x140
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ad6b60>] ? wake_up_state+0x20/0x20
[Tue Jun 22 18:34:04 2021] [<ffffffffa2aba63d>] flush_work+0xfd/0x190
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ab7430>] ? move_linked_works+0x90/0x90
[Tue Jun 22 18:34:04 2021] [<ffffffffa2aba759>] __cancel_work_timer+0x89/0x120
[Tue Jun 22 18:34:04 2021] [<ffffffffa2aba800>] cancel_work_sync+0x10/0x20
[Tue Jun 22 18:34:04 2021] [<ffffffffc06a810a>] i40evf_remove+0x5a/0x360 [i40evf]
[Tue Jun 22 18:34:04 2021] [<ffffffffa2dc699e>] pci_device_remove+0x3e/0xc0
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ea8a42>] __device_release_driver+0x82/0xf0
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ea8ad3>] device_release_driver+0x23/0x30
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ea735d>] driver_unbind+0xbd/0xe0
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ea6887>] drv_attr_store+0x27/0x40
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ccbf72>] sysfs_kf_write+0x42/0x50
[Tue Jun 22 18:34:04 2021] [<ffffffffa2ccb54b>] kernfs_fop_write+0xeb/0x160
[Tue Jun 22 18:34:04 2021] [<ffffffffa2c41890>] vfs_write+0xc0/0x1f0
[Tue Jun 22 18:34:04 2021] [<ffffffffa2c426af>] SyS_write+0x7f/0xf0
[Tue Jun 22 18:34:04 2021] [<ffffffffa317606b>] tracesys+0xa3/0xc9

In other words, libvirtd writes to the i40evf driver's sysfs unbind file, and i40evf_remove() then blocks forever in cancel_work_sync() waiting for a pending work item to finish.

We have the following i40e driver:

driver: i40e
version: 2.3.2-k
firmware-version: 5.60 0x8000355f 1.1752.0
expansion-rom-version:
bus-info: 0000:37:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

sudo modinfo i40evf
filename: /lib/modules/3.10.0-957.23.1.el7.x86_64/kernel/drivers/net/ethernet/intel/i40evf/i40evf.ko.xz
version: 3.2.2-k
license: GPL
description: Intel(R) XL710 X710 Virtual Function Network Driver
author: Intel Corporation, <linux.n...@intel.com>
retpoline: Y
rhelversion: 7.6
srcversion: 01219194E21295C2545FD09
alias: pci:v00008086d00001889sv*sd*bc*sc*i*
alias: pci:v00008086d000037CDsv*sd*bc*sc*i*
alias: pci:v00008086d00001571sv*sd*bc*sc*i*
alias: pci:v00008086d0000154Csv*sd*bc*sc*i*
depends:
intree: Y
vermagic: 3.10.0-957.23.1.el7.x86_64 SMP mod_unload modversions
signer: Red Hat Enterprise Linux kernel signing key
sig_key: A2:2A:4D:9E:11:3C:46:B5:55:31:AC:5C:BE:ED:25:EB:A4:83:DA:DE
sig_hashalgo: sha256

sudo modinfo i40e
filename: /lib/modules/3.10.0-957.23.1.el7.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko.xz
version: 2.3.2-k
license: GPL
description: Intel(R) Ethernet Connection XL710 Network Driver
author: Intel Corporation, <e1000-devel@lists.sourceforge.net>
retpoline: Y
rhelversion: 7.6
srcversion: B97DEF9127338A70AF51428
alias: pci:v00008086d0000158Bsv*sd*bc*sc*i*
alias: pci:v00008086d0000158Asv*sd*bc*sc*i*
alias: pci:v00008086d00001588sv*sd*bc*sc*i*
alias: pci:v00008086d00001587sv*sd*bc*sc*i*
alias: pci:v00008086d000037D3sv*sd*bc*sc*i*
alias: pci:v00008086d000037D2sv*sd*bc*sc*i*
alias: pci:v00008086d000037D1sv*sd*bc*sc*i*
alias: pci:v00008086d000037D0sv*sd*bc*sc*i*
alias: pci:v00008086d000037CFsv*sd*bc*sc*i*
alias: pci:v00008086d000037CEsv*sd*bc*sc*i*
alias: pci:v00008086d00001589sv*sd*bc*sc*i*
alias: pci:v00008086d00001586sv*sd*bc*sc*i*
alias: pci:v00008086d00001585sv*sd*bc*sc*i*
alias: pci:v00008086d00001584sv*sd*bc*sc*i*
alias: pci:v00008086d00001583sv*sd*bc*sc*i*
alias: pci:v00008086d00001581sv*sd*bc*sc*i*
alias: pci:v00008086d00001580sv*sd*bc*sc*i*
alias: pci:v00008086d00001574sv*sd*bc*sc*i*
alias: pci:v00008086d00001572sv*sd*bc*sc*i*
depends: ptp
intree: Y
vermagic: 3.10.0-957.23.1.el7.x86_64 SMP mod_unload modversions
signer: Red Hat Enterprise Linux kernel signing key
sig_key: A2:2A:4D:9E:11:3C:46:B5:55:31:AC:5C:BE:ED:25:EB:A4:83:DA:DE
sig_hashalgo: sha256
parm: debug:Debug level (0=none,...,16=all), Debug mask (0x8XXXXXXX) (uint)

One thing we are not sure about is whether there could be a conflict on the compute node when both the i40e and i40evf drivers are loaded.

We have been running these specific driver versions across thousands of compute nodes with various firmware versions. Previously the issue showed up about 5% of the time; on more recent deployments it is closer to 50%, which suggests that something changed.

Thank you!
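P.S. Since the hang happens while a VF is being unbound from the host's i40evf driver, one thing worth checking right before starting the VM is which VFs of the PF are still bound to i40evf. Below is a minimal Python sketch of our own (not a libvirt or Intel tool; the helper name list_vf_drivers and its sysfs_root parameter are just for illustration). It assumes the standard sysfs layout, where each PF directory has virtfnN symlinks to its VFs and each device directory has a driver symlink when a driver is bound, and it defaults to the PF at 0000:37:00.1 from the output above.

```python
import os
import re
import sys


def list_vf_drivers(pf_bdf, sysfs_root="/sys/bus/pci/devices"):
    """Map each VF index of pf_bdf to (vf_bdf, bound driver name or None).

    Hypothetical helper. Assumes <sysfs_root>/<pf>/virtfnN symlinks point at
    the VF device directories, and <vf>/driver is a symlink to the bound
    driver's directory (the symlink is absent when nothing is bound).
    """
    pf_dir = os.path.join(sysfs_root, pf_bdf)
    result = {}
    for entry in os.listdir(pf_dir):
        match = re.fullmatch(r"virtfn(\d+)", entry)
        if not match:
            continue  # skip sriov_numvfs, vendor, config, etc.
        # Resolve the virtfnN symlink to the VF's real device directory.
        vf_dir = os.path.realpath(os.path.join(pf_dir, entry))
        driver_link = os.path.join(vf_dir, "driver")
        driver = None
        if os.path.islink(driver_link):
            # The symlink target ends in the driver name, e.g. .../drivers/i40evf
            driver = os.path.basename(os.readlink(driver_link))
        result[int(match.group(1))] = (os.path.basename(vf_dir), driver)
    return result


if __name__ == "__main__":
    pf = sys.argv[1] if len(sys.argv) > 1 else "0000:37:00.1"
    try:
        for idx, (bdf, drv) in sorted(list_vf_drivers(pf).items()):
            print(f"virtfn{idx} {bdf} driver={drv or '(none)'}")
    except OSError as err:
        print(f"could not read VFs of {pf}: {err}")
```

Reading these sysfs entries does not require root, so it can be run on a compute node just before the VM is started to see whether the VFs are bound to i40evf, to another driver, or to nothing.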
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet