Hey everyone,

This is a bit of a long shot but we are trying to find a cause for a weird
issue with i40e.

From time to time, when we start a VM with SR-IOV ports, libvirt ends up
with a hung task and we need to reboot the entire compute to recover.

We see the following in the compute logs.

[Tue Jun 22 18:34:04 2021] INFO: task libvirtd:21535 blocked for more than 600 seconds.
[Tue Jun 22 18:34:04 2021] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Tue Jun 22 18:34:04 2021] libvirtd        D ffff9a6b1fb0d140     0 21535  20450 0x00000080
[Tue Jun 22 18:34:04 2021] Call Trace:
[Tue Jun 22 18:34:04 2021]  [<ffffffffa3168dc9>] schedule+0x29/0x70
[Tue Jun 22 18:34:04 2021]  [<ffffffffa31668d1>] schedule_timeout+0x221/0x2d0
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2a61d13>] ? x2apic_send_IPI_mask+0x13/0x20
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ad68a0>] ? try_to_wake_up+0x190/0x390
[Tue Jun 22 18:34:04 2021]  [<ffffffffa316917d>] wait_for_completion+0xfd/0x140
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ad6b60>] ? wake_up_state+0x20/0x20
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2aba63d>] flush_work+0xfd/0x190
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ab7430>] ? move_linked_works+0x90/0x90
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2aba759>] __cancel_work_timer+0x89/0x120
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2aba800>] cancel_work_sync+0x10/0x20
[Tue Jun 22 18:34:04 2021]  [<ffffffffc06a810a>] i40evf_remove+0x5a/0x360 [i40evf]
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2dc699e>] pci_device_remove+0x3e/0xc0
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ea8a42>] __device_release_driver+0x82/0xf0
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ea8ad3>] device_release_driver+0x23/0x30
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ea735d>] driver_unbind+0xbd/0xe0
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ea6887>] drv_attr_store+0x27/0x40
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ccbf72>] sysfs_kf_write+0x42/0x50
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2ccb54b>] kernfs_fop_write+0xeb/0x160
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2c41890>] vfs_write+0xc0/0x1f0
[Tue Jun 22 18:34:04 2021]  [<ffffffffa2c426af>] SyS_write+0x7f/0xf0
[Tue Jun 22 18:34:04 2021]  [<ffffffffa317606b>] tracesys+0xa3/0xc9
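
To get a rough count of how often this shows up across many computes, a small
script could scan collected kernel logs for the hung-task header. The following
is only a sketch: the function name find_hung_tasks and the sample string are
made up for illustration, and it assumes the header wording matches the report
above.

```python
import re

# Matches the hung-task header seen in the log above, e.g.
# "INFO: task libvirtd:21535 blocked for more than 600 seconds."
HUNG_RE = re.compile(
    r"INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)

def find_hung_tasks(log_text):
    """Return (task, pid, seconds) for each hung-task header in log_text.

    Hypothetical helper: whitespace is collapsed first so that headers
    wrapped across lines (for example by a mail client) still match.
    """
    flat = " ".join(log_text.split())
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in HUNG_RE.finditer(flat)
    ]

# Sample input copied from the report, including the line wrap.
sample = """[Tue Jun 22 18:34:04 2021] INFO: task libvirtd:21535 blocked for more than
600 seconds."""
print(find_hung_tasks(sample))
```

In practice the input would be the saved output of dmesg -T (or the kernel log
file) from each compute rather than an inline string.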

We have the following i40e driver:
driver: i40e
version: 2.3.2-k
firmware-version: 5.60 0x8000355f 1.1752.0
expansion-rom-version:
bus-info: 0000:37:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

sudo modinfo i40evf
filename:       /lib/modules/3.10.0-957.23.1.el7.x86_64/kernel/drivers/net/ethernet/intel/i40evf/i40evf.ko.xz
version:        3.2.2-k
license:        GPL
description:    Intel(R) XL710 X710 Virtual Function Network Driver
author:         Intel Corporation, <linux.n...@intel.com>
retpoline:      Y
rhelversion:    7.6
srcversion:     01219194E21295C2545FD09
alias:          pci:v00008086d00001889sv*sd*bc*sc*i*
alias:          pci:v00008086d000037CDsv*sd*bc*sc*i*
alias:          pci:v00008086d00001571sv*sd*bc*sc*i*
alias:          pci:v00008086d0000154Csv*sd*bc*sc*i*
depends:
intree:         Y
vermagic:       3.10.0-957.23.1.el7.x86_64 SMP mod_unload modversions
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        A2:2A:4D:9E:11:3C:46:B5:55:31:AC:5C:BE:ED:25:EB:A4:83:DA:DE
sig_hashalgo:   sha256

sudo modinfo i40e
filename:       /lib/modules/3.10.0-957.23.1.el7.x86_64/kernel/drivers/net/ethernet/intel/i40e/i40e.ko.xz
version:        2.3.2-k
license:        GPL
description:    Intel(R) Ethernet Connection XL710 Network Driver
author:         Intel Corporation, <e1000-devel@lists.sourceforge.net>
retpoline:      Y
rhelversion:    7.6
srcversion:     B97DEF9127338A70AF51428
alias:          pci:v00008086d0000158Bsv*sd*bc*sc*i*
alias:          pci:v00008086d0000158Asv*sd*bc*sc*i*
alias:          pci:v00008086d00001588sv*sd*bc*sc*i*
alias:          pci:v00008086d00001587sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D3sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D2sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D1sv*sd*bc*sc*i*
alias:          pci:v00008086d000037D0sv*sd*bc*sc*i*
alias:          pci:v00008086d000037CFsv*sd*bc*sc*i*
alias:          pci:v00008086d000037CEsv*sd*bc*sc*i*
alias:          pci:v00008086d00001589sv*sd*bc*sc*i*
alias:          pci:v00008086d00001586sv*sd*bc*sc*i*
alias:          pci:v00008086d00001585sv*sd*bc*sc*i*
alias:          pci:v00008086d00001584sv*sd*bc*sc*i*
alias:          pci:v00008086d00001583sv*sd*bc*sc*i*
alias:          pci:v00008086d00001581sv*sd*bc*sc*i*
alias:          pci:v00008086d00001580sv*sd*bc*sc*i*
alias:          pci:v00008086d00001574sv*sd*bc*sc*i*
alias:          pci:v00008086d00001572sv*sd*bc*sc*i*
depends:        ptp
intree:         Y
vermagic:       3.10.0-957.23.1.el7.x86_64 SMP mod_unload modversions
signer:         Red Hat Enterprise Linux kernel signing key
sig_key:        A2:2A:4D:9E:11:3C:46:B5:55:31:AC:5C:BE:ED:25:EB:A4:83:DA:DE
sig_hashalgo:   sha256
parm:           debug:Debug level (0=none,...,16=all), Debug mask (0x8XXXXXXX) (uint)

One thing we are not sure about is whether there could be a conflict on the
compute when the i40e and i40evf drivers are both loaded.
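
To check which devices each driver actually holds on a given compute, something
like the sketch below could be run on the host. It relies on the standard sysfs
layout, where every bound PCI device appears as an entry named by its PCI
address under /sys/bus/pci/drivers/<driver>/. The function name bound_devices
and the sysfs_root parameter are made up for illustration.

```python
import os
import re

# PCI address as it appears in sysfs, e.g. 0000:37:00.1
PCI_ADDR_RE = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")

def bound_devices(driver, sysfs_root="/sys"):
    """Return the sorted PCI addresses currently bound to `driver`.

    Hypothetical helper; sysfs_root exists only so the function can be
    pointed at a test directory instead of the real /sys.
    """
    drv_dir = os.path.join(sysfs_root, "bus", "pci", "drivers", driver)
    if not os.path.isdir(drv_dir):
        return []  # driver not loaded on this host
    # Skip control files and links such as bind, unbind, new_id and module.
    return sorted(e for e in os.listdir(drv_dir) if PCI_ADDR_RE.match(e))

for drv in ("i40e", "i40evf"):
    print(drv, bound_devices(drv))
```

If i40evf lists VFs on the host at the moment a VM is started, those are the
devices that have to be unbound first, which is the path the trace above is
stuck in.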

We have been running these specific versions across thousands of computes
with different firmwares, and the issue was previously seen 5% of the time.

With more recent deployments, it is closer to 50% of the time, which seems to
indicate that something changed.

Thank you!

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel Ethernet, visit 
https://forums.intel.com/s/topic/0TO0P00000018NbWAI/intel-ethernet