After further investigation, I have found the root cause that triggers the
issue.
#
# TL;DR:
#
With many VFs to initialise, systemd-networkd-wait-online.service reaches its
default timeout of 2 minutes.
As a result, services like Nova, Neutron and OVN/OVS start while the kernel is
still creating and activating VFs.
systemd-networkd-wait-online.service needs to be overridden to increase the
default timeout to something like 5 or 10 minutes.
#
# The lengthy explanation:
#
When looking at the global logs of the servers, I noticed that all failing
boots had one thing in common:
systemd-networkd-wait-online.service was timing out after running for more
than 2 minutes.
On successful boots, this error was not seen. This is interesting because
systemd waits for this service to succeed before starting the other
services ordered after the network target.
When allocating 24 VFs on 2 interfaces, it still worked most of the time. But
on inspection, it was getting pretty close to the 2-minute limit.
In the attached file systemd-networkd-wait-24VFs.txt we can see that the network
initialisation is fairly inconsistent and varies a lot in terms of timing.
After testing again with 32 VFs, I frequently saw the network taking longer
than 2 minutes to be set up, and all remaining services on the host were
started while the kernel was still working on VFs.
I decided to override systemd-networkd-wait-online.service to extend the
default timeout to 4 minutes.
Here is the test on the same server with 32 VFs:
```
# before override in systemd unit
2min 127ms systemd-networkd-wait-online.service --> timed out
--> has the kernel failing
# reboot after override
2min 33.617s systemd-networkd-wait-online.service
--> works fine
1min 55.140s systemd-networkd-wait-online.service
--> works fine
```
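To compare timings like these against a given limit, a small parser can help. This is only a sketch: it assumes lines in the format shown above (durations built from `min`, `s` and `ms` tokens, followed by the unit name), and the function name `parse_line` is made up for illustration.

```python
import re

# Seconds per suffix, covering only the suffixes seen in the timings above.
_UNIT_SECONDS = {"min": 60.0, "s": 1.0, "ms": 0.001}

# One duration token, e.g. "2min", "33.617s" or "127ms".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(min|ms|s)$")


def parse_line(line):
    """Return (seconds, unit_name) for a timing line.

    Leading duration tokens are summed; the first token that is not a
    duration is taken as the unit name, and anything after it is ignored.
    """
    seconds = 0.0
    for token in line.split():
        match = _TOKEN.match(token)
        if match is None:
            return seconds, token
        seconds += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    raise ValueError(f"no unit name found in {line!r}")


if __name__ == "__main__":
    default_timeout = 120.0  # the 2-minute default discussed above
    for line in (
        "2min 127ms systemd-networkd-wait-online.service",
        "2min 33.617s systemd-networkd-wait-online.service",
        "1min 55.140s systemd-networkd-wait-online.service",
    ):
        seconds, unit = parse_line(line)
        over = seconds > default_timeout
        print(f"{unit}: {seconds:.3f}s, over default timeout: {over}")
```

With the three timings above, the first two exceed the 2-minute default while the third stays under it, which matches how close to the limit the boots are.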
Although the kernel is an issue as well, I think extending the default
timeout on systemd-networkd-wait-online.service seriously needs to be
considered from the ovn-chassis charm's point of view, especially since
other software depending on network connectivity can probably run into
errors or other unknown bugs.
And this is failing with the initialisation of only 64 VFs in total. In
scenarios where current network cards can handle 1000 VFs on a single
port, or where there are many network cards, the initialisation of VFs
can take a while. Maybe having a configuration option to choose the
timeout value for systemd-networkd-wait-online.service could be useful.
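As a rough illustration of what such a configuration option could drive, here is a minimal sketch that renders a systemd drop-in from a timeout value. It is not the charm's actual code: the function name and the binary path are assumptions (the path should be checked against the ExecStart= line of the shipped unit), and the only real interface relied upon is the `--timeout=` option of systemd-networkd-wait-online.

```python
# Assumed location of the binary; verify it against the ExecStart= line of
# the shipped systemd-networkd-wait-online.service before using this.
DEFAULT_BINARY = "/lib/systemd/systemd-networkd-wait-online"


def render_wait_online_dropin(timeout_s, binary=DEFAULT_BINARY):
    """Return drop-in text that restarts the ExecStart list with a new timeout.

    Hypothetical helper. The empty "ExecStart=" line clears the command
    inherited from the original unit so that only the new one runs.
    """
    timeout_s = int(timeout_s)
    if timeout_s <= 0:
        # --timeout=0 disables the timeout entirely, so refuse it here
        # rather than silently waiting forever.
        raise ValueError("timeout must be a positive number of seconds")
    return (
        "[Service]\n"
        "ExecStart=\n"
        f"ExecStart={binary} --timeout={timeout_s}\n"
    )


if __name__ == "__main__":
    # 300 seconds corresponds to the 5-minute value suggested earlier.
    print(render_wait_online_dropin(300), end="")
```

The rendered text would typically go in a file such as /etc/systemd/system/systemd-networkd-wait-online.service.d/override.conf, followed by a `systemctl daemon-reload`.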
** Changed in: charm-ovn-chassis
Status: Invalid => New
** Attachment added: "systemd-networkd-wait-24VFs.txt"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2009594/+attachment/5653427/+files/systemd-networkd-wait-24VFs.txt
--
https://bugs.launchpad.net/bugs/2009594
Title:
Mlx5 kworker blocked Kernel 5.19 (Jammy HWE)
Status in charm-ovn-chassis:
New
Status in linux package in Ubuntu:
Confirmed
Bug description:
This is seen in particular with:
* Charmed Openstack with Jammy Yoga
* 5.19.0-35-generic (linux-generic-hwe-22.04/jammy-updates)
* Mellanox ConnectX-6 card with mlx5_core module being used
* SR-IOV is being used with VF-LAG for the use of OVN hardware offloading
The servers quickly enter very high load (around 75~100) during boot,
with all processes relying on network communication through the Mellanox
network card being stuck or extremely slow.
Kernel logs report kworkers being blocked for more than
120 seconds.
The number of SR-IOV devices configured, both in the firmware and in the kernel,
seems to correlate strongly with the likelihood of this bug occurring.
Enabling more VFs seems to hugely increase the risk of this bug
arising.
This does not happen systematically at every boot, but with 32 VFs on each
PF, it occurs about 40% of the time.
To recover the server, a cold reboot is required.
Looking at a quick sample of the trace, this seems to directly involve
the mlx5 driver within the kernel:
Mar 07 05:24:56 nova-1 kernel: INFO: task kworker/0:1:19 blocked for more than 120 seconds.
Mar 07 05:24:56 nova-1 kernel: Tainted: P OE 5.19.0-35-generic #36~22.04.1-Ubuntu
Mar 07 05:24:56 nova-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 07 05:24:56 nova-1 kernel: task:kworker/0:1 state:D stack: 0 pid: 19 ppid: 2 flags:0x00004000
Mar 07 05:24:56 nova-1 kernel: Workqueue: events work_for_cpu_fn
Mar 07 05:24:56 nova-1 kernel: Call Trace:
Mar 07 05:24:56 nova-1 kernel: <TASK>
Mar 07 05:24:56 nova-1 kernel: __schedule+0x257/0x5d0
Mar 07 05:24:56 nova-1 kernel: schedule+0x68/0x110
Mar 07 05:24:56 nova-1 kernel: schedule_preempt_disabled+0x15/0x30
Mar 07 05:24:56 nova-1 kernel: __mutex_lock.constprop.0+0x4f1/0x750
Mar 07 05:24:56 nova-1 kernel: __mutex_lock_slowpath+0x13/0x20
Mar 07 05:24:56 nova-1 kernel: mutex_lock+0x3e/0x50
Mar 07 05:24:56 nova-1 kernel: mlx5_register_device+0x1c/0xb0 [mlx5_core]
Mar 07 05:24:56 nova-1 kernel: mlx5_init_one+0xe4/0x110 [mlx5_core]
Mar 07 05:24:56 nova-1 kernel: probe_one+0xcb/0x120 [mlx5_core]
Mar 07 05:24:56 nova-1 kernel: local_pci_probe+0x4b/0x90
Mar 07 05:24:56 nova-1 kernel: work_for_cpu_fn+0x1a/0x30
Mar 07 05:24:56 nova-1 kernel: process_one_work+0x21f/0x400
Mar 07 05:24:56 nova-1 kernel: worker_thread+0x200/0x3f0
Mar 07 05:24:56 nova-1 kernel: ? rescuer_thread+0x3a0/0x3a0
Mar 07 05:24:56 nova-1 kernel: kthread+0xee/0x120
Mar 07 05:24:56 nova-1 kernel: ? kthread_complete_and_exit+0x20/0x20
Mar 07 05:24:56 nova-1 kernel: ret_from_fork+0x22/0x30
Mar 07 05:24:56 nova-1 kernel: </TASK>