Public bug reported:
SRU Justification
[Impact]
A Linux guest on Hyper-V/Azure can occasionally crash during early kernel
boot because of unexpected host behavior:
1. The host assigns a VF to the guest;
2. The host immediately unassigns the VF from the guest (Dexuan: due to
race-condition bugs in the Linux vPCI driver, Linux can crash);
3. The host assigns the VF to the guest again.
I'm asking the Hyper-V team to investigate the host behavior, but I'm not sure
when they'll get that fixed.
Starting in late 2022 (around Nov 2022), Linux guests on Azure began to
crash more frequently because of a host-side update at that time: a new
host/hypervisor feature for handling "correctable memory errors" can
cause many successive VF remove/add events, so the race-condition bugs
in the Linux vPCI driver surface much more easily. The Hyper-V team
is implementing a batching mechanism so that the guest will get far
fewer VF remove/add events (ETA: June 2023), but meanwhile we should also
get the Linux race-condition bugs fixed so that Linux guests won't crash
even if they receive successive VF remove/add events.
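For illustration only, here is a minimal Python sketch of why the add and
remove handlers need to be serialized when events arrive back to back. It is
a toy model under stated assumptions, not the Linux vPCI driver code and not
the actual fix: ToyVfHandler, VfState and hammer are hypothetical names, and
a single lock stands in for whatever synchronization the real driver uses.

```python
import threading
from enum import Enum


class VfState(Enum):
    ABSENT = "absent"
    PROBING = "probing"
    READY = "ready"


class ToyVfHandler:
    """Toy model of a guest reacting to VF add/remove events.

    Hypothetical illustration only; this is not the vPCI driver.
    """

    def __init__(self):
        self._lock = threading.Lock()  # serializes add/remove handling
        self.state = VfState.ABSENT
        self.resources = None          # stands in for per-device state

    def on_add(self):
        with self._lock:
            if self.state is not VfState.ABSENT:
                return                 # device already present: ignore
            self.state = VfState.PROBING
            self.resources = {"initialized": True}
            self.state = VfState.READY

    def on_remove(self):
        with self._lock:
            if self.state is VfState.ABSENT:
                return                 # nothing to tear down: ignore
            self.resources = None
            self.state = VfState.ABSENT


def hammer(handler, rounds=200):
    """Fire add and remove events concurrently from two threads."""
    def adds():
        for _ in range(rounds):
            handler.on_add()

    def removes():
        for _ in range(rounds):
            handler.on_remove()

    threads = [threading.Thread(target=adds),
               threading.Thread(target=removes)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    h = ToyVfHandler()
    hammer(h)
    # Invariant: resources exist exactly when the device is READY.
    assert (h.state is VfState.READY) == (h.resources is not None)
```

Because every handler takes the same lock, a remove can never observe a
half-probed device, and a repeated add or remove is ignored instead of
touching state that is missing or already set up.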
[Test Plan]
Tested by Microsoft (MSFT).
[Regression potential]
Guests may continue to crash.
[Other Info]
SF: #00349076
** Affects: linux-azure (Ubuntu)
Importance: Undecided
Status: New
** Affects: linux-azure (Ubuntu Jammy)
Importance: Medium
Assignee: Tim Gardner (timg-tpi)
Status: In Progress
** Affects: linux-azure (Ubuntu Lunar)
Importance: Medium
Assignee: Tim Gardner (timg-tpi)
Status: In Progress
** Package changed: linux (Ubuntu) => linux-azure (Ubuntu)
** Also affects: linux-azure (Ubuntu Jammy)
Importance: Undecided
Status: New
** Also affects: linux-azure (Ubuntu Lunar)
Importance: Undecided
Status: New
** Changed in: linux-azure (Ubuntu Jammy)
Importance: Undecided => Medium
** Changed in: linux-azure (Ubuntu Jammy)
Status: New => In Progress
** Changed in: linux-azure (Ubuntu Jammy)
Assignee: (unassigned) => Tim Gardner (timg-tpi)
** Changed in: linux-azure (Ubuntu Lunar)
Importance: Undecided => Medium
** Changed in: linux-azure (Ubuntu Lunar)
Status: New => In Progress
** Changed in: linux-azure (Ubuntu Lunar)
Assignee: (unassigned) => Tim Gardner (timg-tpi)
--
https://bugs.launchpad.net/bugs/2023071
Title:
[Azure] Fix VM crash/hang issues due to fast VF add/remove events
Status in linux-azure package in Ubuntu:
New
Status in linux-azure source package in Jammy:
In Progress
Status in linux-azure source package in Lunar:
In Progress