Public bug reported:

SRU Justification

[Impact]

A Linux guest on Hyper-V/Azure can occasionally crash during early kernel
boot when the host behaves unusually and performs the following sequence:
1. The host assigns a VF to the guest;
2. The host immediately unassigns the VF from the guest (Dexuan: at this
   point, race-condition bugs in the Linux vPCI driver can crash Linux);
3. The host assigns the VF to the guest again.
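
The sequence above can be modelled with a small Python sketch. It is
illustrative only and is not the vPCI driver code or the actual fix: the
class VfBus, its state_lock field, and the on_assign/on_unassign handlers
are hypothetical names chosen for this example. It shows the general idea
of serializing the add and remove paths with one per-bus lock, so that a
removal never observes a half-probed device.

```python
import threading


class VfBus:
    """Toy model of a guest-side bus tracking one VF (hypothetical)."""

    def __init__(self):
        self.state_lock = threading.Lock()  # assumed per-bus lock
        self.device = None                  # None means "no VF present"
        self.generation = 0                 # number of assign events handled
        self.errors = []                    # invariant violations, if any

    def on_assign(self):
        # Probe path: the device is built in two steps. Holding the lock
        # keeps a concurrent removal from seeing the intermediate state.
        with self.state_lock:
            self.generation += 1
            dev = {"generation": self.generation, "ready": False}
            self.device = dev
            dev["ready"] = True

    def on_unassign(self):
        # Removal path: tear down only a fully probed device.
        with self.state_lock:
            dev = self.device
            if dev is None:
                return  # nothing to remove
            if not dev["ready"]:
                self.errors.append("removal saw a half-probed device")
            self.device = None


def replay(bus, events):
    """Deliver each host event on its own thread to mimic async delivery."""
    threads = [threading.Thread(target=getattr(bus, name)) for name in events]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Replaying the assign/unassign/assign pattern many times, for example
replay(bus, ["on_assign", "on_unassign", "on_assign"] * 50), should leave
bus.errors empty. Whether the VF is present at the end depends on thread
scheduling, which is why the sketch checks an invariant rather than a
final state.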

Starting in late 2022 (around Nov 2022), Linux guests on Azure began to
crash more frequently because of a host-side update at that time: a new
host/hypervisor feature for handling "correctable memory errors" can
cause many successive VF remove/add events, so the race-condition bugs
in the Linux vPCI driver surface much more easily. The Hyper-V team is
implementing a batching mechanism so that the guest receives far fewer
VF remove/add events (ETA: June 2023). Meanwhile, the Linux
race-condition bugs should also be fixed so that Linux guests do not
crash even when they receive successive VF remove/add events.

[Test Plan]

Tested by Microsoft.

[Regression potential]

PCI devices may not get registered, or VMs may crash.

[Other Info]

SF: #00349076

** Affects: linux-azure (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: linux-azure (Ubuntu Focal)
     Importance: Medium
     Assignee: Tim Gardner (timg-tpi)
         Status: In Progress

** Affects: linux-azure (Ubuntu Jammy)
     Importance: Medium
     Assignee: Tim Gardner (timg-tpi)
         Status: In Progress

** Affects: linux-azure (Ubuntu Lunar)
     Importance: Medium
     Assignee: Tim Gardner (timg-tpi)
         Status: In Progress

** Also affects: linux (Ubuntu Lunar)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
       Status: New => In Progress

** Changed in: linux (Ubuntu Focal)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Changed in: linux (Ubuntu Lunar)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Lunar)
       Status: New => In Progress

** Changed in: linux (Ubuntu Lunar)
     Assignee: (unassigned) => Tim Gardner (timg-tpi)

** Package changed: linux (Ubuntu) => linux-azure (Ubuntu)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/2023594

Title:
  Case [Azure] Fix VM crash/hang issues due to fast VF add/remove events

Status in linux-azure package in Ubuntu:
  New
Status in linux-azure source package in Focal:
  In Progress
Status in linux-azure source package in Jammy:
  In Progress
Status in linux-azure source package in Lunar:
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/2023594/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp