** Also affects: linux-aws (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: linux-aws (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux-aws (Ubuntu Bionic)
       Status: New => In Progress

** Package changed: linux-aws (Ubuntu) => linux (Ubuntu)

** Changed in: linux (Ubuntu)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-aws in Ubuntu.
https://bugs.launchpad.net/bugs/1946149

Title:
  Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on
  r5.metal

Status in linux package in Ubuntu:
  Invalid
Status in linux source package in Bionic:
  In Progress

Bug description:
  [ Impact ]
  The bionic 4.15 kernels fail to boot on r5.metal instances on AWS. The
  default kernel is bionic/linux-aws-5.4 (5.4.0-1056-aws); after changing
  to bionic/linux-aws (4.15.0-1113-aws) or bionic/linux (4.15.0-160.168),
  the machine fails to boot the 4.15 kernel.

  This problem only appears on metal instances, which use NVMe instead
  of XVDA devices.

  [ Fix ]
  Reverting the following two commits from upstream stable allows the
  4.15 kernels to boot again on the affected AWS metal instance:

  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Enforce MSI[X] entry updates to be visible

  [ Test Case ]
  Deploy an r5.metal instance on AWS with a bionic image, which should
  boot initially with bionic/linux-aws-5.4. Install bionic/linux or
  bionic/linux-aws (4.15 based) and reboot the system.
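
  As a rough aid for the check after the reboot, the sketch below maps a
  'uname -r' style release string to its kernel series and reports
  whether the running kernel is from the 4.15 series. The helper names
  (kernel_series, booted_into_4_15) are hypothetical and not part of the
  original report, and the script can only confirm a successful boot; a
  machine that fails to boot will never run it.

```python
import platform
import re
from typing import Tuple


def kernel_series(release: str) -> Tuple[int, int]:
    """Return (major, minor) for a release string such as '4.15.0-1113-aws'."""
    match = re.match(r"(\d+)\.(\d+)\.", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def booted_into_4_15(release: str) -> bool:
    """True if the release string belongs to the 4.15 kernel series."""
    return kernel_series(release) == (4, 15)


if __name__ == "__main__":
    # platform.release() returns the same string as `uname -r`.
    running = platform.release()
    verdict = "4.15 series" if booted_into_4_15(running) else "not 4.15 series"
    print(f"{running}: {verdict}")
```

  Run on the instance after the reboot: "4.15 series" indicates the
  downgrade booted, while "not 4.15 series" means the machine came back
  up on a different kernel.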

  [ Where problems could occur ]
  These two commits are part of a larger patchset fixing PCI/MSI issues,
  which was backported to some upstream stable releases. By reverting
  only part of the set we might end up with MSI issues that were not
  present before the whole set was applied. Regression potential can be
  minimized by testing the kernels with these two patches reverted on
  all available platforms.

  [ Original Description ]
  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4 (5.4.0-1056-aws). After changing to
  bionic/linux-aws (4.15.0-1113-aws), the machine fails to boot the 4.15
  kernel.

  If I remove these patches, the instance correctly boots the 4.15
  kernel:

  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html

  That said, after successfully switching to a 4.15 kernel without
  those patches applied, I can then upgrade to a 4.15 kernel with the
  above patches included, and the instance will boot properly.

  This problem only appears on metal instances, which use NVMe instead
  of XVDA devices.

  AWS instances also use the 'discard' mount option with ext4, so I
  suspected a race condition between ext4 discard and journal flush. I
  removed 'discard' from the mount options and rebooted the 5.4 kernel
  prior to installing the 4.15 kernel, but the instance still would not
  boot after the 4.15 kernel was installed.

  I have been unable to capture a stack trace using 'aws ec2
  get-console-output'. After enabling kdump I was unable to replicate
  the failure, which suggests some sort of race involving ext4 and/or
  NVMe.

