[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-11-02 Thread Krzysztof Kozlowski
** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1946149

Title:
  Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on
  r5.metal

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1946149/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-26 Thread Ubuntu Kernel Bot
This bug is awaiting verification that the linux-kvm/4.15.0-1102.104
kernel in -proposed solves the problem. Please test the kernel and
update this bug with the results. If the problem is solved, change the
tag 'verification-needed-bionic' to 'verification-done-bionic'. If the
problem still exists, change the tag 'verification-needed-bionic' to
'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will
be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed. Thank you!
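
(For reference, enabling -proposed on bionic and pulling in just the kernel under test looks roughly like this; the image package name is inferred from the version above, and the wiki page recommends apt pinning so that only the kernel comes from -proposed:)

$ echo "deb http://archive.ubuntu.com/ubuntu bionic-proposed restricted main multiverse universe" | sudo tee /etc/apt/sources.list.d/bionic-proposed.list
$ sudo apt update
$ sudo apt install linux-image-4.15.0-1102-kvm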


** Tags added: verification-needed-bionic

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-19 Thread Launchpad Bug Tracker
This bug was fixed in the package linux - 4.15.0-161.169

---
linux (4.15.0-161.169) bionic; urgency=medium

  * bionic/linux: 4.15.0-161.169 -proposed tracker (LP: #1947358)

  * Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on
r5.metal (LP: #1946149)
- SAUCE: Revert "PCI/MSI: Enforce MSI[X] entry updates to be visible"
- SAUCE: Revert "PCI/MSI: Enforce that MSI-X table entry is masked for 
update"

linux (4.15.0-160.168) bionic; urgency=medium

  * bionic/linux: 4.15.0-160.168 -proposed tracker (LP: #1944933)

  * Packaging resync (LP: #1786013)
- debian/dkms-versions -- update from kernel-versions (main/2021.09.27)

  * ext4 journal recovery fails w/ data=journal + mmap (LP: #1847340)
- jbd2: introduce/export functions
  jbd2_journal_submit|finish_inode_data_buffers()
- jbd2, ext4, ocfs2: introduce/use journal callbacks
  j_submit|finish_inode_data_buffers()
- ext4: data=journal: fixes for ext4_page_mkwrite()
- ext4: data=journal: write-protect pages on j_submit_inode_data_buffers()
- ext4: fix mmap write protection for data=journal mode

  * CVE-2021-40490
- ext4: fix race writing to an inline_data file while its xattrs are
  changing

  * Bionic update: upstream stable patchset 2021-09-22 (LP: #1944600)
- iio: humidity: hdc100x: Add margin to the conversion time
- iio: adc: Fix incorrect exit of for-loop
- ASoC: intel: atom: Fix reference to PCM buffer address
- i2c: dev: zero out array used for i2c reads from userspace
- ACPI: NFIT: Fix support for virtual SPA ranges
- ASoC: cs42l42: Correct definition of ADC Volume control
- ASoC: cs42l42: Don't allow SND_SOC_DAIFMT_LEFT_J
- ASoC: cs42l42: Fix inversion of ADC Notch Switch control
- ASoC: cs42l42: Remove duplicate control for WNF filter frequency
- net: dsa: mt7530: add the missing RxUnicast MIB counter
- ppp: Fix generating ifname when empty IFLA_IFNAME is specified
- psample: Add a fwd declaration for skbuff
- net: Fix memory leak in ieee802154_raw_deliver
- net: bridge: fix memleak in br_add_if()
- tcp_bbr: fix u32 wrap bug in round logic if bbr_init() called after 2B
  packets
- xen/events: Fix race in set_evtchn_to_irq
- vsock/virtio: avoid potential deadlock when vsock device remove
- powerpc/kprobes: Fix kprobe Oops happens in booke
- x86/tools: Fix objdump version check again
- x86/resctrl: Fix default monitoring groups reporting
- PCI/MSI: Enable and mask MSI-X early
- PCI/MSI: Do not set invalid bits in MSI mask
- PCI/MSI: Correct misleading comments
- PCI/MSI: Use msi_mask_irq() in pci_msi_shutdown()
- PCI/MSI: Protect msi_desc::masked for multi-MSI
- PCI/MSI: Mask all unused MSI-X entries
- PCI/MSI: Enforce that MSI-X table entry is masked for update
- PCI/MSI: Enforce MSI[X] entry updates to be visible
- vmlinux.lds.h: Handle clang's module.{c,d}tor sections
- mac80211: drop data frames without key on encrypted links
- x86/fpu: Make init_fpstate correct with optimized XSAVE
- ath: Use safer key clearing with key cache entries
- ath9k: Clear key cache explicitly on disabling hardware
- ath: Export ath_hw_keysetmac()
- ath: Modify ath_key_delete() to not need full key entry
- ath9k: Postpone key cache entry deletion for TXQ frames reference it
- dmaengine: usb-dmac: Fix PM reference leak in usb_dmac_probe()
- ARM: dts: am43x-epos-evm: Reduce i2c0 bus speed for tps65218
- dmaengine: of-dma: router_xlate to return -EPROBE_DEFER if controller is
  not yet available
- scsi: megaraid_mm: Fix end of loop tests for list_for_each_entry()
- scsi: scsi_dh_rdac: Avoid crash during rdac_bus_attach()
- scsi: core: Avoid printing an error if target_alloc() returns -ENXIO
- ARM: dts: nomadik: Fix up interrupt controller node names
- net: usb: lan78xx: don't modify phy_device state concurrently
- Bluetooth: hidp: use correct wait queue when removing ctrl_wait
- dccp: add do-while-0 stubs for dccp_pr_debug macros
- vhost: Fix the calculation in vhost_overflow()
- bnxt: don't lock the tx queue from napi poll
- net: 6pack: fix slab-out-of-bounds in decode_data
- ptp_pch: Restore dependency on PCI
- net: qlcnic: add missed unlock in qlcnic_83xx_flash_read32
- net: mdio-mux: Don't ignore memory allocation errors
- net: mdio-mux: Handle -EPROBE_DEFER correctly
- mmc: dw_mmc: Fix hang on data CRC error
- ALSA: hda - fix the 'Capture Switch' value change notifications
- ipack: tpci200: fix many double free issues in tpci200_pci_probe
- btrfs: prevent rename2 from exchanging a subvol with a directory from
  different parents
- ASoC: intel: atom: Fix breakage for PCM buffer address setup
- locks: print a warning when mount fails due to lack of "mand" support
- fs: warn about impending deprecation of mandatory locks
- netfilter: 

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-19 Thread Kleber Sacilotto de Souza
I confirm that bionic/linux 4.15.0-161.169 and bionic/linux-aws
4.15.0-1114.121 are not experiencing the reported boot issues on AWS
r5.metal or on any other platform/instance.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-15 Thread Stefan Bader
** Changed in: linux (Ubuntu Bionic)
   Status: In Progress => Fix Committed

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-15 Thread Stefan Bader
** Also affects: linux-aws (Ubuntu Bionic)
   Importance: Undecided
   Status: New

** Changed in: linux-aws (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux-aws (Ubuntu Bionic)
   Status: New => In Progress

** Package changed: linux-aws (Ubuntu) => linux (Ubuntu)

** Changed in: linux (Ubuntu)
   Status: New => Invalid

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-15 Thread Kleber Sacilotto de Souza
** Description changed:

- When creating an r5.metal instance on AWS, the default kernel is
- bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
- aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel.
+ 
+ [ Impact ]
+ The bionic 4.15 kernels are failing to boot on r5.metal instances on AWS
+ . The default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) or bionic/linux (4.15.0-160.168) the machine fails to boot the 4.15 kernel.
+ 
+ This problem only appears on metal instances, which uses NVME instead of
+ XVDA devices.
+ 
+ [ Fix ]
+ It was discovered that after reverting the following two commits from upstream stable the 4.15 kernels can be booted again on the affected AWS metal instance:
+ 
+ PCI/MSI: Enforce that MSI-X table entry is masked for update
+ PCI/MSI: Enforce MSI[X] entry updates to be visible
+ 
+ [ Test Case ]
+ Deploy a r5.metal instance on AWS with a bionic image, which should boot initially with bionic/linux-aws-5.4. Install bionic/linux or bionic/linux-aws (4.15 based) and reboot the system.
+ 
+ [ Where problems could occur ]
+ These two commits are part of a larger patchset fixing PCI/MSI issues which were backported to some upstream stable releases. By reverting only part of the set we might end up with MSI issues that were not present before the whole set was applied. Regression potential can be minimized by testing the kernels with these two reverted patches on all the platforms available.
+ 
+ 
+ [ Original Description ]
+ When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel.
  
  If I remove these patches the instance correctly boots the 4.15 kernel
  
  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
  
  With that being said, after successfully updating to the 4.15 without
  those patches applied, I can then upgrade to a 4.15 kernel with the
  above patches included, and the instance will boot properly.
  
  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.
  
  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush.  Removed 'discard' from mount options and rebooted 5.4 kernel
  prior to 4.15 kernel installation, but still wouldn't boot after
  installing the 4.15 kernel.
  
  I have been unable to capture a stack trace using 'aws get-console-
  output'. After enabling kdump I was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.

** Description changed:

- 
  [ Impact ]
- The bionic 4.15 kernels are failing to boot on r5.metal instances on AWS
- . The default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) or bionic/linux (4.15.0-160.168) the machine fails to boot the 4.15 kernel.
+ The bionic 4.15 kernels are failing to boot on r5.metal instances on AWS. The default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) or bionic/linux (4.15.0-160.168) the machine fails to boot the 4.15 kernel.
  
  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.
  
  [ Fix ]
  It was discovered that after reverting the following two commits from upstream stable the 4.15 kernels can be booted again on the affected AWS metal instance:
  
  PCI/MSI: Enforce that MSI-X table entry is masked for update
  PCI/MSI: Enforce MSI[X] entry updates to be visible
  
  [ Test Case ]
  Deploy a r5.metal instance on AWS with a bionic image, which should boot initially with bionic/linux-aws-5.4. Install bionic/linux or bionic/linux-aws (4.15 based) and reboot the system.
  
  [ Where problems could occur ]
  These two commits are part of a larger patchset fixing PCI/MSI issues which were backported to some upstream stable releases. By reverting only part of the set we might end up with MSI issues that were not present before the whole set was applied. Regression potential can be minimized by testing the kernels with these two reverted patches on all the platforms available.
- 
  
  [ Original Description ]
  When creating an r5.metal instance on AWS, the default kernel is bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel.
  
  If I remove these patches the instance correctly boots the 4.15 kernel
  
  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
  
  With that being said, after successfully updating to the 4.15 without
  those patches applied, I can then upgrade to a 4.15 kernel with the
  above patches included, and the instance will boot properly.
  
  This problem only appears on metal instances, which uses NVME instead of XVDA devices.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-14 Thread Ian May
As I was bisecting the commits, I was attempting to take advantage of
parallelism. While my test kernel was building I would deploy a clean
AWS r5.metal instance.  I started seeing test kernels boot that I
wouldn't expect to boot.  So I decided as a sanity test, I would deploy
an r5.metal instance, let it sit idle for 20 minutes and then install
the known problematic 4.15.0-1113-aws kernel.  Sure enough it booted
fine.  Tried the same thing again with letting it sit idle 20 mins and
it worked again.  So this does appear to be a race condition.  I think
this also explains some of the erratic test results I've seen while
looking at this bug.  Fortunately the console output gave us some
definitive proof as to where the problem was occurring.

With that being said, it appears I have found the offending commits.

PCI/MSI: Enforce that MSI-X table entry is masked for update
PCI/MSI: Enforce MSI[X] entry updates to be visible

https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-
aws/+git/bionic/commit/?id=27571f5ea1dd074924b41a455c50dc2278e8c2b7

https://git.launchpad.net/~canonical-kernel/ubuntu/+source/linux-
aws/+git/bionic/commit/?id=2478f358c2b35fea04e005447ce99ad8dc53fd5d

More specifically the hang is introduced by 'PCI/MSI: Enforce that MSI-X
table entry is masked for update', but it isn't a clean revert without
reverting the other commit.  So for a quick test confirmation I reverted
both.
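
(For reference, the quick test confirmation described above amounts to roughly the following on the bionic linux-aws tree, assuming the two commit links above correspond, in order, to the two subjects; the later-applied commit is reverted first so the second revert applies cleanly:)

$ git revert --no-edit 2478f358c2b35fea04e005447ce99ad8dc53fd5d  # "PCI/MSI: Enforce MSI[X] entry updates to be visible" (assumed)
$ git revert --no-edit 27571f5ea1dd074924b41a455c50dc2278e8c2b7  # "PCI/MSI: Enforce that MSI-X table entry is masked for update" (assumed)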

I have not had a chance to determine why these commits are causing the
problem, but with these reverted in a test build on top of
4.15.0-1113-aws, I can migrate from 5.4 to 4.15 as soon as the instance
is available.  I've done at least 6 attempts now and all have passed, and
doing the same steps without the reverts all have hung (unless I wait 20
mins).

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-14 Thread Ian May
Hi Mauricio,

Thanks for getting this info.  This is very helpful!  I see a few
potential patches between 4.15.0-159.167 and 4.15.0-160.168 that could
be related to the hang.  This will help greatly with the bisect.

Ian

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-13 Thread Mauricio Faria de Oliveira
Steps to reproduce:
---

Ubuntu 18.04 image in AWS r5.metal instance type.

$ lsb_release -cs
bionic

$ dmesg | grep DMI:
[0.00] DMI: Amazon EC2 r5.metal/Not Specified, BIOS 1.0 10/16/2017

$ uname -rv
5.4.0-1045-aws #47~18.04.1-Ubuntu SMP Tue Apr 13 15:58:14 UTC 2021

$ sudo add-apt-repository ppa:canonical-kernel-team/ppa

$ sudo apt install linux-image-4.15.0-1113-aws

$ sudo grub-reboot 'Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-1113-aws'

$ sudo grub-editenv /boot/grub/grubenv list
next_entry=Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-1113-aws

$ sudo reboot

The system should not come back from the reboot (which usually takes 10-15
minutes on r5.metal).
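
(If the instance hangs, the console check mentioned elsewhere in this thread is roughly the following; the instance ID is a placeholder:)

$ aws ec2 get-console-output --instance-id i-0123456789abcdef0 --output text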

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-13 Thread Mauricio Faria de Oliveira
** Attachment added: "serial-console-output.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux-aws/+bug/1946149/+attachment/5532619/+files/serial-console-output.txt

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-13 Thread Mauricio Faria de Oliveira
We've got a serial console log from AWS Support through our Support team
(special thanks to Pedro Principeza and our former colleague Mark Thomas.)

The problem is definitely not the ext4/jbd2 patchset as suspected
(although it's unclear how reverting it caused the kernel to boot;
maybe build environment differences?)

Early in the kernel boot, before even trying to mount the rootfs,
there are blocked swapper tasks, and they just continue to happen.

(full log attached.)

```
 Starting Reboot...
...
[0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.15.0-1113-aws root=UUID=db937f23-4ed7-4c4b-8058-b23a860fae08 ro console=tty1 console=ttyS0 nvme_core.io_timeout=4294967295
...
[0.00]  gran_size: 64K  chunk_size: 256M  num_reg: 10  lose cover RAM: 737G
...
[2.742455] clocksource: Switched to clocksource tsc
[  242.656089] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[  363.488083] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[  484.320066] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[  605.152061] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[  725.984054] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[  846.816051] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[  967.648055] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 1088.480033] INFO: task swapper/0:1 blocked for more than 120 seconds.
...
[ 1209.312036] INFO: task swapper/0:1 blocked for more than 120 seconds.
...

```
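
(The 120-second cadence of those messages matches the kernel's default hung-task watchdog timeout, which can be confirmed on a working boot:)

$ sysctl kernel.hung_task_timeout_secs
kernel.hung_task_timeout_secs = 120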

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-11 Thread Mauricio Faria de Oliveira
Hey Kleber,

Thanks for confirming.

I guess there might be something wrong with the boot process on r5.metal, 
specifically:
- there's no issue with kexec boot, just with normal boot (same code and from/to versions)
- there's no issue with normal boot on similar instance types (r5d.metal, r5.24xlarge)
- rarely, but sometimes, it works with normal boot.

I'm looking forward to hearing back from Ian on whether he could 
get additional information from AWS, or how to reach out to them,
about getting the serial console log from the r5.metal instances
internally, via AWS support, so we know what the actual issue is.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-11 Thread Kleber Sacilotto de Souza
Hi Mauricio,

We are seeing the issue only on r5.metal.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-11 Thread Mauricio Faria de Oliveira
Hi Kleber,

Thanks for the info.

The impact on bionic/generic is also exclusively on aws r5.metal or
broader?

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-11 Thread Kleber Sacilotto de Souza
This issue is also affecting the bionic/linux generic kernel.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-08 Thread Ian May
Mauricio,

Interesting update. I agree that we need more info as to what the state
is when the instance won't boot after switching to the new 4.15 kernel.
I'll check with my team in the morning and see if we can get additional
info from AWS.

I was trying a few more scenarios this evening, the first being the most
interesting.

Scenario 1
start with 5.4.0-1056-aws
install 5.4.0-1058-aws
reboot
confirm 5.4.0-1058-aws booted
reboot AGAIN
install 4.15.0-1113-aws
reboot
machine booted 4.15.0-1113-aws successfully

Scenario 2
start with 5.4.0-1056-aws
install 4.15.0-1112-aws
reboot
install 4.15.0-1113-aws
reboot
confirmed 4.15.0-1113-aws booted
then booted back into 5.4.0-1056-aws
removed 4.15.0-1112-aws and 4.15.0-1113-aws
rebooted again for good measure
confirmed still running 5.4.0-1056-aws
installed 4.15.0-1113-aws
rebooted
4.15.0-1113-aws successfully loaded

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-07 Thread Mauricio Faria de Oliveira
Ian,

Do you/team have contacts in here or AWS that could help with that?

I think that other lines of investigation now, after our findings
and apparent inconsistencies, would be based on speculation, and
we're better off trying to get real information/logs from the system
with AWS Support.

cheers,
Mauricio

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-07 Thread Mauricio Faria de Oliveira
Today I wanted to try and instrument the boot process a bit,
since we have no serial console in the nitro metal instances.

I was looking for pstore_blk (hoping we could panic_on_warn
or panic_on_oops), but it's only available in 5.8+, it seems.

So I decided to start with grub, and keep a progress variable
in grubenv, and use grub-reboot to boot 4.15.0-1113-aws _once_
(as it's expected to fail), then (force) stop and start again,
and check grubenv in 5.4.0-*-aws (which works.)
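
(Roughly, with an illustrative marker name; a save_env line added to the 4.15 menuentry can bump the variable from within grub itself:)

$ sudo grub-editenv /boot/grub/grubenv set progress=before-4.15-boot
$ sudo grub-reboot 'Advanced options for Ubuntu>Ubuntu, with Linux 4.15.0-1113-aws'
$ sudo reboot
# after force stop/start, back in 5.4:
$ sudo grub-editenv /boot/grub/grubenv list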

Interestingly, in one of such attempts 4.15.0-1113-aws WORKED.

In another attempt, I could see the progress variable for the
4.15 _and_ 5.4 kernels, so it seems that grub booted 4.15
but it didn't make it to the fully booted system. (i.e., grub
seems to be working correctly.)

In the other attempts I noticed that once we try to boot 4.15,
the system seems to become weird and not react quickly even
to the 'Force stop' method (after you try 'Stop' that doesn't
work.)

...

So, since 4.15 worked/booted once, and the systems seem weird,
and Ian just posted that he had a different result/questioned
previous result (ie, it might well be a _different_ result),
I wonder if somehow this particular instance type is acting up.

Given that 4.15 worked/booted ~20 times under kexec, it's not
unreasonable to consider there might be something going on in
normal boot.

I think we should probably engage AWS Support to try and ask
for a console log using an internally available method (seen
it elsewhere iirc), and also to clarify differences in boot
disk among instance types r5.metal (fail), r5d.metal (works),
and r5.24xlarge (works) -- they all have EBS/nvme as '/'.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-07 Thread Ian May
Just want to add an update.  I haven't been able to replicate
successfully booting 4.15.0-1113-aws from 5.4.0-1058-aws, so I'm
questioning whether I made a mistake the time I thought it was
successful.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-07 Thread Ian May
Thanks for the in-depth update Mauricio!  Is there any investigation
you'd like me to specifically target?

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-06 Thread Mauricio Faria de Oliveira
For the record, 4.15.0-1113-aws works in r5.metal w/ kexec.

Booted it 10 times successfully from both 5.4.0-1058-aws
and 4.15.0-1113-aws (itself.)

(not that it was expected to make a difference as the issue
happens on normal boot, which doesn't have previous kernel.)
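
(For reference, a kexec boot of the kind used for this test looks roughly like the following, reusing the running kernel's command line:)

$ sudo kexec -l /boot/vmlinuz-4.15.0-1113-aws \
    --initrd=/boot/initrd.img-4.15.0-1113-aws --reuse-cmdline
$ sudo kexec -e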

Right after that, in the same instance, trying a normal boot
fails.

And it had kdump installed/enabled (ie, crashkernel in cmdline),
w/ which Ian mentioned that he couldn't reproduce the problem.

---

It also works on normal boot w/ r5d.metal (note 'd'), which
should be the same as r5.metal but w/ four local nvme disks.
(still boots from EBS/nvme disk in the same way as r5.metal)

---

Similarly, it works on r5.24xlarge (this is not metal) but
does boot from an EBS/nvme disk too.

---

So it seems like there's no problem with the patchset as in
4.15.0-1113-aws, as it boots fine in several types w/ approx.
the same hardware config, just differing on normal vs. kexec
boot in the r5.metal type (the problem report).

- r5.metal: normal boot fails / kexec boot works
- r5d.metal: normal boot works.
- r5.24xlarge: normal boot works.

The kexec boot worked ~20 times, so it wouldn't seem like a
race condition is in place, as that should be enough runs,
considering it failed every time on normal boot.

Also, Ian mentioned that he couldn't reproduce w/ crashdump
installed. Well, I think the only difference it would cause
_before_ mounting the rootfs (assuming that's what doesn't
work/allow machine to boot, as we have no serial console)
is the crashkernel reservation?

---

So, all this is a bit confusing, but seem to indicate again
that there's no problem w/ the patchset per se, but perhaps
something in booting this particular kernel on a particular
instance type (r5.metal) which _might_ be related to normal/
kexec/crashkernel boot differences.

More tomorrow.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-06 Thread Mauricio Faria de Oliveira
BTW, do you know of the differences between r5.metal and r5.24xlarge?

Per the specs they seem to be the same as in cpu/ram/nic/_nvme_ storage,
but differ in baremetal vs nitro hypervisor?

The reason I ask is because downgrading from 5.4.0-1056-aws to 4.15.0-1113-aws
worked/booted fine on r5.24xlarge, differently from r5.metal.

I got to test it while looking for a similar instance type that had
serial console support, but it didn't repro the problem, actually.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-06 Thread Mauricio Faria de Oliveira
It looks like it's not a problem with the patchset in general,
maybe it's specific to aws 4.15?

The patchset is in 5.4.0-1058-aws and it booted fine here too.
I'll check the patchset in 4.15.0-1113-aws.

A difference from your comment is that I could _not_ boot it
after 5.4.0-1058-aws, which worked for you. (r5.metal usually
boots in ~15min? and it's been ~20min already and no response;
serial console/screenshot is not available on bare metal sadly.)

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-06 Thread Ian May
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel.
  
  If I remove these patches the instance correctly boots the 4.15 kernel
  
  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
  
  With that being said, after successfully updating to the 4.15 without
  those patches applied, I can then upgrade to a 4.15 kernel with the
  above patches included, and the instance will boot properly.
  
  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.
  
  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
- flush.  Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
- kernel installation, but still wouldn't boot after installing the 4.15
- kernel.
+ flush.  Removed 'discard' from mount options and rebooted 5.4 kernel
+ prior to 4.15 kernel installation, but still wouldn't boot after
+ installing the 4.15 kernel.
  
  I have been unable to capture a stack trace using 'aws get-console-
  output'. After enabling kdump I was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-06 Thread Ian May
Confirmed it does work to first upgrade bionic/linux-5.4 from
5.4.0-1056-aws to 5.4.0-1058-aws and then update to 4.15.0-1113-aws

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-06 Thread Mauricio Faria de Oliveira
Hey Ian, thanks for the bug report! I'm checking this on AWS.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-05 Thread Ian May
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
- aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.
+ aws(4.15.0-1113-aws) the machine fails to boot the 4.15 kernel.
  
  If I remove these patches the instance correctly boots the 4.15 kernel
  
  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
  
- But after successfully updating to the 4.15 without those patches
- applied, I can then upgrade to a 4.15 kernel with the above patches
- included, and the instance will boot properly.
+ With that being said, after successfully updating to the 4.15 without
+ those patches applied, I can then upgrade to a 4.15 kernel with the
+ above patches included, and the instance will boot properly.
  
  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.
  
  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush.  Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
  kernel installation, but still wouldn't boot after installing the 4.15
  kernel.
  
  I have been unable to capture a stack trace using 'aws get-console-
  output'. After enabling kdump I was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-05 Thread Ian May
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.
  
  If I remove these patches the instance correctly boots the 4.15 kernel
  
  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
  
  But after successfully updating to the 4.15 without those patches
  applied, I can then upgrade to a test kernel with the above patches
  included, and the instance will boot properly.
  
  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.
  
  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush.  Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
  kernel installation, but still wouldn't boot.
+ 
+ I have been unable to capture a stack trace using 'aws get-console-
+ output'. I enabled kdump and was unable to replicate the failure. So
+ there must be some sort of race with either ext4 and/or nvme.

** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.
  
  If I remove these patches the instance correctly boots the 4.15 kernel
  
  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
  
  But after successfully updating to the 4.15 without those patches
- applied, I can then upgrade to a test kernel with the above patches
+ applied, I can then upgrade to a 4.15 kernel with the above patches
  included, and the instance will boot properly.
  
  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.
  
  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush.  Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
  kernel installation, but still wouldn't boot.
  
  I have been unable to capture a stack trace using 'aws get-console-
  output'. I enabled kdump and was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.

** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.
  
  If I remove these patches the instance correctly boots the 4.15 kernel
  
  https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
  
  But after successfully updating to the 4.15 without those patches
  applied, I can then upgrade to a 4.15 kernel with the above patches
  included, and the instance will boot properly.
  
  This problem only appears on metal instances, which uses NVME instead of
  XVDA devices.
  
  AWS instances also use the 'discard' mount option with ext4, thought
  maybe there could be a race condition between ext4 discard and journal
  flush.  Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
- kernel installation, but still wouldn't boot.
+ kernel installation, but still wouldn't boot after installing the 4.15
+ kernel.
  
  I have been unable to capture a stack trace using 'aws get-console-
- output'. I enabled kdump and was unable to replicate the failure. So
+ output'. After enabling kdump I was unable to replicate the failure. So
  there must be some sort of race with either ext4 and/or nvme.

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-05 Thread Ian May
Have been unable to capture a stack trace using 'aws get-console-
output'. Enabled kdump and was unable to replicate the failed boot,
which makes this feel like a race condition with NVME.
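
(On Ubuntu, enabling kdump for such an attempt is roughly the following; the reboot picks up the crashkernel= reservation:)

$ sudo apt install linux-crashdump
$ sudo reboot
$ sudo kdump-config show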

[Bug 1946149] Re: Bionic/linux-aws Boot failure downgrading from Bionic/linux-aws-5.4 on r5.metal

2021-10-05 Thread Ian May
** Description changed:

  When creating an r5.metal instance on AWS, the default kernel is
  bionic/linux-aws-5.4(5.4.0-1056-aws), when changing to bionic/linux-
  aws(4.15.0-1113-aws) the machine fails to boot 4.15 kernel.
+ 
+ If I remove these patches the instance correctly boots the 4.15 kernel
+ 
+ https://lists.ubuntu.com/archives/kernel-team/2021-September/123963.html
+ 
+ But after successfully updating to the 4.15 without those patches
+ applied, I can then upgrade to a test kernel with the above patches
+ included, and the instance will boot properly.
+ 
+ This problem only appears on metal instances, which uses NVME instead of
+ XVDA devices.
+ 
+ AWS instances also use the 'discard' mount option with ext4, thought
+ maybe there could be a race condition between ext4 discard and journal
+ flush.  Removed 'discard' mount and rebooted 5.4 kernel prior to 4.15
+ kernel installation, but still wouldn't boot.
