Hey Jeremy,

Understood; thanks for clarifying!

I thought that having upstream input might be helpful (as Julian
mentioned, to avoid diverging further), but if things are so
different nowadays, as you mentioned, it likely won't be that
helpful anyway.

By the way, I appreciate that "newbie" statement... after all this
work/analysis/patches, I couldn't think that of you :)

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1842320

Title:
  Can't boot: "error: out of memory." immediately after the grub menu

Status in grub:
  Unknown
Status in OEM Priority Project:
  Triaged
Status in grub2-signed package in Ubuntu:
  Confirmed
Status in initramfs-tools package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]

   * In some cases, if the user’s initramfs grows large enough, grub2
  will likely not be able to load it.

   * Some real cases from OEM projects:

  On many laptops with a built-in 4K monitor and nvidia drivers, u-d-c
  puts the nvidia*.ko modules into the initramfs, which grows the
  initramfs to ~120M. In addition, gfxpayload=auto keeps using the 4K
  resolution, since that is what EFI POST passed.

  In this case, grub isn't able to load the initramfs because
  grub_memalign() can't get a suitable block of memory for the larger
  file:

  ```
  #0 grub_memalign (align=1, size=592214020) at ../../../grub-core/kern/mm.c:376
  #1 0x000000007dd7b074 in grub_malloc (size=592214020) at ../../../grub-core/kern/mm.c:408
  #2 0x000000007dd7a2c8 in grub_verifiers_open (io=0x7bc02d80, type=131076)
      at ../../../grub-core/kern/verifiers.c:150
  #3 0x000000007dd801d4 in grub_file_open (name=0x7bc02f00 "/boot/initrd.img-5.17.0-1011-oem",
      type=131076) at ../../../grub-core/kern/file.c:121
  #4 0x000000007bcd5a30 in ?? ()
  #5 0x000000007fe21247 in ?? ()
  #6 0x000000007bc030c8 in ?? ()
  #7 0x000000017fe21238 in ?? ()
  #8 0x000000007bcd5320 in ?? ()
  #9 0x000000007fe21250 in ?? ()
  #10 0x0000000000000000 in ?? ()
  ```

  Based on grub_mm_dump, we can see that the memory is fragmented (some
  parts are likely in use because of the 4K resolution?) and that no
  contiguous free region is large enough for the larger file to pass
  this check:

  ```
  grub_real_malloc(...)
  ...
  if (cur->size >= n + extra)
  ```
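
  To illustrate why fragmentation matters here, below is a minimal C
  sketch of a first-fit scan over a free list. It is not GRUB's code:
  `struct free_block`, `first_fit` and `total_free` are hypothetical
  names, and sizes are plain bytes rather than GRUB's allocation cells.
  It only shows that a `cur->size >= n + extra` style test can fail for
  every block even when the total free memory exceeds the request.

  ```c
  #include <stddef.h>

  /* Hypothetical, simplified free-list entry. GRUB's real header in
     grub-core/kern/mm.c carries more fields and counts in cells. */
  struct free_block
  {
    size_t size;              /* free bytes in this contiguous block */
    struct free_block *next;  /* next free block, or NULL at the end */
  };

  /* Return the first block able to hold n + extra bytes, or NULL when
     no single contiguous block is large enough. */
  struct free_block *
  first_fit (struct free_block *head, size_t n, size_t extra)
  {
    struct free_block *cur;

    for (cur = head; cur != NULL; cur = cur->next)
      if (cur->size >= n + extra)
        return cur;
    return NULL;
  }

  /* Sum of all free bytes, to contrast "total free" with the size of
     the largest single block. */
  size_t
  total_free (const struct free_block *head)
  {
    size_t sum = 0;

    for (; head != NULL; head = head->next)
      sum += head->size;
    return sum;
  }
  ```

  For example, with three separate 200 MiB free blocks, total_free()
  reports 600 MiB, yet first_fit() for the 592214020-byte request seen
  in the backtrace returns NULL.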

  Based on UEFI Specification Section 7.2[1] and the UEFI driver writers’
  guide 4.2.3[2], we can request memory above the 32-bit range from
  AllocatePages().

  As most X86_64 platforms should support 64-bit addressing, we should
  extend GRUB_EFI_MAX_USABLE_ADDRESS to 64 bits to get more available
  memory.

   * When users grow the initramfs, they will probably get “initramfs
  not found”, which is really annoying and hurts the user experience
  (the system is not able to boot).

  [Test Plan]

   * Detailed instructions on how to reproduce the bug:

  1. Use any method to grow the initramfs, such as installing
  nvidia-driver.

  2. Developers who would like to reproduce this can use dd
  if=/dev/random of=... bs=1M count=500 in an initramfs hook, something
  like:

  ```
  $ cat /usr/share/initramfs-tools/hooks/zzz-touch-a-file
  #!/bin/sh

  PREREQ=""

  prereqs()
  {
          echo "$PREREQ"
  }

  case $1 in
  # get pre-requisites
  prereqs)
          prereqs
          exit 0
          ;;
  esac

  . /usr/share/initramfs-tools/hook-functions
  dd if=/dev/random of=${DESTDIR}/test-500M bs=1M count=500
  ```

  Then run update-initramfs.

   * After applying my patches, the issue is gone.

   * I also tested my test grubx64.efi on:

  1. X86_64 qemu with
  1.1. 60M initramfs + 5.15.0-37-generic kernel
  1.2. 565M initramfs + 5.17.0-1011-oem kernel

  2. Amd64 HP mobile workstation with
  2.1. 65M initramfs + 5.15.0-39-generic kernel
  2.2. 771M initramfs + 5.17.0-1011-oem kernel

  All work well.

  [Where problems could occur]

  * The changes are almost all in i386/efi, so the impact will be on
  i386 / x86_64 EFI systems.
  The other change modifies “grub-core/kern/efi/mm.c”, but I keep the
  original addressing for “arm/arm64/ia64/riscv32/riscv64”.
  Thus it should not impact them.

  * There is a “#if defined(__x86_64__)” which is intended to keep the
  >32-bit code out of i386 systems, and also:

  ```
   #if defined (__code_model_large__)
  -#define GRUB_EFI_MAX_USABLE_ADDRESS 0xffffffff
  +#define GRUB_EFI_MAX_USABLE_ADDRESS __UINTPTR_MAX__
  +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x7fffffff
   #else
   #define GRUB_EFI_MAX_USABLE_ADDRESS 0x7fffffff
  +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x3fffffff
   #endif
  ```

  If everything works as expected, then i386 should work fine.

  If we are unlucky, then based on the “UEFI driver writers’ guide”[2],
  i386 will get a memory region above 4GB and never be able to access
  it.
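
  As a rough illustration of what raising the address cap changes, here
  is a small C sketch. `region_usable` and `EFI_PAGE_BYTES` are
  hypothetical names, not GRUB's, and the real logic in
  grub-core/kern/efi/mm.c is more involved. The sketch only checks
  whether a region lies entirely at or below a given cap.

  ```c
  #include <stdint.h>

  #define EFI_PAGE_BYTES 4096u  /* EFI pages are 4 KiB */

  /* Hypothetical helper: return 1 if the region of `pages` pages that
     starts at `start` ends at or below `max_addr`, otherwise 0.
     Assumes pages * EFI_PAGE_BYTES does not overflow 64 bits. */
  int
  region_usable (uint64_t start, uint64_t pages, uint64_t max_addr)
  {
    uint64_t len = pages * EFI_PAGE_BYTES;

    if (len == 0 || start > max_addr)
      return 0;
    /* Last byte is start + len - 1; compare without overflowing. */
    return len - 1 <= max_addr - start;
  }
  ```

  With a cap of 0xffffffff, a region starting at 0x100000000 is
  rejected, while with a 64-bit cap the same region is accepted. On a
  build where pointers are 32 bits wide, such a region could not be
  addressed at all, which is the i386 concern above.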

  [Other Info]

   * Upstream grub2 bug #61058
  https://savannah.gnu.org/bugs/index.php?61058

   * Test PPA: https://launchpad.net/~os369510/+archive/ubuntu/lp1842320

   * Test grubx64.efi:
  https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320

   * Test source code: https://github.com/os369510/grub2/tree/lp1842320

   * If you built the package, then the test grubx64.efi is under
  “obj/monolithic/grub-efi-amd64/grubx64.efi”, in my case:
  `/var/cache/pbuilder/build/276481/build/grub2-2.06/obj/monolithic/grub-efi-amd64/grubx64.efi`

   * My build command: `sudo PBSHELL=1 pbuilder build --hookdir
  ~/hook-dir ubuntu-grub/grub2_2.06-2ubuntu7+jeremydev2.dsc 2>&1 | tee
  build.log`

   * My qemu command: `qemu-system-x86_64 -bios
  edk2/Build/OvmfX64/DEBUG_GCC5/FV/OVMF.fd -hda Templates/grub.qcow2 -m
  6G -vga cirrus -smp 8 -machine type=q35,accel=kvm -cpu host
  -enable-kvm -boot menu=on` (I built an edk2 binary with debug logging)

   * You can use my grubx64.efi with debug symbols from
  https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320-dev-with-debug-symbols
  and the source code is from
  https://github.com/os369510/grub2/tree/jeremy-dev .

  After building the package from source, you can use gdb to attach to
  the qemu session:

  ```
  ubuntu@ubuntu-HP-ZBook-Fury-16-G9-Mobile-Workstation-PC [ /var/cache/pbuilder/build/35354/tmp/buildd/grub2-2.06/obj/grub-efi-amd64/grub-core ]
  $ gdb -x gdb_grub # with “add-symbol-file kernel.img ${address}”
  ```

  The address above can be read from the qemu serial port: find the
  last “Loading driver at 0x000xxxxxxxxxx EntryPoint=0x000xxxxxxxabc”
  line.

  In the above case, use “0x000xxxxxxxabc” as ${address}.

  [1] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
  [2] https://edk2-docs.gitbook.io/edk-ii-uefi-driver-writer-s-guide/4_general_driver_design_guidelines/readme.2/423_use_uefi_memory_allocation_services

  ---

  Upgraded from 19.04 to current 19.10 using "do-release-upgrade -d".
  Can still boot using the previous 5.0.0-25-generic kernel, but the
  5.2.0-15-generic fails to start.

  On selecting Ubuntu from Grub, the message "error: out of memory." is
  immediately shown. Pressing a key attempts to start boot-up but fails
  to mount root fs.

  Machine is HP Spectre X360 with 8GB RAM. Under kernel 5.0.0, free
  shows the following (run from Gnome terminal):

                total        used        free      shared  buff/cache   available
  Mem:        7906564     1761196     3833240     1020216     2312128     4849224
  Swap:       1003516           0     1003516

  Kernel packages installed:

  linux-generic                              5.2.0.15.16 amd64
  linux-headers-5.2.0-15                     5.2.0-15.16 all
  linux-headers-5.2.0-15-generic             5.2.0-15.16 amd64
  linux-headers-generic                      5.2.0.15.16 amd64
  linux-image-5.0.0-25-generic               5.0.0-25.26 amd64
  linux-image-5.2.0-15-generic               5.2.0-15.16+signed1 amd64
  linux-image-generic                        5.2.0.15.16 amd64
  linux-modules-5.0.0-25-generic             5.0.0-25.26 amd64
  linux-modules-5.2.0-15-generic             5.2.0-15.16 amd64
  linux-modules-extra-5.0.0-25-generic       5.0.0-25.26 amd64
  linux-modules-extra-5.2.0-15-generic       5.2.0-15.16 amd64

  Photo of kernel panic attached.

  NVMe drive partition layout (GPT):

  Device           Start        End   Sectors   Size Type
  /dev/nvme0n1p1    2048    1050623   1048576   512M EFI System
  /dev/nvme0n1p2 1050624    2549759   1499136   732M Linux filesystem
  /dev/nvme0n1p3 2549760 1000214527 997664768 475.7G Linux filesystem

  $ sudo pvs
    PV                          VG        Fmt  Attr PSize    PFree
    /dev/mapper/nvme0n1p3_crypt ubuntu-vg lvm2 a--  <475.71g    0

  $ sudo lvs
    LV     VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
    root   ubuntu-vg -wi-ao---- 474.75g
    swap_1 ubuntu-vg -wi-ao---- 980.00m

  Partition 3 is LUKS encrypted. Root LV is ext4.
  ---
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu7
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  gmckeown   1647 F.... pulseaudio
  CurrentDesktop: ubuntu:GNOME
  DistroRelease: Ubuntu 19.10
  InstallationDate: Installed on 2019-08-15 (18 days ago)
  InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Release amd64 (20190416)
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 003: ID 8087:0a2b Intel Corp.
   Bus 001 Device 002: ID 04f2:b593 Chicony Electronics Co., Ltd HP Wide Vision FHD Camera
   Bus 001 Device 004: ID 046d:c52b Logitech, Inc. Unifying Receiver
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: HP HP Spectre x360 Convertible 13-ae0xx
  Package: linux (not installed)
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.0.0-25-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash
  ProcVersionSignature: Ubuntu 5.0.0-25.26-generic 5.0.18
  RelatedPackageVersions:
   linux-restricted-modules-5.0.0-25-generic N/A
   linux-backports-modules-5.0.0-25-generic  N/A
   linux-firmware                            1.181
  Tags:  eoan
  Uname: Linux 5.0.0-25-generic x86_64
  UpgradeStatus: Upgraded to eoan on 2019-09-02 (0 days ago)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  dmi.bios.date: 05/17/2019
  dmi.bios.vendor: AMI
  dmi.bios.version: F.25
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: 83B9
  dmi.board.vendor: HP
  dmi.board.version: 56.43
  dmi.chassis.type: 31
  dmi.chassis.vendor: HP
  dmi.chassis.version: Chassis Version
  dmi.modalias: dmi:bvnAMI:bvrF.25:bd05/17/2019:svnHP:pnHPSpectrex360Convertible13-ae0xx:pvr:rvnHP:rn83B9:rvr56.43:cvnHP:ct31:cvrChassisVersion:
  dmi.product.family: 103C_5335KV HP Spectre
  dmi.product.name: HP Spectre x360 Convertible 13-ae0xx
  dmi.product.sku: 2QH38EA#ABU
  dmi.sys.vendor: HP
