[Touch-packages] [Bug 1842320] Re: Can't boot: "error: out of memory." immediately after the grub menu

2022-11-29 Thread Craig Hesling
Ah, didn't realize Debian had a different limit. Thanks @juliank.

@anourzad, I'm not sure I would want to start adding custom steps to the
kernel update path. I'm much more inclined to add an /etc option for a
supported automatic mechanism, like compression or module selection.

@adrien-n, great notes in #125, I'm sure this will help others.

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to initramfs-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1842320

Title:
  Can't boot: "error: out of memory." immediately after the grub menu

Status in grub:
  Unknown
Status in OEM Priority Project:
  Triaged
Status in grub2-signed package in Ubuntu:
  Triaged
Status in grub2-unsigned package in Ubuntu:
  Triaged
Status in initramfs-tools package in Ubuntu:
  Won't Fix
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]

   * In some cases, if the users’ initramfs grow bigger, then it’ll
  likely not be able to be loaded by grub2.

   * Some real cases from OEM projects:

  In many built-in 4k monitor laptops with nvidia drivers, the u-d-c
  puts the nvidia*.ko to initramfs which grows the initramfs to ~120M.
  Also the gfxpayload=auto will remain to use 4K resolution since it’s
  what EFI POST passed.

  In this case, the grub isn't able to load initramfs because the
  grub_memalign() won't be able to get suitable memory for the larger
  file:

  ```
  #0 grub_memalign (align=1, size=592214020) at ../../../grub-core/kern/mm.c:376
  #1 0x7dd7b074 in grub_malloc (size=592214020) at 
../../../grub-core/kern/mm.c:408
  #2 0x7dd7a2c8 in grub_verifiers_open (io=0x7bc02d80, type=131076)
  at ../../../grub-core/kern/verifiers.c:150
  #3 0x7dd801d4 in grub_file_open (name=0x7bc02f00 
"/boot/initrd.img-5.17.0-1011-oem",
  type=131076) at ../../../grub-core/kern/file.c:121
  #4 0x7bcd5a30 in ?? ()
  #5 0x7fe21247 in ?? ()
  #6 0x7bc030c8 in ?? ()
  #7 0x00017fe21238 in ?? ()
  #8 0x7bcd5320 in ?? ()
  #9 0x7fe21250 in ?? ()
  #10 0x in ?? ()
  ```

  Based on grub_mm_dump, we can see the memory fragment (some parts seem
  likely be used because of 4K resolution?) and doesn’t have available
  contiguous memory for larger file as:

  ```
  grub_real_malloc(...)
  ...
  if (cur->size >= n + extra)
  ```

  Based on UEFI Specification Section 7.2[1] and UEFI driver writers’
  guide 4.2.3[2], we can ask 32bits+ on AllocatePages().

  As most X86_64 platforms should support 64 bits addressing, we should
  extend GRUB_EFI_MAX_USABLE_ADDRESS to 64 bits to get more available
  memory.

   * When users grown the initramfs, then probably will get initramfs
  not found which really annoyed and impact the user experience (system
  not able to boot).

  [Test Plan]

   * detailed instructions how to reproduce the bug:

  1. Any method to grow the initramfs, such as install nvidia-driver.

  2. If developers would like to reproduce, then could dd if=/dev/random
  of=... bs=1M count=500, something like:

  ```
  $ cat /usr/share/initramfs-tools/hooks/zzz-touch-a-file
  #!/bin/sh

  PREREQ=""

  prereqs()
  {
  echo "$PREREQ"
  }

  case $1 in
  # get pre-requisites
  prereqs)
  prereqs
  exit 0
  ;;
  esac

  . /usr/share/initramfs-tools/hook-functions
  dd if=/dev/random of=${DESTDIR}/test-500M bs=1M count=500
  ```

  And then update-initramfs

   * After applying my patches, the issue is gone.

   * I did also test my test grubx64.efi in:

  1. X86_64 qemu with
  1.1. 60M initramfs + 5.15.0-37-generic kernel
  1.2. 565M initramfs + 5.17.0-1011-oem kernel

  2. Amd64 HP mobile workstation with
  2.1. 65M initramfs + 5.15.0-39-generic kernel
  2.2. 771M initramfs + 5.17.0-1011-oem kernel

  All working well.

  [Where problems could occur]

  * The changes almost in i386/efi, thus the impact will be in the i386 / 
x86_64 EFI system.
  The other change is to modify the “grub-core/kern/efi/mm.c” but I use the 
original addressing for “arm/arm64/ia64/riscv32/riscv64”.
  Thus it should not impact them.

  * There is a “#if defined(__x86_64__)” which intent to limit the >
  32bits code in i386 system and also

  ```
   #if defined (__code_model_large__)
  -#define GRUB_EFI_MAX_USABLE_ADDRESS 0x
  +#define GRUB_EFI_MAX_USABLE_ADDRESS __UINTPTR_MAX__
  +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x7fff
   #else
   #define GRUB_EFI_MAX_USABLE_ADDRESS 0x7fff
  +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x3fff
   #endif
  ```

  If everything works as expected, then i386 should working good.

  If not lucky, based on “UEFI writers’ guide”[2], the i386 will get >
  4GB memory region and never be able to access.

  [Other Info]

   * Upstream grub2 bug #61058
  https://savannah.gnu.org/bugs/index.php?61058

   * Test PPA: https://launchpad.net/~os369510/+archive/ubuntu/lp1842320

   * Test grubx64.efi:
  

[Touch-packages] [Bug 1842320] Re: Can't boot: "error: out of memory." immediately after the grub menu

2022-11-21 Thread Craig Hesling
Confirmed that reducing the size of the initrd.img slightly allowed my
machine to boot. I just enabled `COMPRESSLEVEL=19` with `COMPRESS=zstd`
(default on debian) in initramfs.conf. That reduced my initrd.img size
from 72MB to 62MB, which allowed my machine to boot.

Ultimately, I set `MODULES=dep` (instead of `MODULES=most`) and reverted
the compression to defaults. This reduced my initrd.img to 22MB, which
includes Intel microcode (specific binary), nvme, crypto, and lvm
support, among other things.

---

For the record:

My machine is running Debian bookworm with the Nvidia proprietary
drivers with 32GB. I first noticed the issue when, coincidentally,
upgrading from kernel 5.19 to 6.0.

```
Loading Linux 6.0.0-4amd64 ...
Loading initial ramdisk ...
error: out of memory.

Press any key to continue...
```

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to initramfs-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1842320

Title:
  Can't boot: "error: out of memory." immediately after the grub menu

Status in grub:
  Unknown
Status in OEM Priority Project:
  Triaged
Status in grub2-signed package in Ubuntu:
  Triaged
Status in grub2-unsigned package in Ubuntu:
  Triaged
Status in initramfs-tools package in Ubuntu:
  Won't Fix
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]

   * In some cases, if the users’ initramfs grow bigger, then it’ll
  likely not be able to be loaded by grub2.

   * Some real cases from OEM projects:

  In many built-in 4k monitor laptops with nvidia drivers, the u-d-c
  puts the nvidia*.ko to initramfs which grows the initramfs to ~120M.
  Also the gfxpayload=auto will remain to use 4K resolution since it’s
  what EFI POST passed.

  In this case, the grub isn't able to load initramfs because the
  grub_memalign() won't be able to get suitable memory for the larger
  file:

  ```
  #0 grub_memalign (align=1, size=592214020) at ../../../grub-core/kern/mm.c:376
  #1 0x7dd7b074 in grub_malloc (size=592214020) at 
../../../grub-core/kern/mm.c:408
  #2 0x7dd7a2c8 in grub_verifiers_open (io=0x7bc02d80, type=131076)
  at ../../../grub-core/kern/verifiers.c:150
  #3 0x7dd801d4 in grub_file_open (name=0x7bc02f00 
"/boot/initrd.img-5.17.0-1011-oem",
  type=131076) at ../../../grub-core/kern/file.c:121
  #4 0x7bcd5a30 in ?? ()
  #5 0x7fe21247 in ?? ()
  #6 0x7bc030c8 in ?? ()
  #7 0x00017fe21238 in ?? ()
  #8 0x7bcd5320 in ?? ()
  #9 0x7fe21250 in ?? ()
  #10 0x in ?? ()
  ```

  Based on grub_mm_dump, we can see the memory fragment (some parts seem
  likely be used because of 4K resolution?) and doesn’t have available
  contiguous memory for larger file as:

  ```
  grub_real_malloc(...)
  ...
  if (cur->size >= n + extra)
  ```

  Based on UEFI Specification Section 7.2[1] and UEFI driver writers’
  guide 4.2.3[2], we can ask 32bits+ on AllocatePages().

  As most X86_64 platforms should support 64 bits addressing, we should
  extend GRUB_EFI_MAX_USABLE_ADDRESS to 64 bits to get more available
  memory.

   * When users grown the initramfs, then probably will get initramfs
  not found which really annoyed and impact the user experience (system
  not able to boot).

  [Test Plan]

   * detailed instructions how to reproduce the bug:

  1. Any method to grow the initramfs, such as install nvidia-driver.

  2. If developers would like to reproduce, then could dd if=/dev/random
  of=... bs=1M count=500, something like:

  ```
  $ cat /usr/share/initramfs-tools/hooks/zzz-touch-a-file
  #!/bin/sh

  PREREQ=""

  prereqs()
  {
  echo "$PREREQ"
  }

  case $1 in
  # get pre-requisites
  prereqs)
  prereqs
  exit 0
  ;;
  esac

  . /usr/share/initramfs-tools/hook-functions
  dd if=/dev/random of=${DESTDIR}/test-500M bs=1M count=500
  ```

  And then update-initramfs

   * After applying my patches, the issue is gone.

   * I did also test my test grubx64.efi in:

  1. X86_64 qemu with
  1.1. 60M initramfs + 5.15.0-37-generic kernel
  1.2. 565M initramfs + 5.17.0-1011-oem kernel

  2. Amd64 HP mobile workstation with
  2.1. 65M initramfs + 5.15.0-39-generic kernel
  2.2. 771M initramfs + 5.17.0-1011-oem kernel

  All working well.

  [Where problems could occur]

  * The changes almost in i386/efi, thus the impact will be in the i386 / 
x86_64 EFI system.
  The other change is to modify the “grub-core/kern/efi/mm.c” but I use the 
original addressing for “arm/arm64/ia64/riscv32/riscv64”.
  Thus it should not impact them.

  * There is a “#if defined(__x86_64__)” which intent to limit the >
  32bits code in i386 system and also

  ```
   #if defined (__code_model_large__)
  -#define GRUB_EFI_MAX_USABLE_ADDRESS 0x
  +#define GRUB_EFI_MAX_USABLE_ADDRESS __UINTPTR_MAX__
  +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x7fff
   #else
   #define GRUB_EFI_MAX_USABLE_ADDRESS