Hey Jeremy, Understood; thanks for clarifying!
I thought that maybe having upstream input could be helpful (as Julian mentioned to avoid diverging further), but if it is so much different nowadays, as you mentioned, it likely won't be that helpful anyway. By the way, appreciate that "newbie" statement.. after all this work/analysis/patches, I couldn't think that of you :) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1842320 Title: Can't boot: "error: out of memory." immediately after the grub menu Status in grub: Unknown Status in OEM Priority Project: Triaged Status in grub2-signed package in Ubuntu: Confirmed Status in initramfs-tools package in Ubuntu: Confirmed Status in linux package in Ubuntu: Confirmed Bug description: [Impact] * In some cases, if the users’ initramfs grow bigger, then it’ll likely not be able to be loaded by grub2. * Some real cases from OEM projects: In many built-in 4k monitor laptops with nvidia drivers, the u-d-c puts the nvidia*.ko to initramfs which grows the initramfs to ~120M. Also the gfxpayload=auto will remain to use 4K resolution since it’s what EFI POST passed. In this case, the grub isn't able to load initramfs because the grub_memalign() won't be able to get suitable memory for the larger file: ``` #0 grub_memalign (align=1, size=592214020) at ../../../grub-core/kern/mm.c:376 #1 0x000000007dd7b074 in grub_malloc (size=592214020) at ../../../grub-core/kern/mm.c:408 #2 0x000000007dd7a2c8 in grub_verifiers_open (io=0x7bc02d80, type=131076) at ../../../grub-core/kern/verifiers.c:150 #3 0x000000007dd801d4 in grub_file_open (name=0x7bc02f00 "/boot/initrd.img-5.17.0-1011-oem", type=131076) at ../../../grub-core/kern/file.c:121 #4 0x000000007bcd5a30 in ?? () #5 0x000000007fe21247 in ?? () #6 0x000000007bc030c8 in ?? () #7 0x000000017fe21238 in ?? () #8 0x000000007bcd5320 in ?? () #9 0x000000007fe21250 in ?? () #10 0x0000000000000000 in ?? () ``` Based on grub_mm_dump, we can see the memory fragment (some parts seem likely be used because of 4K resolution?) and doesn’t have available contiguous memory for larger file as: ``` grub_real_malloc(...) ... if (cur->size >= n + extra) ``` Based on UEFI Specification Section 7.2[1] and UEFI driver writers’ guide 4.2.3[2], we can ask 32bits+ on AllocatePages(). As most X86_64 platforms should support 64 bits addressing, we should extend GRUB_EFI_MAX_USABLE_ADDRESS to 64 bits to get more available memory. * When users grown the initramfs, then probably will get initramfs not found which really annoyed and impact the user experience (system not able to boot). [Test Plan] * detailed instructions how to reproduce the bug: 1. Any method to grow the initramfs, such as install nvidia-driver. 2. If developers would like to reproduce, then could dd if=/dev/random of=... bs=1M count=500, something like: ``` $ cat /usr/share/initramfs-tools/hooks/zzz-touch-a-file #!/bin/sh PREREQ="" prereqs() { echo "$PREREQ" } case $1 in # get pre-requisites prereqs) prereqs exit 0 ;; esac . /usr/share/initramfs-tools/hook-functions dd if=/dev/random of=${DESTDIR}/test-500M bs=1M count=500 ``` And then update-initramfs * After applying my patches, the issue is gone. * I did also test my test grubx64.efi in: 1. X86_64 qemu with 1.1. 60M initramfs + 5.15.0-37-generic kernel 1.2. 565M initramfs + 5.17.0-1011-oem kernel 2. Amd64 HP mobile workstation with 2.1. 65M initramfs + 5.15.0-39-generic kernel 2.2. 771M initramfs + 5.17.0-1011-oem kernel All working well. [Where problems could occur] * The changes almost in i386/efi, thus the impact will be in the i386 / x86_64 EFI system. The other change is to modify the “grub-core/kern/efi/mm.c” but I use the original addressing for “arm/arm64/ia64/riscv32/riscv64”. Thus it should not impact them. * There is a “#if defined(__x86_64__)” which intent to limit the > 32bits code in i386 system and also ``` #if defined (__code_model_large__) -#define GRUB_EFI_MAX_USABLE_ADDRESS 0xffffffff +#define GRUB_EFI_MAX_USABLE_ADDRESS __UINTPTR_MAX__ +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x7fffffff #else #define GRUB_EFI_MAX_USABLE_ADDRESS 0x7fffffff +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x3fffffff #endif ``` If everything works as expected, then i386 should working good. If not lucky, based on “UEFI writers’ guide”[2], the i386 will get > 4GB memory region and never be able to access. [Other Info] * Upstream grub2 bug #61058 https://savannah.gnu.org/bugs/index.php?61058 * Test PPA: https://launchpad.net/~os369510/+archive/ubuntu/lp1842320 * Test grubx64.efi: https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320 * Test source code: https://github.com/os369510/grub2/tree/lp1842320 * If you built the package, then test grubx64.efi is under “obj/monolithic/grub-efi-amd64/grubx64.efi”, in my case: `/var/cache/pbuilder/build/276481/build/grub2-2.06/obj/monolithic/grub- efi-amd64/grubx64.efi` * My build command: `sudo PBSHELL=1 pbuilder build --hookdir ~/hook- dir ubuntu-grub/grub2_2.06-2ubuntu7+jeremydev2.dsc 2>&1 | tee build.log` * My qemu command: `qemu-system-x86_64 -bios edk2/Build/OvmfX64/DEBUG_GCC5/FV/OVMF.fd -hda Templates/grub.qcow2 -m 6G -vga cirrus -smp 8 -machine type=q35,accel=kvm -cpu host -enable- kvm -boot menu=on` (I built an edk2 binary with debugging log) * You can use my grubx64.efi with debug symbols from https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320-dev- with-debug-symbols and source code is from https://github.com/os369510/grub2/tree/jeremy-dev . After built the package from source code, then you can use gdb to attach the qemu session as: ``` ubuntu@ubuntu-HP-ZBook-Fury-16-G9-Mobile-Workstation-PC [ /var/cache/pbuilder/build/35354/tmp/buildd/grub2-2.06/obj/grub-efi-amd64/grub-core ] $ gdb -x gdb_grub # with “add-symbol-file kernel.img ${address} ``` The address above can read from qemu serial port and found the last “Loading driver at 0x000xxxxxxxxxx EntryPoint=0x000xxxxxxxabc” In above case, fill “0x000xxxxxxxabc” to ${address}. [1] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf [2] https://edk2-docs.gitbook.io/edk-ii-uefi-driver-writer-s-guide/4_general_driver_design_guidelines/readme.2/423_use_uefi_memory_allocation_services --- Upgraded from 19.04 to current 19.10 using "do-release-upgrade -d". Can still boot using the previous 5.0.0-25-generic kernel, but the 5.2.0-15-generic fails to start. On selecting Ubuntu from Grub, the message "error: out of memory." is immediately shown. Pressing a key attempts to start boot-up but fails to mount root fs. Machine is HP Spectre X360 with 8GB RAM. Under kernel 5.0.0, free shows the following (run from Gnome terminal): total used free shared buff/cache available Mem: 7906564 1761196 3833240 1020216 2312128 4849224 Swap: 1003516 0 1003516 Kernel packages installed: linux-generic 5.2.0.15.16 amd64 linux-headers-5.2.0-15 5.2.0-15.16 all linux-headers-5.2.0-15-generic 5.2.0-15.16 amd64 linux-headers-generic 5.2.0.15.16 amd64 linux-image-5.0.0-25-generic 5.0.0-25.26 amd64 linux-image-5.2.0-15-generic 5.2.0-15.16+signed1 amd64 linux-image-generic 5.2.0.15.16 amd64 linux-modules-5.0.0-25-generic 5.0.0-25.26 amd64 linux-modules-5.2.0-15-generic 5.2.0-15.16 amd64 linux-modules-extra-5.0.0-25-generic 5.0.0-25.26 amd64 linux-modules-extra-5.2.0-15-generic 5.2.0-15.16 amd64 Photo of kernel panic attached. NVMe drive partition layout (GPT): Device Start End Sectors Size Type /dev/nvme0n1p1 2048 1050623 1048576 512M EFI System /dev/nvme0n1p2 1050624 2549759 1499136 732M Linux filesystem /dev/nvme0n1p3 2549760 1000214527 997664768 475.7G Linux filesystem $ sudo pvs PV VG Fmt Attr PSize PFree /dev/mapper/nvme0n1p3_crypt ubuntu-vg lvm2 a-- <475.71g 0 $ sudo lvs LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert root ubuntu-vg -wi-ao---- 474.75g swap_1 ubuntu-vg -wi-ao---- 980.00m Partition 3 is LUKS encrypted. Root LV is ext4. --- ProblemType: Bug ApportVersion: 2.20.11-0ubuntu7 Architecture: amd64 AudioDevicesInUse: USER PID ACCESS COMMAND /dev/snd/controlC0: gmckeown 1647 F.... pulseaudio CurrentDesktop: ubuntu:GNOME DistroRelease: Ubuntu 19.10 InstallationDate: Installed on 2019-08-15 (18 days ago) InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Release amd64 (20190416) Lsusb: Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub Bus 001 Device 003: ID 8087:0a2b Intel Corp. Bus 001 Device 002: ID 04f2:b593 Chicony Electronics Co., Ltd HP Wide Vision FHD Camera Bus 001 Device 004: ID 046d:c52b Logitech, Inc. Unifying Receiver Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub MachineType: HP HP Spectre x360 Convertible 13-ae0xx Package: linux (not installed) ProcFB: 0 inteldrmfb ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.0.0-25-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash ProcVersionSignature: Ubuntu 5.0.0-25.26-generic 5.0.18 RelatedPackageVersions: linux-restricted-modules-5.0.0-25-generic N/A linux-backports-modules-5.0.0-25-generic N/A linux-firmware 1.181 Tags: eoan Uname: Linux 5.0.0-25-generic x86_64 UpgradeStatus: Upgraded to eoan on 2019-09-02 (0 days ago) UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo _MarkForUpload: True dmi.bios.date: 05/17/2019 dmi.bios.vendor: AMI dmi.bios.version: F.25 dmi.board.asset.tag: Base Board Asset Tag dmi.board.name: 83B9 dmi.board.vendor: HP dmi.board.version: 56.43 dmi.chassis.type: 31 dmi.chassis.vendor: HP dmi.chassis.version: Chassis Version dmi.modalias: dmi:bvnAMI:bvrF.25:bd05/17/2019:svnHP:pnHPSpectrex360Convertible13-ae0xx:pvr:rvnHP:rn83B9:rvr56.43:cvnHP:ct31:cvrChassisVersion: dmi.product.family: 103C_5335KV HP Spectre dmi.product.name: HP Spectre x360 Convertible 13-ae0xx dmi.product.sku: 2QH38EA#ABU dmi.sys.vendor: HP To manage notifications about this bug go to: https://bugs.launchpad.net/grub/+bug/1842320/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp