Package: debian-installer
Version: 20230427
Severity: important
X-Debbugs-Cc: debian-...@lists.debian.org, debian-ker...@lists.debian.org, debia...@lists.debian.org

Hi everyone,

I'm reaching out to all the aforementioned teams because I know nothing
about UEFI, kernel-side DRM modules, or X drivers, and I'd like to get
some feedback here.

If you need a TL;DR, you can skip to “Proposal plan for d-i”, which is
about my plans for the very next few hours, unless someone tells me the
proposal is crazy, unsafe, etc.


Backstory
=========

Since we've been hitting and/or (re)discovering UEFI-specific issues
lately (#1033913), I decided to spend some time extending my usual
tests, traditionally run under QEMU with default settings, therefore
booted under BIOS, to also run them under UEFI (meaning also testing
Secure Boot without having to switch to baremetal).

I've been kindly pointed by regular image testers to the following page:
  https://wiki.debian.org/SecureBoot/VirtualMachine

But I was a little shocked to discover a broken X display when booting
under UEFI! It seems I'm not the only one since that page has the
following, even if there are no references to any bug reports:

    -vga virtio - The Debian installer seems to have difficulties
    working with the standard VGA driver (and virtio should anyway have
    better performance)

The test setup is described at the very end of this report, with my
current test target being specifically netboot/gtk/mini.iso for amd64.


Kernel-side
===========

The fb-modules udeb hasn't changed much in 4+ years, with some DRM
modules getting added alongside existing ones, leading to the following
contents in Bullseye (5.10.178-3):

    ./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/media/cec/core/cec.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/video/fbdev/vga16fb.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/video/vgastate.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/virtio/virtio_dma_buf.ko

and the following contents in Bookworm (6.1.27-1):

    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_shmem_helper.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/video/fbdev/vga16fb.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/video/vgastate.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/virtio/virtio_dma_buf.ko

Those contents are defined via these files in linux.git:

    kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/amd64/fb-modules
    #include <fb-modules>
    vesafb ?
    vga16fb

    kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/fb-modules
    # We don't include all DRM drivers here as on many platforms we can
    # call system firmware to get hold of a simple framebuffer
    drm
    drm_kms_helper
    virtio-gpu ?


X-side
======

Now, we know that the contents of xserver-xorg-core-udeb have changed a
little between Bullseye and Bookworm (#1035014), but that doesn't seem
to be a factor here.

I've tested 3 netboot/gtk/mini.iso images to assess the situation:
 - mini-20210731+deb11u8.iso from Bullseye 11.7
 - mini-20230427.iso from D-I Bookworm RC 2
 - mini-daily.iso from D-I daily builds (downloaded today)

If people want to replicate those tests, they're available at:
  https://people.debian.org/~kibi/bug-drm-vs-uefi/

Or:

    wget https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/20210731+deb11u8/images/netboot/gtk/mini.iso -O mini-20210731+deb11u8.iso
    wget https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/20230427/images/netboot/gtk/mini.iso -O mini-20230427.iso
    wget https://d-i.debian.org/daily-images/amd64/daily/netboot/gtk/mini.iso -O mini-daily.iso

Via QEMU, under BIOS and UEFI, results are:

    +-------------+-----------------+-----------------+-----------------+
    |  Graphics   |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
    +-------------+--------+--------+--------+--------+--------+--------+
    |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
    +-------------+--------+--------+--------+--------+--------+--------+
    |             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
    | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
    | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
    | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
    | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
    | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
    +-------------+--------+--------+--------+--------+--------+--------+

Here, we see that RC 2, which had xserver-xorg-core-udeb without
modesetting_drv.so (#1035014), is actually performing exactly like the
daily builds, where it's been added back.

In the table:
 - no options and -vga std are grouped together since that seemed to be
   the default, confirmed by identical test results; the other -vga
   options are sorted alphabetically.
 - KO-G is for garbled:
   https://people.debian.org/~kibi/bug-drm-vs-uefi/screenshot-std-garbled.png
 - KO-S is for split:
   https://people.debian.org/~kibi/bug-drm-vs-uefi/screenshot-cirrus-split.png

X seems to work in both the garbled case and in the split case (bottom
is the rest of the GRUB prompt, top is the actual GTK window), and one
can navigate the menus using arrows, and also type “fre” to get to the
French entry. I didn't go through a single full install though (even if
that'd be definitely doable, a manual speedrun isn't unheard of…).

I didn't try to extract any logs, but I can definitely do that for
further investigation.

My first instinct, as it happens quite a lot, was wondering whether we
could be missing modules on the kernel side, that's why I started this
report by listing the contents of the fb-modules udebs.

Now, there are dedicated DRM modules for various hardware, including…
bochs and cirrus, so I've tried including them in a mini.iso, which can
also be found in the same directory:
  https://people.debian.org/~kibi/bug-drm-vs-uefi/mini-hackhackhack.iso

Nasty code:
  https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa

(Only tested with a manual netboot-gtk build on amd64.)

Instead of “going the correct way” (meaning patching linux.git, then
rebuilding linux-signed-amd64 to get an updated fb-modules udeb), I've
investigated a nasty but apparently effective approach that could be
used *if* we wanted to add those modules in RC 3. It's very nasty but
doesn't depend on a new round of linux upload, lengthy builds (looking
at you, mips*), manual steps for signing etc.

And *if* we want to try that approach, I'd very much prefer doing that
in RC 3, and either profit, or revert in RC 4… instead of only trying in
RC 4, possibly breaking the graphical installer right when entering the
“nobody move!” stage of the freeze.

Note that I've “resolved” the module dependencies manually, and also
included vboxvideo.ko along the way, which has the same dependencies.
We've had some (unfortunately vague) reports from VirtualBox users,
maybe they're hitting the same kind of issues… But at this point, this
is really a shot in the dark (no pun intended, at least initially).

At least for a friend of mine who was nice enough to run a few tests
under VirtualBox, d-i seems to work fine, with or without the hack, on
both Windows and Mac Intel hosts, so it doesn't appear to regress
obviously…


Questions
=========

 - Is it really to be expected that X and standard drivers would regress
   this way when moving from Bullseye to Bookworm?

 - Or is it expected to require specific kernel modules while that
   wasn't the case before? I've discovered this in VM environments, but
   maybe similar things could be happening on bare metal as well, and
   maybe some more modules should be considered for inclusion?

 - Is it acceptable to just bundle bochs, cirrus, and vboxvideo for the
   time being (i.e. RC 3, RC 4, 12.0.0), be it via the nasty approach or
   via a proper linux fb-modules inclusion?

 - Or does shipping those few modules risk breaking the kernel and/or X
   on other platforms? (I'd definitely hope not!)

 - Should I extract some dmesg/X logs from the KO-G/KO-S cases, so that
   one has a chance of understanding what's happening? Since it's likely
   to be a little annoying, I'd be happy to take a full list of cells in
   the big matrix for which it would make sense to have logs. Another
   reason why I haven't started there is that I don't expect us to find
   it reasonable to hotpatch the X server at this very late stage of the
   freeze, if that was deemed to be a problem in X. Adding some specific
   kernel modules seems much more targeted and way less risky… (even if
   that might just be a workaround and not a long-term fix).


Wild guess
==========

One obvious difference between BIOS and UEFI booting is the bootloader,
ISOLINUX vs. GRUB.
It might be that the latter leaves the graphics stack in a particular
state that no longer pleases the default things in the kernel and/or X,
while that wasn't an issue in Bullseye?


Proposal plan for d-i (Bookworm RC 3, RC 4, and 12.0.0)
=====================

Unless I receive strong negative feedback before Monday (May 15th), I
plan on including the nasty approach in RC 3, and to revert it
altogether in RC 4 if big bad regressions are reported:
  https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa

As a side note, keeping the bundling in src:debian-installer for the
next few weeks makes us autonomous: we can enable and disable those
extra modules without requiring a new linux upload… so it's nasty but I
actually thought about the few advantages we were getting out of this!

We should also be OK legal-wise, given we already have linux in
Built-Using via its udebs, so copying things around from linux-image
wouldn't change anything there, would it?

Of course in the long run, if having those modules is desired, it will
be better to have them merged in linux and to drop the nasty code, e.g.
in a point release.


Test reproducibility
====================

All tests were performed on an amd64 Bullseye host, with a Bullseye set
of qemu packages. I've installed ovmf from Bookworm though, as enabling
UEFI support was preventing me from being able to boot directly from the
ISO, and would mean going through the UEFI menu to select the boot disk
every single time.

The big matrix above was built with that Bookworm ovmf package, and
unless someone insists I should redo all the tests with the Bullseye
one, I don't plan on spending time on this.

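For anyone wanting to walk through the whole matrix, the BIOS and UEFI
invocations listed in this section can be generated mechanically. What
follows is a minimal dry-run sketch, not the script used for the tests
above: the function names qemu_cmd and print_matrix are hypothetical,
it only prints command lines without booting anything, and it assumes
the OVMF files have already been copied to /tmp/code.fd and
/tmp/vars.fd as in the UEFI recipe.

```shell
#!/bin/sh
# Dry run only: print the kvm command line for one cell of the matrix.
# Hypothetical helper; usage: qemu_cmd <bios|uefi> <iso-suffix> [graphics]
qemu_cmd() {
    fw=$1
    suffix=$2
    vga=${3:-}
    cmd="kvm -m 1G"
    if [ "$fw" = uefi ]; then
        # Same options as the UEFI invocation in this report; assumes the
        # OVMF code/vars files were copied to /tmp beforehand.
        cmd="$cmd -machine q35,smm=on -pflash /tmp/code.fd -pflash /tmp/vars.fd"
    fi
    cmd="$cmd -cdrom mini-$suffix.iso"
    # An empty graphics argument means no -vga option at all
    # (first row of the matrix).
    if [ -n "$vga" ]; then
        cmd="$cmd -vga $vga"
    fi
    printf '%s\n' "$cmd"
}

# Print one command line per cell: 3 images x 6 graphics rows x 2 firmwares.
print_matrix() {
    for iso in 20210731+deb11u8 20230427 daily; do
        for graphics in "" std cirrus qxl virtio vmware; do
            for firmware in bios uefi; do
                qemu_cmd "$firmware" "$iso" "$graphics"
            done
        done
    done
}
```

Calling print_matrix lists all the cells in order; a single cell can be
printed with e.g. `qemu_cmd uefi daily cirrus`, and the resulting line
can then be pasted into a shell to actually boot that combination.
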
 - BIOS:

       kvm -m 1G -cdrom mini-<TEST>.iso [-vga <GRAPHICS>]

 - UEFI:

       cp /usr/share/OVMF/OVMF_CODE_4M.ms.fd /tmp/code.fd
       cp /usr/share/OVMF/OVMF_VARS_4M.ms.fd /tmp/vars.fd
       kvm -m 1G -machine q35,smm=on -pflash /tmp/code.fd -pflash /tmp/vars.fd -cdrom mini-<TEST>.iso [-vga <GRAPHICS>]

   (q35,smm=on satisfies Secure Boot related hardware requirements.)


Thanks for your time and your feedback.

Hopefully this is my very last overlong report for this release cycle…
Once again, I thought I'd err on the side of exhaustiveness.

I might still follow up with some more test results from earlier D-I
Bookworm releases (Alpha 1, Alpha 2, RC 1) which might help narrow down
what changed between Bullseye and (current) Bookworm. But that might
happen after RC 3 is published.


Cheers,
-- 
Cyril Brulebois (k...@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant