Package: debian-installer
Version: 20230427
Severity: important
X-Debbugs-Cc: debian-...@lists.debian.org, debian-ker...@lists.debian.org, debia...@lists.debian.org

Hi everyone,

I'm reaching out to all the aforementioned teams because I know nothing
about UEFI, kernel-side DRM modules, or X drivers, and I'd like to get
some feedback here.

If you need a TL;DR, you can skip to “Proposal plan for d-i”, which is
about my plans for the very next few hours, unless someone tells me the
proposal is crazy, unsafe, etc.


Backstory
=========

Since we've been hitting and/or (re)discovering UEFI-specific issues
lately (#1033913), I decided to spend some time extending my usual
tests, traditionally run under QEMU with default settings (and therefore
booted under BIOS), so that they also run under UEFI (which also means
testing Secure Boot without having to switch to bare metal).

Regular image testers kindly pointed me to the following page:
  https://wiki.debian.org/SecureBoot/VirtualMachine

But I was a little shocked to discover a broken X display when booting
under UEFI! It seems I'm not the only one, since that page contains the
following note, even though it doesn't reference any bug report:

    -vga virtio - The Debian installer seems to have difficulties
                  working with the standard VGA driver (and virtio
                  should anyway have better performance) 

The test setup is described at the very end of this report, with my
current test target being specifically netboot/gtk/mini.iso for amd64.


Kernel-side
===========

The fb-modules udeb hasn't changed much in 4+ years, apart from some DRM
modules being added alongside existing ones, leading to the following
contents in Bullseye (5.10.178-3):

    ./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/media/cec/core/cec.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/video/fbdev/vga16fb.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/video/vgastate.ko
    ./lib/modules/5.10.0-22-amd64/kernel/drivers/virtio/virtio_dma_buf.ko

and the following contents in Bookworm (6.1.27-1):

    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_shmem_helper.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/video/fbdev/vga16fb.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/video/vgastate.ko
    ./lib/modules/6.1.0-9-amd64/kernel/drivers/virtio/virtio_dma_buf.ko

These contents are defined by the following files in linux.git:

    kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/amd64/fb-modules
    #include <fb-modules>
    
    vesafb ?
    vga16fb

    kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/fb-modules
    # We don't include all DRM drivers here as on many platforms we can
    # call system firmware to get hold of a simple framebuffer
    
    drm
    drm_kms_helper
    virtio-gpu ?
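
The two files combine: the amd64 list pulls in the generic one via
`#include`, and a trailing `?` appears to mark a module that may be
absent from a given kernel build. As a rough illustration, the sketch
below expands such a list under those assumed rules. `expand_module_list`
is a hypothetical helper, not kernel-wedge's actual implementation, and
resolving `#include <name>` against the parent directory of the
including file is an assumption based on the layout above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT kernel-wedge's actual code: expand a module list
# using simplified rules assumed from the two files quoted above:
#   - "#include <name>" reads <dir of including file>/../name
#   - other lines starting with "#", and blank lines, are ignored
#   - a trailing "?" marks an optional module
expand_module_list() {
  local file=$1 line name
  while IFS= read -r line || [ -n "$line" ]; do
    # Trim leading and trailing whitespace.
    line="${line#"${line%%[![:space:]]*}"}"
    line="${line%"${line##*[![:space:]]}"}"
    case $line in
      '#include <'*'>')
        name=${line#'#include <'}
        name=${name%'>'}
        expand_module_list "$(dirname "$file")/../$name"
        ;;
      ''|'#'*)
        ;;  # blank line or comment
      *'?')
        name=${line%'?'}
        name="${name%"${name##*[![:space:]]}"}"
        echo "$name (optional)"
        ;;
      *)
        echo "$line"
        ;;
    esac
  done < "$file"
}

# Demo: recreate the two files quoted above in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/amd64"
printf '%s\n' "# We don't include all DRM drivers here as on many platforms we can" \
  '' drm drm_kms_helper 'virtio-gpu ?' > "$demo/fb-modules"
printf '%s\n' '#include <fb-modules>' '' 'vesafb ?' vga16fb > "$demo/amd64/fb-modules"
# Prints drm, drm_kms_helper, virtio-gpu (optional), vesafb (optional), vga16fb.
expand_module_list "$demo/amd64/fb-modules"
rm -rf "$demo"
```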


X-side
======

Now, we know that the contents of xserver-xorg-core-udeb have changed a
little between Bullseye and Bookworm (#1035014), but that doesn't seem
to be a factor here.

I've tested three netboot/gtk/mini.iso images to assess the situation:

 - mini-20210731+deb11u8.iso from Bullseye 11.7
 - mini-20230427.iso         from D-I Bookworm RC 2
 - mini-daily.iso            from D-I daily builds (downloaded today)

If people want to replicate those tests, the images are available at:
  https://people.debian.org/~kibi/bug-drm-vs-uefi/

Or:

    wget https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/20210731+deb11u8/images/netboot/gtk/mini.iso \
      -O mini-20210731+deb11u8.iso
    wget https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/20230427/images/netboot/gtk/mini.iso \
      -O mini-20230427.iso
    wget https://d-i.debian.org/daily-images/amd64/daily/netboot/gtk/mini.iso \
      -O mini-daily.iso
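
The two release URLs above only differ by suite and d-i version, while
the daily build lives on d-i.debian.org. A minimal sketch deriving them;
`mini_iso_url`, `fetch_mini_iso` and the `DRY_RUN` switch are
hypothetical conveniences, and with `DRY_RUN=1` (the default here) only
the wget commands are printed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the mini.iso URL for a suite and d-i version,
# following the pattern of the URLs above; "daily" ignores the suite.
mini_iso_url() {
  local suite=$1 version=$2
  if [ "$version" = daily ]; then
    echo "https://d-i.debian.org/daily-images/amd64/daily/netboot/gtk/mini.iso"
  else
    echo "https://deb.debian.org/debian/dists/$suite/main/installer-amd64/$version/images/netboot/gtk/mini.iso"
  fi
}

# Fetch one image as mini-<version>.iso; with DRY_RUN=1 (the default),
# only print the wget command instead of downloading anything.
fetch_mini_iso() {
  local url
  url=$(mini_iso_url "$1" "$2")
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "wget $url -O mini-$2.iso"
  else
    wget "$url" -O "mini-$2.iso"
  fi
}

fetch_mini_iso bullseye 20210731+deb11u8
fetch_mini_iso bookworm 20230427
fetch_mini_iso any daily   # the suite argument is ignored for daily builds
```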


Via QEMU, under BIOS and UEFI, results are:

  +-------------+-----------------+-----------------+-----------------+
  |  Graphics   |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
  +-------------+--------+--------+--------+--------+--------+--------+
  |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
  +-------------+--------+--------+--------+--------+--------+--------+
  | (no -vga)   |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
  | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
  | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
  | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
  | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
  | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
  +-------------+--------+--------+--------+--------+--------+--------+

Here, we see that RC 2, which shipped xserver-xorg-core-udeb without
modesetting_drv.so (#1035014), performs exactly like the daily builds,
where that driver has been added back.

In the table:
 - the first row (no -vga option) and -vga std are listed together since
   std seemed to be the default, which identical test results confirm;
   the other -vga values follow in alphabetical order.
 - KO-G is for garbled:
   https://people.debian.org/~kibi/bug-drm-vs-uefi/screenshot-std-garbled.png
 - KO-S is for split:
   https://people.debian.org/~kibi/bug-drm-vs-uefi/screenshot-cirrus-split.png

X seems to work in both the garbled and the split cases (in the split
case, the bottom part is the rest of the GRUB prompt and the top part is
the actual GTK window): one can navigate the menus using the arrow keys,
and typing “fre” gets to the French entry. I didn't go through a single
full install though (even if that would definitely be doable: a manual
speedrun isn't unheard of…).

I didn't try to extract any logs, but I can definitely do that for
further investigation. My first instinct, as is often the case, was to
wonder whether we could be missing modules on the kernel side; that's
why I started this report by listing the contents of the fb-modules udebs.

Now, there are dedicated DRM modules for various hardware, including…
bochs and cirrus, so I tried including them in a mini.iso, which is also
available in the same directory:
  https://people.debian.org/~kibi/bug-drm-vs-uefi/mini-hackhackhack.iso

Nasty code:
  https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa

(Only tested with a manual netboot-gtk build on amd64.)

Instead of “going the correct way” (meaning patching linux.git, then
rebuilding linux-signed-amd64 to get an updated fb-modules udeb), I've
investigated a nasty but apparently effective approach that could be
used *if* we wanted to add those modules in RC 3. It's very nasty but
doesn't depend on a new round of linux uploads, lengthy builds (looking
at you, mips*), manual signing steps, etc. And *if* we want to try that
approach, I'd very much prefer doing so in RC 3, and either profit or
revert in RC 4, rather than only trying in RC 4 and possibly breaking
the graphical installer right when entering the “nobody move!” stage of
the freeze.

Note that I've “resolved” the module dependencies manually, and also
included vboxvideo.ko along the way, since it has the same dependencies.
We've had some (unfortunately vague) reports from VirtualBox users;
maybe they're hitting the same kind of issues… But at this point, this
is really a shot in the dark (no pun intended, at least initially).
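
Rather than resolving dependencies by hand, the closure could be computed
from the modules.dep file that depmod writes under
/lib/modules/<version>/. The sketch below assumes uncompressed `.ko`
paths (as in the listings above) and the usual `path.ko: dep.ko dep.ko`
line format. `resolve_deps` is a hypothetical helper, and the demo uses a
synthetic modules.dep (the `example/foo` and `example/bar` entries are
made up), not the real dependency data for bochs, cirrus or vboxvideo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the given modules plus all their dependencies,
# dependencies first, from a depmod-style modules.dep file whose lines look
# like "path/mod.ko: path/dep1.ko path/dep2.ko". Modules are named by their
# .ko basename without the suffix (assumes uncompressed modules).
resolve_deps() {
  local depfile=$1
  shift
  awk -v wanted="$*" '
    # Depth-first walk; "seen" ensures each module is printed only once.
    function visit(p,    n, i, list) {
      if (p in seen) return
      seen[p] = 1
      n = split(deps[p], list, " ")
      for (i = 1; i <= n; i++) visit(list[i])
      print p   # post-order: a module comes after everything it needs
    }
    # Record the dependency paths of every module, and index it by name.
    {
      path = $1
      sub(/:$/, "", path)
      deps[path] = ""
      for (i = 2; i <= NF; i++) deps[path] = deps[path] " " $i
      name = path
      sub(/.*\//, "", name)
      sub(/\.ko$/, "", name)
      byname[name] = path
    }
    END {
      n = split(wanted, mods, " ")
      for (i = 1; i <= n; i++) {
        if (mods[i] in byname) visit(byname[mods[i]])
        else print "not found: " mods[i] > "/dev/stderr"
      }
    }
  ' "$depfile"
}

# Demo on a synthetic modules.dep: drm and drm_kms_helper mimic real paths,
# while example/foo and example/bar are made-up placeholders.
depfile=$(mktemp)
printf '%s\n' \
  'kernel/drivers/gpu/drm/drm.ko:' \
  'kernel/drivers/gpu/drm/drm_kms_helper.ko: kernel/drivers/gpu/drm/drm.ko' \
  'kernel/drivers/example/bar.ko: kernel/drivers/gpu/drm/drm.ko' \
  'kernel/drivers/example/foo.ko: kernel/drivers/example/bar.ko kernel/drivers/gpu/drm/drm_kms_helper.ko kernel/drivers/gpu/drm/drm.ko' \
  > "$depfile"
# Prints drm.ko, bar.ko, drm_kms_helper.ko, then foo.ko (full paths).
resolve_deps "$depfile" foo
rm -f "$depfile"
```

Against a real tree, something like `resolve_deps
/lib/modules/6.1.0-9-amd64/modules.dep bochs cirrus vboxvideo` should
list the files to copy, dependencies first, assuming those names match
the .ko basenames in that kernel.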

At least for a friend of mine who was nice enough to run a few tests
under VirtualBox, d-i seems to work fine, with or without the hack, on
both Windows and Intel Mac hosts, so the hack doesn't appear to cause
any obvious regression…


Questions
=========

 - Is it really to be expected that X and standard drivers would regress
   this way when moving from Bullseye to Bookworm?
 - Or are specific kernel modules now expected to be required, while
   that wasn't the case before? I've discovered this in VM environments,
   but similar things could be happening on bare metal as well, and maybe
   some more modules should be considered for inclusion?
 - Is it acceptable to just bundle bochs, cirrus, and vboxvideo for the
   time being (i.e. RC 3, RC 4, 12.0.0), be it via the nasty approach
   or via a proper linux fb-modules inclusion?
 - Or does shipping those few modules risk breaking the kernel and/or X
   on other platforms? (I'd definitely hope not!)
 - Should I extract some dmesg/X logs from the KO-G/KO-S cases, so that
   one has a chance of understanding what's happening? Since that's likely
   to be a little tedious, I'd be happy to get a full list of the cells in
   the big matrix for which logs would make sense. Another reason why I
   haven't started there is that, even if this were deemed to be a problem
   in X, I don't expect us to find it reasonable to hotpatch the X server
   at this very late stage of the freeze. Adding some specific kernel
   modules seems much more targeted and way less risky… (even if that
   might just be a workaround and not a long-term fix).


Wild guess
==========

One obvious difference between BIOS and UEFI booting is the bootloader:
ISOLINUX vs. GRUB. It might be that the latter leaves the graphics stack
in a particular state that the default drivers in the kernel and/or X no
longer cope with, while that wasn't an issue in Bullseye?


Proposal plan for d-i (Bookworm RC 3, RC 4, and 12.0.0)
=======================================================

Unless I receive strong negative feedback before Monday (May 15th),
I plan to include the nasty approach in RC 3, and to revert it
altogether in RC 4 if big bad regressions are reported:
  https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa

As a side note, keeping the bundling in src:debian-installer for the
next few weeks makes us autonomous: we can enable and disable those
extra modules without requiring a new linux upload… so it's nasty, but
I did think about the few advantages we get out of it!

We should also be OK legal-wise: given that we already have linux in
Built-Using via its udebs, copying things around from linux-image
wouldn't change anything there, would it?

Of course in the long run, if having those modules is desired, it will
be better to have them merged in linux and to drop the nasty code, e.g.
in a point release.


Test reproducibility
====================

All tests were performed on an amd64 Bullseye host, with a Bullseye set
of qemu packages. I've installed ovmf from Bookworm though, as enabling
UEFI support with the Bullseye one prevented me from booting directly
from the ISO, which meant going through the UEFI menu to select the boot
disk every single time. The big matrix above was built with that Bookworm
ovmf package, and unless someone insists I should redo all the tests
with the Bullseye one, I don't plan on spending time on this.

 - BIOS:

     kvm -m 1G -cdrom mini-<TEST>.iso [-vga <GRAPHICS>]

 - UEFI:

     cp /usr/share/OVMF/OVMF_CODE_4M.ms.fd /tmp/code.fd
     cp /usr/share/OVMF/OVMF_VARS_4M.ms.fd /tmp/vars.fd
     kvm -m 1G -machine q35,smm=on -pflash /tmp/code.fd -pflash /tmp/vars.fd \
       -cdrom mini-<TEST>.iso [-vga <GRAPHICS>]

(q35,smm=on satisfies the Secure Boot-related hardware requirements.)
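
Since OVMF writes to its vars file at runtime (boot entries, for
instance), reusing the same /tmp/vars.fd across runs might carry state
from one test to the next. The sketch below wraps the two invocations
above in a hypothetical `run_test` function that uses fresh firmware
copies for every UEFI run, plus a hypothetical `run_matrix` that walks
every cell of the big matrix. It assumes the ISO file names used earlier
and the Bookworm ovmf paths quoted above; with `DRY_RUN=1` (the default
here) it only prints the kvm command lines, and the `/tmp/ovmf-scratch`
path shown in that mode is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build (and optionally run) the kvm command for one cell
# of the test matrix. Usage: run_test <bios|uefi> <iso> [vga]
OVMF_DIR=/usr/share/OVMF

run_test() {
  local firmware=$1 iso=$2 vga=${3:-} scratch=
  local -a cmd=(kvm -m 1G)
  if [ "$firmware" = uefi ]; then
    if [ "${DRY_RUN:-1}" = 1 ]; then
      scratch=/tmp/ovmf-scratch   # placeholder: nothing is copied in dry-run mode
    else
      # Fresh firmware copies for every run, so no UEFI variable carries over.
      scratch=$(mktemp -d)
      cp "$OVMF_DIR/OVMF_CODE_4M.ms.fd" "$scratch/code.fd"
      cp "$OVMF_DIR/OVMF_VARS_4M.ms.fd" "$scratch/vars.fd"
    fi
    # Same machine type and flash setup as the UEFI command above.
    cmd+=(-machine q35,smm=on -pflash "$scratch/code.fd" -pflash "$scratch/vars.fd")
  fi
  cmd+=(-cdrom "$iso")
  if [ -n "$vga" ]; then
    cmd+=(-vga "$vga")
  fi
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
    if [ "$firmware" = uefi ]; then
      rm -rf "$scratch"
    fi
  fi
}

# Walk every cell of the matrix; "" stands for "no -vga option at all".
run_matrix() {
  local iso firmware vga
  for iso in mini-20210731+deb11u8.iso mini-20230427.iso mini-daily.iso; do
    for firmware in bios uefi; do
      for vga in "" std cirrus qxl virtio vmware; do
        run_test "$firmware" "$iso" "$vga"
      done
    done
  done
}

run_matrix
```

Running it with `DRY_RUN=0` would start the 36 kvm sessions one after
the other, each one waiting for the previous VM to be shut down.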



Thanks for your time and your feedback. Hopefully this is my very last
overlong report for this release cycle… Once again, I thought I'd err on
the side of exhaustiveness.

I might still follow up with some more test results from earlier D-I
Bookworm releases (Alpha 1, Alpha 2, RC 1), which could help narrow down
what changed between Bullseye and (current) Bookworm. That may only
happen after RC 3 is published, though.


Cheers,
-- 
Cyril Brulebois (k...@debian.org)            <https://debamax.com/>
D-I release manager -- Release team member -- Freelance Consultant
