Re: Bug#1036019: debian-installer: Broken X display with QEMU under UEFI with cirrus and std graphics

2023-05-13 Thread Cyril Brulebois
Hi Ben,

Thanks for all those details!

Ben Hutchings (2023-05-14):
> > 
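> >   +-------------+-----------------+-----------------+-----------------+
> >   |  Graphics   |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
> >   +-------------+--------+--------+--------+--------+--------+--------+
> >   |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
> >   +-------------+--------+--------+--------+--------+--------+--------+
> >   |             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> >   | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
> >   | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
> >   | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> >   | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> >   | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
> >   +-------------+--------+--------+--------+--------+--------+--------+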
> 
> I started testing with QEMU and OVMF from unstable, and I'm instead
> seeing Xorg failing to start in the same cases you see glitches.  The
> relevant error message seems to be this one:
> http://codesearch.debian.net/show?file=xorg-server_2%3A21.1.7-3%2Fhw%2Fxfree86%2Ffbdevhw%2Ffbdevhw.c&line=504

Checking RC 1, I'm seeing OK results for both `-vga std` (or no options)
and `-vga cirrus`. I should note GRUB itself is “text-like” with RC 1,
while it's “graphical” with RC 2.

Reverting the following commit in debian-installer.git and building a
netboot-gtk image against unstable gives me a working graphical
installer with `-vga std` (or no options) and `-vga cirrus`. I didn't
check the rest of the matrix though.
  
https://salsa.debian.org/installer-team/debian-installer/-/commit/a4dc8c0fe7ad1a0c1506125ad9985f78819a1bb2

So it looks to me like the GRUB config fix uncovered a pre-existing bug, and
the linux version bump (6.1.20-1 → 6.1.20-2) between RC 1 and RC 2 isn't
a factor (xserver-xorg-* udebs didn't change).

Interestingly, switching to the bullseye branch and cherry-picking the
same GRUB config fix there, and rebuilding d-i against current bullseye,
I'm getting exactly the same problem: KO-G for std, KO-S for cirrus!

So it looks like this might be a fairly old issue rather than a
regression introduced during the Bookworm release cycle.


Also, I should note that while my focus was on netboot-gtk mini.iso
(because it's much quicker to rebuild/tweak than a netinst image), I'm
replicating those results with the netinst images:
 - Bullseye has a “text-like” GRUB, all good.
 - Bookworm RC 1 has a “text-like” GRUB, all good.
 - Bookworm RC 2 has a “graphical” GRUB, issues!

> > Questions
> > =========
> > 
> >  - Is it really to be expected that X and standard drivers would regress
> >    this way when moving from Bullseye to Bookworm?
> 
> No.
> 
> >  - Or is it expected to require specific kernel modules while that wasn't
> >    the case before? I've discovered this in VM environments, but maybe
> >    similar things could be happening on bare metal as well, and maybe
> >    some more modules should be considered for inclusion?
> 
> No.
> 
> >  - Is it acceptable to just bundle bochs, cirrus, and vboxvideo for the
> >    time being (i.e. RC 3, RC 4, 12.0.0), be it via the nasty approach
> >    or via a proper linux fb-modules inclusion?
> >  - Or does shipping those few modules risk breaking the kernel and/or X
> >    on other platforms? (I'd definitely hope not!)
> 
> I would not expect so.  They get used on the installed system, so they
> probably work.

Copy all!

Note for a further session: instead of debugging d-i itself, it should
be possible to reproduce those issues in the installed system, by
keeping only a specific list of kernel modules and X drivers. Of course,
that means having GRUB in “graphical” mode as well (a quick check
suggests installing desktop-base, without plymouth*, is sufficient for
that part).

As a very quick experiment, I tried:
 - installing xfce4 and desktop-base;
 - rebooting;
 - X doesn't start directly, one needs to run startxfce4 from the
   console.

Then:
 - manually removing all X drivers except fbdev_drv.so;
 - manually removing both tiny/ drivers (bochs and cirrus);
 - rebuilding the initramfs;
 - rebooting.
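For anyone repeating this on a throwaway VM, the four steps above might be sketched as the dry run below. It only prints the commands. The two directory paths are assumptions (the usual Debian amd64 locations), and `plan_strip_down` is a name made up for this example.

```sh
#!/bin/sh
# Dry-run sketch of the steps above: nothing is deleted unless the printed
# commands are run by hand (as root, on a disposable VM).
# ASSUMPTIONS: XDRV and KMOD are the usual Debian amd64 locations.
XDRV=/usr/lib/xorg/modules/drivers
KMOD=/lib/modules/$(uname -r)/kernel/drivers/gpu/drm/tiny

plan_strip_down() {
    # Remove every X driver except fbdev_drv.so.
    echo "find $XDRV -name '*_drv.so' ! -name fbdev_drv.so -delete"
    # Remove both tiny/ DRM drivers (bochs and cirrus).
    echo "rm -f $KMOD/bochs.ko $KMOD/cirrus.ko"
    # Rebuild the initramfs, then reboot.
    echo "update-initramfs -u"
    echo "reboot"
}

plan_strip_down
```

Printing rather than executing keeps the sketch safe to paste; the commands can be reviewed and then run one at a time.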

This gives me the following:
 - std: black screen, not even seeing a console prompt;
 - cirrus: “garbled/split” screen symptoms in the console, and in X;
 - qxl: all good in the console and in X.

Interestingly, purging desktop-base gets me back to a “text-only” GRUB
prompt, but both std and cirrus are exhibiting “garbled/split” screen
symptoms in the console and in X.

I'll stop here, I just wanted to confirm one could reproduce those
issues within the installed system, which should almost always be a
debug-friendlier environment than d-i…

> > Proposal plan for d-i (Bookworm RC 3, RC 4, and 12.0.0)
> > =======================================================
> > 
> > Unless I receive strong negative feedback before Monday (May 15th),
>

Re: Bug#1036019: debian-installer: Broken X display with QEMU under UEFI with cirrus and std graphics

2023-05-13 Thread Ben Hutchings
On Sat, 2023-05-13 at 10:22 +0200, Cyril Brulebois wrote:
[...]
> Kernel-side
> ===========
> 
> The fb-modules udeb hasn't changed much in 4+ years, with some DRM
> modules getting added alongside existing ones, leading to the following
> contents in Bullseye (5.10.178-3):
[...]
> Those contents are defined via those files in linux.git:
> 
> kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/amd64/fb-modules
> #include 
> 
> vesafb ?
> vga16fb
> 
> kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/fb-modules
> # We don't include all DRM drivers here as on many platforms we can
> # call system firmware to get hold of a simple framebuffer

To expand on this comment, in the case of UEFI boot the efifb driver
should provide a simple framebuffer, and on BIOS vesafb should do it. 
Those are both built-in on x86, and efifb is also built-in on arm64 and
armhf.
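One way to check the built-in claim on a given system is to look the drivers up in the kernel config. The sketch below is only illustrative: it assumes that CONFIG_FB_EFI and CONFIG_FB_VESA are the symbols behind efifb and vesafb, and that the config file is available as /boot/config-$(uname -r). `config_state` is a helper name invented here.

```sh
#!/bin/sh
# Sketch: report whether a kernel config symbol is built-in (y), modular (m)
# or not set. ASSUMPTIONS: the symbol names and the config file location.
config_state() {
    # $1 = config file, $2 = symbol name without the CONFIG_ prefix
    state=$(sed -n "s/^CONFIG_$2=\([ym]\)\$/\1/p" "$1")
    echo "${state:-unset}"
}

cfg=/boot/config-$(uname -r)
if [ -r "$cfg" ]; then
    for sym in FB_EFI FB_VESA; do
        echo "$sym: $(config_state "$cfg" "$sym")"
    done
fi
```

A `y` means the driver is compiled into the kernel image, which is why it does not need to appear in fb-modules.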


[...]
> X-side
> ======

Both of the kernel drivers are old-style framebuffer drivers so in
Xorg, the appropriate generic driver is "fbdev", not "modesetting".

> Now, we know that the contents of xserver-xorg-core-udeb have changed a
> little between Bullseye and Bookworm (#1035014), but that doesn't seem
> to be a factor here.
> 
> I've tested 3 netboot/gtk/mini.iso to assess the situation:
> 
>  - mini-20210731+deb11u8.iso from Bullseye 11.7
>  - mini-20230427.iso         from D-I Bookworm RC 2
>  - mini-daily.iso            from D-I daily builds (downloaded today)
> 
> If people want to replicate those tests, they're available at:
>   https://people.debian.org/~kibi/bug-drm-vs-uefi/
> 
> Or:
> 
> wget https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/20210731+deb11u8/images/netboot/gtk/mini.iso -O mini-20210731+deb11u8.iso
> wget https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/20230427/images/netboot/gtk/mini.iso -O mini-20230427.iso
> wget https://d-i.debian.org/daily-images/amd64/daily/netboot/gtk/mini.iso -O mini-daily.iso

These all include fbdev_drv.so, and Xorg.log shows that the fbdev
driver is being used.
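A quick way to make the same check on another image might look like the sketch below. It assumes the log contains the usual `(II) LoadModule: "name"` lines and that the log path (for example /var/log/Xorg.0.log) is known; both helper names are made up for illustration.

```sh
#!/bin/sh
# Sketch: list the modules an Xorg log reports loading, and test for fbdev.
# ASSUMPTION: the log uses '(II) LoadModule: "name"' lines.
loaded_modules() {
    # $1 = path to an Xorg log file
    sed -n 's/.*LoadModule: "\([^"]*\)".*/\1/p' "$1" | sort -u
}

uses_fbdev() {
    # Succeeds only on an exact "fbdev" entry (not "fbdevhw").
    loaded_modules "$1" | grep -qx fbdev
}

# Usage: uses_fbdev /var/log/Xorg.0.log && echo "fbdev driver loaded"
```

Matching the whole name matters because fbdevhw is a separate helper module that would otherwise produce a false positive.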

So I suppose there's a regression in either efifb or fbdev_drv.

> Via QEMU, under BIOS and UEFI, results are:
> 
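>   +-------------+-----------------+-----------------+-----------------+
>   |  Graphics   |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
>   +-------------+--------+--------+--------+--------+--------+--------+
>   |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
>   +-------------+--------+--------+--------+--------+--------+--------+
>   |             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
>   | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
>   | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
>   | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
>   | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
>   | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
>   +-------------+--------+--------+--------+--------+--------+--------+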

I started testing with QEMU and OVMF from unstable, and I'm instead
seeing Xorg failing to start in the same cases you see glitches.  The
relevant error message seems to be this one:
http://codesearch.debian.net/show?file=xorg-server_2%3A21.1.7-3%2Fhw%2Fxfree86%2Ffbdevhw%2Ffbdevhw.c&line=504
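For reference, the firmware/graphics matrix discussed in this thread could be enumerated with something like the dry-run sketch below. It is not the test setup used by either poster: the OVMF path, the memory size and the helper names are assumptions, and Secure Boot (which needs separate firmware variables) is not covered.

```sh
#!/bin/sh
# Dry-run sketch: print one QEMU command per firmware/graphics combination.
# ASSUMPTIONS (not from the thread): the OVMF image path and the -m value.
OVMF=/usr/share/ovmf/OVMF.fd

qemu_cmd() {
    # $1 = bios|uefi, $2 = -vga value ("" means no -vga option), $3 = ISO
    cmd="qemu-system-x86_64 -m 2G -cdrom $3"
    [ "$1" = uefi ] && cmd="$cmd -bios $OVMF"
    [ -n "$2" ] && cmd="$cmd -vga $2"
    echo "$cmd"
}

print_matrix() {
    # $1 = ISO image to boot
    for fw in bios uefi; do
        for vga in "" std cirrus qxl virtio vmware; do
            qemu_cmd "$fw" "$vga" "$1"
        done
    done
}

print_matrix mini-20230427.iso
```

The empty entry in the inner loop corresponds to the first table row, where no `-vga` option is passed.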

[...]
> Questions
> =========
> 
>  - Is it really to be expected that X and standard drivers would regress
>    this way when moving from Bullseye to Bookworm?

No.

>  - Or is it expected to require specific kernel modules while that wasn't
>    the case before? I've discovered this in VM environments, but maybe
>    similar things could be happening on bare metal as well, and maybe
>    some more modules should be considered for inclusion?

No.

>  - Is it acceptable to just bundle bochs, cirrus, and vboxvideo for the
>    time being (i.e. RC 3, RC 4, 12.0.0), be it via the nasty approach
>    or via a proper linux fb-modules inclusion?
>  - Or does shipping those few modules risk breaking the kernel and/or X
>    on other platforms? (I'd definitely hope not!)

I would not expect so.  They get used on the installed system, so they
probably work.



[...]
> Proposal plan for d-i (Bookworm RC 3, RC 4, and 12.0.0)
> =======================================================
> 
> Unless I receive strong negative feedback before Monday (May 15th),
> I plan on including the nasty approach in RC 3, and to revert it
> altogether in RC 4 if big bad regressions are reported:
>   https://salsa.debian.org/installer-team/debian-installer/-/commit/9fceca63273d0b501ea64d7b719acafc93a5b7fa
> 
> As a side note, keeping the bundling in src:debian-installer for the
> next few weeks makes us autonomous: we can enable and disable those
> extra modules without requiring a new linux upload… so it's nasty but I
> actually thought about the few advantages we were getting out of

Bug#1036019: debian-installer: Broken X display with QEMU under UEFI with cirrus and std graphics

2023-05-13 Thread Cyril Brulebois
Package: debian-installer
Version: 20230427
Severity: important
X-Debbugs-Cc: debian-...@lists.debian.org, debian-ker...@lists.debian.org, 
debian-x@lists.debian.org

Hi everyone,

I'm reaching out to all the aforementioned teams because I know nothing
about UEFI, kernel-side DRM modules, or X drivers, and I'd like to get
some feedback here.

If you need a TL;DR, you can skip to “Proposal plan for d-i”, which is
about my plans for the very next few hours, unless someone tells me the
proposal is crazy, unsafe, etc.


Backstory
=========

Since we've been hitting and/or (re)discovering UEFI-specific issues
lately (#1033913), I decided to spend some time extending my usual
tests, traditionally run under QEMU with default settings, therefore
booted under BIOS, to also run them under UEFI (meaning also testing
Secure Boot without having to switch to baremetal).

I've been kindly pointed by regular image testers to the following page:
  https://wiki.debian.org/SecureBoot/VirtualMachine

But I was a little shocked to discover a broken X display when booting
under UEFI! It seems I'm not the only one since that page has the
following, even if there are no references to any bug reports:

-vga virtio - The Debian installer seems to have difficulties
  working with the standard VGA driver (and virtio
  should anyway have better performance) 

The test setup is described at the very end of this report, with my
current test target being specifically netboot/gtk/mini.iso for amd64.


Kernel-side
===========

The fb-modules udeb hasn't changed much in 4+ years, with some DRM
modules getting added alongside existing ones, leading to the following
contents in Bullseye (5.10.178-3):

./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/drm.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/media/cec/core/cec.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/video/fbdev/vga16fb.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/video/vgastate.ko
./lib/modules/5.10.0-22-amd64/kernel/drivers/virtio/virtio_dma_buf.ko

and the following contents in Bookworm (6.1.27-1):

./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_kms_helper.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/drm_shmem_helper.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/gpu/drm/virtio/virtio-gpu.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/video/fbdev/vga16fb.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/video/vgastate.ko
./lib/modules/6.1.0-9-amd64/kernel/drivers/virtio/virtio_dma_buf.ko
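To make the difference between the two listings easier to spot, they can be compared by module file name, as in the sketch below. The file names are copied from the listings above; `only_in_first` and the temporary files are made up for this example.

```sh
#!/bin/sh
# Sketch: compare two module listings by file name.
only_in_first() {
    # Print names present in file $1 but not in file $2.
    sort "$1" > "$1.sorted"
    sort "$2" > "$2.sorted"
    comm -23 "$1.sorted" "$2.sorted"
    rm -f "$1.sorted" "$2.sorted"
}

old=$(mktemp)   # Bullseye (5.10.178-3) fb-modules
new=$(mktemp)   # Bookworm (6.1.27-1) fb-modules
printf '%s\n' drm_kms_helper.ko drm.ko virtio-gpu.ko cec.ko \
    vga16fb.ko vgastate.ko virtio_dma_buf.ko > "$old"
printf '%s\n' drm_kms_helper.ko drm.ko drm_shmem_helper.ko virtio-gpu.ko \
    vga16fb.ko vgastate.ko virtio_dma_buf.ko > "$new"

echo "Only in Bullseye: $(only_in_first "$old" "$new")"
echo "Only in Bookworm: $(only_in_first "$new" "$old")"
rm -f "$old" "$new"
```

The only differences are cec.ko (Bullseye only) and drm_shmem_helper.ko (Bookworm only); no framebuffer or GPU driver was added or dropped.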

Those contents are defined via those files in linux.git:

kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/amd64/fb-modules
#include 

vesafb ?
vga16fb

kibi@tokyo:~/debian-kernel/linux.git (sid=)$ cat debian/installer/modules/fb-modules
# We don't include all DRM drivers here as on many platforms we can
# call system firmware to get hold of a simple framebuffer

drm
drm_kms_helper
virtio-gpu ?


X-side
======

Now, we know that the contents of xserver-xorg-core-udeb have changed a
little between Bullseye and Bookworm (#1035014), but that doesn't seem
to be a factor here.

I've tested 3 netboot/gtk/mini.iso to assess the situation:

 - mini-20210731+deb11u8.iso from Bullseye 11.7
 - mini-20230427.iso         from D-I Bookworm RC 2
 - mini-daily.iso            from D-I daily builds (downloaded today)

If people want to replicate those tests, they're available at:
  https://people.debian.org/~kibi/bug-drm-vs-uefi/

Or:

wget https://deb.debian.org/debian/dists/bullseye/main/installer-amd64/20210731+deb11u8/images/netboot/gtk/mini.iso -O mini-20210731+deb11u8.iso
wget https://deb.debian.org/debian/dists/bookworm/main/installer-amd64/20230427/images/netboot/gtk/mini.iso -O mini-20230427.iso
wget https://d-i.debian.org/daily-images/amd64/daily/netboot/gtk/mini.iso -O mini-daily.iso


Via QEMU, under BIOS and UEFI, results are:

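  +-------------+-----------------+-----------------+-----------------+
  |  Graphics   |  Bullseye 11.7  |  Bookworm RC 2  |  Daily builds   |
  +-------------+--------+--------+--------+--------+--------+--------+
  |             |  BIOS  |  UEFI  |  BIOS  |  UEFI  |  BIOS  |  UEFI  |
  +-------------+--------+--------+--------+--------+--------+--------+
  |             |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
  | -vga std    |   OK   |   OK   |   OK   |  KO-G  |   OK   |  KO-G  |
  | -vga cirrus |   OK   |   OK   |   OK   |  KO-S  |   OK   |  KO-S  |
  | -vga qxl    |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |
  | -vga virtio |   OK   |   OK   |   OK   |   OK   |   OK   |   OK   |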
  | -vga vmware |   OK   |   OK   |   OK   |   OK   |   OK