Bug#989124: grub-installer: occasional failure to install grub (when two DEs selected)

2021-05-26 Thread Cyril Brulebois
Philip Hands  (2021-05-26):
> Dear Maintainer,

Dear Bug Reporter,

(:D)

> While testing under openQA (so in qemu/kvm) if selecting more than one DE,
> something like one in ten installs will fail to install grub, resulting in an
> unbootable system.
> 
> Given that this is only happening in the unusual circumstance of selecting
> multiple desktops, and even then is only an intermittent bug, I've tagged it as
> minor.
> 
> An example of this can be found here:
> 
>   https://openqa.debian.net/tests/4457
> 
> which one can see hanging at the initial boot screen, rather than booting to 
> a login prompt.
> 
> One of the assets being collected is a dump of the start of the target block
> device, which in the failing case looks like this:
> 
>   https://openqa.debian.net/tests/4457/file/complete_install-dev_vda_dump.txt
> 
> whereas when things are working it looks like this:
> 
>   https://openqa.debian.net/tests/4439/file/complete_install-dev_vda_dump.txt
> 
> I have tried making it collect data earlier during the install
> but doing so resulted in the bug going away.
> 
> [I had it flip to the console when mandb is being installed, as that sits on
> the screen for quite a while so provides a good trigger for the action, and
> run a few commands to collect state, then flip back to the graphical screen.]
> 
> BTW The syslog from that failing run is here:
> 
>   https://openqa.debian.net/tests/4457/file/complete_install-syslog.txt
> 
> If there's more information that could usefully be collected, please mention
> what you think might help and I'll add it to the openqa scripts.

Comparing complete_install-syslog.txt for both runs, this feels icky (as
I think I already pointed out on IRC when you first asked):

ko.txt:

May 25 21:09:07 grub-installer: info: Installing grub on ''

ok.txt:

May 25 14:58:08 grub-installer: info: Installing grub on '/dev/vda'
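
For anyone wanting to automate that comparison across many runs, here is a minimal sketch (not part of the openQA scripts; the function names are made up for illustration). It assumes the syslog text has already been fetched, and only looks for the exact "Installing grub on" message quoted above:

```python
import re

# Matches the grub-installer message quoted above and captures whatever
# sits between the single quotes (possibly nothing).
INSTALL_RE = re.compile(r"grub-installer: info: Installing grub on '([^']*)'")


def grub_targets(syslog_text):
    """Return every target that grub-installer logged, in order of appearance."""
    return [match.group(1) for match in INSTALL_RE.finditer(syslog_text)]


def has_empty_target(syslog_text):
    """Return True if any 'Installing grub on' line names an empty device."""
    return any(target == "" for target in grub_targets(syslog_text))


# The two lines quoted above, used as sample input.
ko = "May 25 21:09:07 grub-installer: info: Installing grub on ''\n"
ok = "May 25 14:58:08 grub-installer: info: Installing grub on '/dev/vda'\n"

print(has_empty_target(ko), has_empty_target(ok))
```

A check like this would only flag the symptom (an empty bootdev in the log), not explain why it ended up empty.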

I'm not sure I really trust the screenshots that show /dev/vda selected
in both cases. After all, looking one step before, the boolean regarding
installing GRUB wasn't captured at all in the failing case, compare the
screenshots starting here:

 - https://openqa.debian.net/tests/4457#step/grub/45 (ko)
 - https://openqa.debian.net/tests/4439#step/grub/45 (ok)

but maybe that's just a side effect of the console switching gymnastics
you mentioned? (Sending left Ctrl or the like every few minutes avoids
running into DPMS/blanking issues, I'm using that trick.)

Anyway, any chance you could add `DEBCONF_DEBUG=developer` on the kernel
command line, so that we have a chance of understanding what's happening
on the debconf level? Otherwise, we might try and hotpatch
grub-installer to add some more logging but if we could avoid that…


Cheers,
-- 
Cyril Brulebois (k...@debian.org)
D-I release manager -- Release team member -- Freelance Consultant




Bug#989124: grub-installer: occasional failure to install grub (when two DEs selected)

2021-05-26 Thread Philip Hands
Package: grub-installer
Version: 1.178
Severity: minor

Dear Maintainer,

While testing under openQA (so in qemu/kvm) if selecting more than one DE,
something like one in ten installs will fail to install grub, resulting in an
unbootable system.

Given that this is only happening in the unusual circumstance of selecting
multiple desktops, and even then is only an intermittent bug, I've tagged it as
minor.

An example of this can be found here:

  https://openqa.debian.net/tests/4457

which one can see hanging at the initial boot screen, rather than booting to a 
login prompt.

One of the assets being collected is a dump of the start of the target block
device, which in the failing case looks like this:

  https://openqa.debian.net/tests/4457/file/complete_install-dev_vda_dump.txt

whereas when things are working it looks like this:

  https://openqa.debian.net/tests/4439/file/complete_install-dev_vda_dump.txt

I have tried making it collect data earlier during the install
but doing so resulted in the bug going away.

[I had it flip to the console when mandb is being installed, as that sits on the
screen for quite a while so provides a good trigger for the action, and run a
few commands to collect state, then flip back to the graphical screen.]

BTW The syslog from that failing run is here:

  https://openqa.debian.net/tests/4457/file/complete_install-syslog.txt

If there's more information that could usefully be collected, please mention
what you think might help and I'll add it to the openqa scripts.

Cheers, Phil.