Re: F34 Cloud Amazon AMIs unbootable after updates
On Thu, Oct 7, 2021 at 5:48 PM Benjamin Herrenschmidt wrote:
>
> On Wed, 2021-10-06 at 15:41 -0500, Joe Doss wrote:
> >
> > > Does anybody know how to fix a currently broken instance and can
> > > share their solution?
> >
> > Is there anything on the console log when you reboot it after the
> > updates? If you can share the log that would be helpful.

I've seen two kinds of errors in the console log.

The first is the kernel panic in the screenshot in my earlier email.
Getting a larger console log dump in that case, when it works at all,
just gets more of the same. I think I can avoid that error by avoiding
certain instance types. m5 seems to work, but m5a and m6 do not.

I think this might be related to the new UEFI boot type that EC2 now
supports. Newer instance types default to UEFI mode unless legacy-bios
is specified. Our AMIs do not specify a boot type, so they would use
the instance type's default. m5 defaults to legacy-bios mode, and seems
to work. I think m5a and m6 default to UEFI mode. We should start
specifying our boot mode explicitly in our AMI composes (and eventually
switch to UEFI mode, for a consistent experience in the cloud that
matches modern physical hardware that uses UEFI). In any case, I could
be incorrect, but since this is a boot problem, it seems possible the
boot mode feature is the problem on those instance types.

The second error is just an infinite scroll of a repeated error message
that looked like:

  A start job is running for /dev/dis…f1a0b12a

This is most likely https://bugzilla.redhat.com/show_bug.cgi?id=2010058
and manually applying the patch there seems to work (even though I
wasn't sure if I did it correctly).

> There's the new "ISC" (interactive serial console) that can help if you
> have grub timeout set to non-0...

I couldn't get that to work over ssh, but it did work on the web UI. It
wasn't very useful there, because the boot was stuck and wasn't
accepting input. I couldn't catch it early enough for grub.

I'm not quite sure how to edit the grub2 config files to change the
timeout. This would have helped me select the previous kernel, to avoid
https://bugzilla.redhat.com/show_bug.cgi?id=2010058, but it wouldn't
have helped with the kernel panic on m5a/m6 instance types.

> Otherwise, you can detach the EBS volume, attach it to another
> instance, mount & fixup, then back the other way around (the magic to
> re-attach the root device is to call it "xvda" without number).

That's what I ended up doing (but I always re-attach as /dev/sda1, as
it was before). I've done this plenty of times before. The main problem
this time was that there were no logs to investigate, and no clues
about what to change to fix the issue.

I did attempt to chroot and do a DNF rollback. However, it seems the
DNF history command is buggy, and crashes with a "Reason Change"
message. This appears to be a new thing in DNF transactions, and the
DNF history command doesn't know how to handle it for rollbacks and
undos. I used more primitive tools to change packages one at a time,
eventually reinstalling the kernel after applying the fix in
https://bugzilla.redhat.com/show_bug.cgi?id=2010058

> Cheers,
> Ben.

Thanks all for the tips and suggestions. Persistence paid off...
eventually.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
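On the GRUB timeout question in the message above, here is a minimal sketch. It assumes the usual Fedora layout, where the menu timeout is the `GRUB_TIMEOUT` line in `/etc/default/grub` (edited from the rescue chroot in this scenario). The function name `set_grub_timeout` is hypothetical; it only rewrites text and does not touch the bootloader.

```python
import re


def set_grub_timeout(config_text: str, seconds: int) -> str:
    """Return config_text with GRUB_TIMEOUT set to `seconds`.

    Assumes the /etc/default/grub format: shell-style KEY=value lines.
    An existing GRUB_TIMEOUT line is replaced; otherwise one is appended.
    """
    new_line = f"GRUB_TIMEOUT={seconds}"
    # The "=" right after the key keeps GRUB_TIMEOUT_STYLE= from matching.
    pattern = re.compile(r"^GRUB_TIMEOUT=.*$", flags=re.MULTILINE)
    if pattern.search(config_text):
        return pattern.sub(new_line, config_text)
    # No existing setting: append, keeping the file newline-terminated.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + new_line + "\n"


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    sample = 'GRUB_TIMEOUT=0\nGRUB_DISTRIBUTOR="Fedora"\n'
    print(set_grub_timeout(sample, 5), end="")
```

After writing the result back, the generated config usually has to be rebuilt, typically with `grub2-mkconfig -o /boot/grub2/grub.cfg` (the output path is an assumption and can differ per setup). As an alternative to the menu, `grubby --info=ALL` lists the installed boot entries and `grubby --set-default` selects one.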
Re: F34 Cloud Amazon AMIs unbootable after updates
On Thu, Oct 07, 2021 at 08:18:44AM +0100, Richard W.M. Jones wrote:
> On Thu, Oct 07, 2021 at 12:46:11AM -0400, Christopher wrote:
> > Running on EC2, it's kinda hard to get good information from a system
> > that won't boot. The machine won't boot to the point of being able to
> > capture the system log, and the screenshot of the instance doesn't
> > appear to be super helpful: https://imgur.com/a/4PWcRSg
>
> Can you hit shift + PgUp and capture as much of the preceding output
> as possible?

NB: console scrollback was removed in kernel 5.9, so shift+pgup no
longer works.

-- 
Tomasz Torcz
to...@pipebreaker.pl

Morality must always be based on practicality.
  — Baron Vladimir Harkonnen
Re: F34 Cloud Amazon AMIs unbootable after updates
On Wed, 2021-10-06 at 15:41 -0500, Joe Doss wrote:
>
> > Does anybody know how to fix a currently broken instance and can
> > share their solution?
>
> Is there anything on the console log when you reboot it after the
> updates? If you can share the log that would be helpful.

There's the new "ISC" (interactive serial console) that can help if you
have grub timeout set to non-0...

Otherwise, you can detach the EBS volume, attach it to another
instance, mount & fixup, then back the other way around (the magic to
re-attach the root device is to call it "xvda" without number).

Cheers,
Ben.
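The detach/attach round trip described above can be sketched as a sequence of AWS CLI calls. This is a sketch under stated assumptions, not a tested procedure: the function name `rescue_volume_commands` and the default rescue device `/dev/sdf` are hypothetical, and the function only builds argument lists without running anything. The root device name is a required parameter because it has to match what the instance used originally (the thread mentions both "xvda" and /dev/sda1).

```python
from typing import List


def rescue_volume_commands(broken_instance: str, rescue_instance: str,
                           volume: str, root_device: str,
                           rescue_device: str = "/dev/sdf") -> List[List[str]]:
    """Build the AWS CLI calls for moving a root volume to a rescue
    instance and back again. Nothing is executed here."""

    def attach(instance: str, device: str) -> List[str]:
        return ["aws", "ec2", "attach-volume", "--volume-id", volume,
                "--instance-id", instance, "--device", device]

    def detach() -> List[str]:
        return ["aws", "ec2", "detach-volume", "--volume-id", volume]

    return [
        # The broken instance must be stopped before its root volume moves.
        ["aws", "ec2", "stop-instances", "--instance-ids", broken_instance],
        detach(),
        attach(rescue_instance, rescue_device),
        # Manual step on the rescue instance: mount, fix, unmount.
        detach(),
        attach(broken_instance, root_device),
        ["aws", "ec2", "start-instances", "--instance-ids", broken_instance],
    ]


if __name__ == "__main__":
    # All IDs below are placeholders, not real resources.
    for argv in rescue_volume_commands("i-broken", "i-rescue",
                                       "vol-example", "/dev/sda1"):
        print(" ".join(argv))
```

In practice each step has to finish before the next one starts (for example, the volume must reach the "available" state before it can be attached elsewhere), so the commands are listed rather than chained automatically.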
Re: F34 Cloud Amazon AMIs unbootable after updates
On 10/7/21 3:52 AM, Richard Fearn wrote:
> There is an issue with Xen instances (e.g. t2.small) - see
> https://bugzilla.redhat.com/show_bug.cgi?id=2010058.
>
> What I saw was that it would hang for a couple of minutes waiting for
> the disk to appear, then give up and go into emergency mode.
>
> The workaround is to edit the Dracut script that decides which modules
> to include in the initramfs - to ensure that xen-blkfront is included.

This also affects Qubes OS:
https://github.com/QubesOS/qubes-issues/issues/6919.

Sincerely,
Demi Marie Obenour (she/her/hers)
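The quoted workaround edits the Dracut module script itself. The sketch below is not that patch. It swaps in a different route, dracut's `add_drivers` configuration setting in a drop-in file under `/etc/dracut.conf.d/`, and whether that is sufficient for this particular bug is an assumption. The function names and the example kernel version are hypothetical.

```python
from typing import Iterable, List


def dracut_dropin(drivers: Iterable[str]) -> str:
    """Return the text of a dracut.conf.d drop-in that asks dracut to
    include the given kernel drivers in the initramfs."""
    names = " ".join(drivers)
    # dracut's += settings expect spaces inside the quotes around the list.
    return f'add_drivers+=" {names} "\n'


def rebuild_command(kernel_version: str) -> List[str]:
    """argv for regenerating the initramfs of one specific kernel."""
    return ["dracut", "--force", "--kver", kernel_version]


if __name__ == "__main__":
    print(dracut_dropin(["xen-blkfront"]), end="")
    # Hypothetical kernel version string, for illustration only.
    print(" ".join(rebuild_command("5.14.9-200.fc34.x86_64")))
```

Passing the kernel version explicitly matters when working from a chroot on a rescue instance, since the running kernel there is the rescue host's, not the broken system's.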
Re: F34 Cloud Amazon AMIs unbootable after updates
There is an issue with Xen instances (e.g. t2.small) - see
https://bugzilla.redhat.com/show_bug.cgi?id=2010058.

What I saw was that it would hang for a couple of minutes waiting for
the disk to appear, then give up and go into emergency mode.

The workaround is to edit the Dracut script that decides which modules
to include in the initramfs - to ensure that xen-blkfront is included.

Your issue might be something else - I didn't see a panic, like in the
screenshot.

You could try and get more of the log as plain text by going to
Actions → Monitor and troubleshoot → Get system log.

Regards,

Rich

On Thu, 7 Oct 2021 at 08:19, Richard W.M. Jones wrote:
>
> On Thu, Oct 07, 2021 at 12:46:11AM -0400, Christopher wrote:
> > Running on EC2, it's kinda hard to get good information from a system
> > that won't boot. The machine won't boot to the point of being able to
> > capture the system log, and the screenshot of the instance doesn't
> > appear to be super helpful: https://imgur.com/a/4PWcRSg
>
> Can you hit shift + PgUp and capture as much of the preceding output
> as possible?
>
> Also it's apparently possible to connect a serial console:
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-to-serial-console.html
> which would be the ideal way to debug this.
>
> Rich.

-- 
Richard Fearn
richardfe...@gmail.com
Re: F34 Cloud Amazon AMIs unbootable after updates
On Thu, Oct 07, 2021 at 12:46:11AM -0400, Christopher wrote:
> Running on EC2, it's kinda hard to get good information from a system
> that won't boot. The machine won't boot to the point of being able to
> capture the system log, and the screenshot of the instance doesn't
> appear to be super helpful: https://imgur.com/a/4PWcRSg

Can you hit shift + PgUp and capture as much of the preceding output
as possible?

Also it's apparently possible to connect a serial console:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-to-serial-console.html
which would be the ideal way to debug this.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
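Besides the serial console, the boot output can also be pulled as plain text with the AWS CLI's `get-console-output` command. The sketch below assumes the CLI is installed and has credentials configured. The function names are hypothetical, and the instance ID in the example is a placeholder.

```python
import subprocess
from typing import List


def console_output_command(instance_id: str, latest: bool = False) -> List[str]:
    """Build the AWS CLI call that prints an instance's console output."""
    cmd = ["aws", "ec2", "get-console-output",
           "--instance-id", instance_id, "--output", "text"]
    if latest:
        # --latest is only supported on Nitro-based instance types.
        cmd.append("--latest")
    return cmd


def fetch_console_output(instance_id: str, latest: bool = False) -> str:
    """Run the command and return its stdout; raises if the CLI fails."""
    result = subprocess.run(console_output_command(instance_id, latest),
                            check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Placeholder instance ID; replace it before actually fetching output.
    print(" ".join(console_output_command("i-example", latest=True)))
```

This only helps when the instance gets far enough to write to the console, so it may return little or nothing for an early boot failure like the one in the screenshot.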
Re: F34 Cloud Amazon AMIs unbootable after updates
What appears to be our nightly AMI build for F34 with updates
(125523088429/Fedora-Cloud-Base-34-20211006.0.x86_64-hvm-us-east-1-gp2-0)
won't even start once (no updating required; it's immediately broken).

The attachment won't go through on this list, but I captured the last
lines from the system log on first boot. They look like more of the
same lines that were in the screenshot I previously shared.

On Thu, Oct 7, 2021 at 12:46 AM Christopher wrote:
>
> Running on EC2, it's kinda hard to get good information from a system
> that won't boot. The machine won't boot to the point of being able to
> capture the system log, and the screenshot of the instance doesn't
> appear to be super helpful: https://imgur.com/a/4PWcRSg
>
> On Wed, Oct 6, 2021 at 4:42 PM Joe Doss wrote:
> >
> > On 10/6/21 3:18 PM, Christopher wrote:
> > > Hi,
> > >
> > > Has anybody else noticed that the Amazon Public Cloud images for F34
> > > (https://alt.fedoraproject.org/cloud/) no longer boot after the latest
> > > updates?
> > >
> > > I had an instance that I've been keeping up-to-date with dnf system
> > > upgrades and is now on F34, which is now unbootable after recent
> > > updates within the last week. So I tried to create a new instance
> > > using a newer base image at https://alt.fedoraproject.org/cloud/, and
> > > that one is also now unbootable after doing a routine dnf update.
> > >
> > > Has anybody else seen this?
> > >
> > > Does anybody know which package update caused it? (I saw some
> > > grub-related updates, but not sure if they are to blame)
> > >
> > > Does anybody know how to fix a currently broken instance and can share
> > > their solution?
> >
> > Is there anything on the console log when you reboot it after the
> > updates? If you can share the log that would be helpful.
> >
> > Joe
Re: F34 Cloud Amazon AMIs unbootable after updates
Running on EC2, it's kinda hard to get good information from a system
that won't boot. The machine won't boot to the point of being able to
capture the system log, and the screenshot of the instance doesn't
appear to be super helpful: https://imgur.com/a/4PWcRSg

On Wed, Oct 6, 2021 at 4:42 PM Joe Doss wrote:
>
> On 10/6/21 3:18 PM, Christopher wrote:
> > Hi,
> >
> > Has anybody else noticed that the Amazon Public Cloud images for F34
> > (https://alt.fedoraproject.org/cloud/) no longer boot after the latest
> > updates?
> >
> > I had an instance that I've been keeping up-to-date with dnf system
> > upgrades and is now on F34, which is now unbootable after recent
> > updates within the last week. So I tried to create a new instance
> > using a newer base image at https://alt.fedoraproject.org/cloud/, and
> > that one is also now unbootable after doing a routine dnf update.
> >
> > Has anybody else seen this?
> >
> > Does anybody know which package update caused it? (I saw some
> > grub-related updates, but not sure if they are to blame)
> >
> > Does anybody know how to fix a currently broken instance and can share
> > their solution?
>
> Is there anything on the console log when you reboot it after the
> updates? If you can share the log that would be helpful.
>
> Joe
Re: F34 Cloud Amazon AMIs unbootable after updates
On 10/6/21 3:18 PM, Christopher wrote:
> Hi,
>
> Has anybody else noticed that the Amazon Public Cloud images for F34
> (https://alt.fedoraproject.org/cloud/) no longer boot after the latest
> updates?
>
> I had an instance that I've been keeping up-to-date with dnf system
> upgrades and is now on F34, which is now unbootable after recent
> updates within the last week. So I tried to create a new instance
> using a newer base image at https://alt.fedoraproject.org/cloud/, and
> that one is also now unbootable after doing a routine dnf update.
>
> Has anybody else seen this?
>
> Does anybody know which package update caused it? (I saw some
> grub-related updates, but not sure if they are to blame)
>
> Does anybody know how to fix a currently broken instance and can share
> their solution?

Is there anything on the console log when you reboot it after the
updates? If you can share the log that would be helpful.

Joe

-- 
Joe Doss
j...@solidadmin.com
F34 Cloud Amazon AMIs unbootable after updates
Hi,

Has anybody else noticed that the Amazon Public Cloud images for F34
(https://alt.fedoraproject.org/cloud/) no longer boot after the latest
updates?

I had an instance that I've been keeping up-to-date with dnf system
upgrades and is now on F34, which is now unbootable after recent
updates within the last week. So I tried to create a new instance
using a newer base image at https://alt.fedoraproject.org/cloud/, and
that one is also now unbootable after doing a routine dnf update.

Has anybody else seen this?

Does anybody know which package update caused it? (I saw some
grub-related updates, but not sure if they are to blame)

Does anybody know how to fix a currently broken instance and can share
their solution?

Thanks,
Christopher