Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-08 Thread Christopher
On Thu, Oct 7, 2021 at 5:48 PM Benjamin Herrenschmidt wrote:
>
> On Wed, 2021-10-06 at 15:41 -0500, Joe Doss wrote:
> >
> > > Does anybody know how to fix a currently broken instance and can
> > > share their solution?
> >
> > Is there anything on the console log when you reboot it after the
> > updates? If you can share the log that would be helpful.

I've seen two kinds of errors in the console log.

The first is the kernel panic in the screenshot in my earlier email.
Getting a larger console log dump in that case, when it works at all,
just gets more of the same. I think I can avoid that error by avoiding
certain instance types. m5 seems to work, but m5a and m6 do not. I
think this might be related to the new UEFI boot mode that EC2 now
supports. Newer instance types default to UEFI mode unless legacy-bios
is specified. Our AMIs do not specify a boot mode, so they use the
instance type's default. m5 defaults to legacy-bios mode, and seems
to work. I think m5a and m6 default to UEFI mode. We
should start specifying our boot mode explicitly in our AMI composes
(and eventually switch to UEFI mode, for a consistent experience in
the cloud that matches modern physical hardware that uses UEFI). In
any case, I could be incorrect, but since this is a boot problem, it
seems possible the boot mode feature is the problem in those instance
types.
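
One way to check this theory is to see whether an AMI declares a boot mode at all. The sketch below assumes the JSON printed by `aws ec2 describe-images`, where each entry in `Images` may carry a `BootMode` field. The function name is made up for illustration and nothing here has been run against the affected AMIs.

```python
import json


def boot_modes(describe_images_json: str) -> dict:
    """Map each ImageId to its declared BootMode, or None if the AMI sets none."""
    response = json.loads(describe_images_json)
    return {
        # A missing BootMode key means the instance type's default applies.
        image["ImageId"]: image.get("BootMode")
        for image in response.get("Images", [])
    }
```

Feeding it the output of `aws ec2 describe-images --image-ids <ami-id>` should give `None` for an AMI registered without a boot mode. As far as I know, `aws ec2 register-image` accepts `--boot-mode legacy-bios` or `--boot-mode uefi` to set it explicitly.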

The second error is just an infinite scroll of a repeated error
message that looked like this:

  A start job is running for /dev/dis…f1a0b12a

This is most likely https://bugzilla.redhat.com/show_bug.cgi?id=2010058
and manually applying the patch there seems to work (even though I
wasn't sure if I did it correctly).

>
> There's the new "ISC" (interactive serial console) that can help if you
> have grub timeout set to non-0...

I couldn't get that to work over ssh, but it did work on the web UI.
It wasn't very useful there, because the boot was stuck and wasn't
accepting input. I couldn't catch it early enough for grub. I'm not
quite certain of how to edit the grub2 config files to change the
timeout. This would have helped me select the previous kernel, to
avoid https://bugzilla.redhat.com/show_bug.cgi?id=2010058 , but it
wouldn't have helped with the kernel panic on m5a/m6 instance types.
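
For the timeout question, the usual place on Fedora is the GRUB_TIMEOUT line in /etc/default/grub. The helper below is a minimal sketch that rewrites that setting in the file's text (or appends it if missing). The function name is hypothetical, and it only edits a string; writing the file back and regenerating the config are left to the caller.

```python
import re


def set_grub_timeout(default_grub: str, seconds: int) -> str:
    """Return the text of /etc/default/grub with GRUB_TIMEOUT set to `seconds`."""
    line = f"GRUB_TIMEOUT={seconds}"
    pattern = r"^GRUB_TIMEOUT=.*$"  # does not match GRUB_TIMEOUT_STYLE=
    if re.search(pattern, default_grub, flags=re.MULTILINE):
        return re.sub(pattern, line, default_grub, flags=re.MULTILINE)
    # No existing setting: append one, keeping the file newline-terminated.
    if default_grub and not default_grub.endswith("\n"):
        default_grub += "\n"
    return default_grub + line + "\n"
```

After saving the result, `grub2-mkconfig -o /boot/grub2/grub.cfg` should regenerate the menu config; that output path is an assumption for a BIOS-booted F34 image and may differ on other setups.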

>
> Otherwise, you can detach the EBS volume, attach it to another
> instance, mount & fixup, then back the other way around (the magic to
> re-attach the root device is to call it "xvda" without number).

That's what I ended up doing (but I always re-attach as /dev/sda1, as
it was before). I've done this plenty of times before. The main
problem this time was that there were no logs to investigate, and no
clues what to change to fix the issue.
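
For anyone who has not done the volume shuffle before, here is a rough sketch of the AWS CLI calls involved, built as argument lists so they can be reviewed or passed to `subprocess.run`. The instance and volume IDs are whatever you pass in, `/dev/sdf` as the scratch device is an assumption, and `/dev/sda1` is just the root device name mentioned above; use whatever your instance had before.

```python
AWS_EC2 = ["aws", "ec2"]


def move_to_rescue(broken_instance, rescue_instance, volume,
                   scratch_device="/dev/sdf"):
    """Commands that stop the broken instance and attach its volume elsewhere."""
    return [
        AWS_EC2 + ["stop-instances", "--instance-ids", broken_instance],
        AWS_EC2 + ["wait", "instance-stopped", "--instance-ids", broken_instance],
        AWS_EC2 + ["detach-volume", "--volume-id", volume],
        AWS_EC2 + ["wait", "volume-available", "--volume-ids", volume],
        AWS_EC2 + ["attach-volume", "--volume-id", volume,
                   "--instance-id", rescue_instance, "--device", scratch_device],
    ]


def move_back(broken_instance, volume, root_device="/dev/sda1"):
    """Commands that return the repaired volume as the root device and boot it."""
    return [
        AWS_EC2 + ["detach-volume", "--volume-id", volume],
        AWS_EC2 + ["wait", "volume-available", "--volume-ids", volume],
        AWS_EC2 + ["attach-volume", "--volume-id", volume,
                   "--instance-id", broken_instance, "--device", root_device],
        AWS_EC2 + ["start-instances", "--instance-ids", broken_instance],
    ]
```

The repair itself (mounting the volume on the rescue instance and fixing it) happens between the two lists, and the volume should be unmounted there before `move_back` is run.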

I did attempt to chroot and do a DNF rollback. However, the DNF
history command seems to be buggy, and crashes with an error that
mentions "Reason Change". This appears to be a new thing in DNF
transactions, and the DNF history command doesn't know how to handle
it for rollbacks and undos. I used more primitive tools to change
packages one at a time, eventually reinstalling the kernel after
applying the fix in
https://bugzilla.redhat.com/show_bug.cgi?id=2010058
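
The chroot step can be sketched roughly as below, again as argument lists meant to be run as root on the rescue instance. The partition name `/dev/xvdf1` and the mount point `/mnt/rescue` are assumptions; check `lsblk` for the real device, and mount a separate /boot as well if the image has one.

```python
def chroot_commands(partition="/dev/xvdf1", mountpoint="/mnt/rescue"):
    """Commands that mount a broken root filesystem and enter it with chroot."""
    commands = [
        ["mkdir", "-p", mountpoint],
        ["mount", partition, mountpoint],
    ]
    # Tools such as dnf, rpm and dracut expect these pseudo-filesystems.
    for pseudo in ("dev", "proc", "sys"):
        commands.append(["mount", "--bind", f"/{pseudo}", f"{mountpoint}/{pseudo}"])
    commands.append(["chroot", mountpoint, "/bin/bash"])
    return commands
```

Inside the chroot, `rpm -qa --last` lists packages by install time, which is one way to see what a recent update touched when `dnf history` is not cooperating.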


>
> Cheers,
> Ben.

Thanks all for the tips and suggestions. Persistence paid off... eventually.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Do not reply to spam on the list, report it: 
https://pagure.io/fedora-infrastructure


Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-07 Thread Tomasz Torcz
On Thu, Oct 07, 2021 at 08:18:44AM +0100, Richard W.M. Jones wrote:
> On Thu, Oct 07, 2021 at 12:46:11AM -0400, Christopher wrote:
> > Running on EC2, it's kinda hard to get good information from a system
> > that won't boot. The machine won't boot to the point of being able to
> > capture the system log, and the screenshot of the instance doesn't
> > appear to be super helpful: https://imgur.com/a/4PWcRSg
> 
> Can you hit shift + PgUp and capture as much of the preceding output
> as possible?

NB: console scrollback was removed in kernel 5.9, so Shift+PgUp no
longer works.


-- 
Tomasz Torcz Morality must always be based on practicality.
to...@pipebreaker.pl — Baron Vladimir Harkonnen


Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-07 Thread Benjamin Herrenschmidt
On Wed, 2021-10-06 at 15:41 -0500, Joe Doss wrote:
> 
> > Does anybody know how to fix a currently broken instance and can
> > share their solution?
> 
> Is there anything on the console log when you reboot it after the 
> updates? If you can share the log that would be helpful.

There's the new "ISC" (interactive serial console) that can help if you
have grub timeout set to non-0...

Otherwise, you can detach the EBS volume, attach it to another
instance, mount & fixup, then back the other way around (the magic to
re-attach the root device is to call it "xvda" without number).

Cheers,
Ben.



Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-07 Thread Demi Marie Obenour
On 10/7/21 3:52 AM, Richard Fearn wrote:
> There is an issue with Xen instances (e.g. t2.small) - see
> https://bugzilla.redhat.com/show_bug.cgi?id=2010058.
> 
> What I saw was that it would hang for a couple of minutes waiting for
> the disk to appear, then give up and go into emergency mode.
> 
> The workaround is to edit the Dracut script that decides which modules
> to include in the initramfs - to ensure that xen-blkfront is included.

This also affects Qubes OS: https://github.com/QubesOS/qubes-issues/issues/6919.

Sincerely,

Demi Marie Obenour (she/her/hers)




Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-07 Thread Richard Fearn
There is an issue with Xen instances (e.g. t2.small) - see
https://bugzilla.redhat.com/show_bug.cgi?id=2010058.

What I saw was that it would hang for a couple of minutes waiting for
the disk to appear, then give up and go into emergency mode.

The workaround is to edit the Dracut script that decides which modules
to include in the initramfs - to ensure that xen-blkfront is included.
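
The bug report has the actual change to the Dracut script. As a hedged alternative that avoids editing packaged files, dracut also reads drop-in files from /etc/dracut.conf.d/, where an `add_drivers` line forces a kernel module into the initramfs. The sketch below only builds the drop-in text and the rebuild command; the function names are made up, and this has not been tested against the affected F34 kernels.

```python
def dracut_dropin(drivers):
    """Text of a dracut.conf.d drop-in that forces `drivers` into the initramfs."""
    # dracut expects the value to be padded with spaces inside the quotes.
    return 'add_drivers+=" ' + " ".join(drivers) + ' "\n'


def rebuild_command(kernel_version):
    """dracut invocation that regenerates the initramfs for one installed kernel."""
    return ["dracut", "--force", "--kver", kernel_version]
```

For example, saving `dracut_dropin(["xen-blkfront"])` as a `.conf` file under /etc/dracut.conf.d/ (any file name) and then running the command from `rebuild_command` with the installed kernel version should produce an initramfs that contains the Xen block driver.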

Your issue might be something else - I didn't see a panic, like in the
screenshot. You could try and get more of the log as plain text by
going to Actions → Monitor and troubleshoot → Get system log.
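
The same log should also be available from the command line through `aws ec2 get-console-output`. A small sketch that builds the invocation follows; the function name is hypothetical, and as far as I know `--latest` only works on Nitro-based instance types.

```python
def console_output_command(instance_id, latest=False):
    """AWS CLI invocation that prints an instance's console log as plain text."""
    command = ["aws", "ec2", "get-console-output",
               "--instance-id", instance_id,
               "--query", "Output", "--output", "text"]
    if latest:
        # Ask for the most recent output instead of the cached snapshot.
        command.append("--latest")
    return command
```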

Regards,

Rich

-- 
Richard Fearn
richardfe...@gmail.com


Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-07 Thread Richard W.M. Jones
On Thu, Oct 07, 2021 at 12:46:11AM -0400, Christopher wrote:
> Running on EC2, it's kinda hard to get good information from a system
> that won't boot. The machine won't boot to the point of being able to
> capture the system log, and the screenshot of the instance doesn't
> appear to be super helpful: https://imgur.com/a/4PWcRSg

Can you hit shift + PgUp and capture as much of the preceding output
as possible?

Also it's apparently possible to connect a serial console:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/connect-to-serial-console.html
which would be the ideal way to debug this.
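
Going by that documentation, the SSH route is a two-step affair: push a public key for the serial port, then ssh to a per-region endpoint while the key is still valid. The sketch below builds both commands. The function name and key paths are placeholders, the endpoint format is taken from the AWS docs as best I recall it, and serial console access has to be enabled for the account first.

```python
def serial_console_commands(instance_id, region, public_key_path,
                            private_key_path, port=0):
    """Return [push_key, connect] argument lists for the EC2 serial console."""
    push_key = [
        "aws", "ec2-instance-connect", "send-serial-console-ssh-public-key",
        "--instance-id", instance_id,
        "--serial-port", str(port),
        "--ssh-public-key", f"file://{public_key_path}",
        "--region", region,
    ]
    # Endpoint format assumed from the AWS serial console documentation.
    endpoint = f"{instance_id}.port{port}@serial-console.ec2-instance-connect.{region}.aws"
    connect = ["ssh", "-i", private_key_path, endpoint]
    return [push_key, connect]
```

The pushed key is only accepted for a short time, so the ssh command needs to follow right after the first one.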

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html


Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-07 Thread Christopher
What appears to be our nightly AMI build for F34 with updates
(125523088429/Fedora-Cloud-Base-34-20211006.0.x86_64-hvm-us-east-1-gp2-0)
won't even start once (no updating required; it's immediately broken).
The attachment won't go through on this list, but I
captured the last lines from the system log on first boot. It looks
like more lines like what was in the screenshot I previously shared.



Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-06 Thread Christopher
Running on EC2, it's kinda hard to get good information from a system
that won't boot. The machine won't boot to the point of being able to
capture the system log, and the screenshot of the instance doesn't
appear to be super helpful: https://imgur.com/a/4PWcRSg



Re: F34 Cloud Amazon AMIs unbootable after updates

2021-10-06 Thread Joe Doss

On 10/6/21 3:18 PM, Christopher wrote:
> Hi,
>
> Has anybody else noticed that the Amazon Public Cloud images for F34
> (https://alt.fedoraproject.org/cloud/) no longer boot after the latest
> updates?
>
> I had an instance that I've been keeping up-to-date with dnf system
> upgrades and is now on F34, which is now unbootable after recent
> updates within the last week. So I tried to create a new instance
> using a newer base image at https://alt.fedoraproject.org/cloud/, and
> that one is also now unbootable after doing a routine dnf update.
>
> Has anybody else seen this?
>
> Does anybody know which package update caused it? (I saw some
> grub-related updates, but not sure if they are to blame)
>
> Does anybody know how to fix a currently broken instance and can share
> their solution?


Is there anything on the console log when you reboot it after the 
updates? If you can share the log that would be helpful.


Joe




--
Joe Doss
j...@solidadmin.com


F34 Cloud Amazon AMIs unbootable after updates

2021-10-06 Thread Christopher
Hi,

Has anybody else noticed that the Amazon Public Cloud images for F34
(https://alt.fedoraproject.org/cloud/) no longer boot after the latest
updates?

I had an instance that I've been keeping up-to-date with dnf system
upgrades and is now on F34, which is now unbootable after recent
updates within the last week. So I tried to create a new instance
using a newer base image at https://alt.fedoraproject.org/cloud/, and
that one is also now unbootable after doing a routine dnf update.

Has anybody else seen this?

Does anybody know which package update caused it? (I saw some
grub-related updates, but not sure if they are to blame)

Does anybody know how to fix a currently broken instance and can share
their solution?


Thanks,
Christopher