Re: Working recovery with locked root user (rescue.service)

2020-12-11 Thread Colin Walters


On Thu, Dec 10, 2020, at 5:56 PM, Chris Murphy wrote:

> I personally am gravitating toward the idea of not updating the
> currently running OS (sometimes called transactional system updates)
> where if we had a way to test the out-of-band updated OS, like in a
> container or VM,

We've been doing that for over 3 years now in rpm-ostree:
https://github.com/coreos/rpm-ostree/pull/892

Yeah there's obviously *more* we could do than run just /bin/true, including 
running systemd-in-container but that escalates quickly in scope.
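
A minimal Python sketch of the gating idea under discussion: run a sanity check against the updated tree and only make it the next boot target if the check passes. This is not rpm-ostree's implementation. `Deployment`, `sanity_check`, and `pick_next_boot` are hypothetical names, and the check command is only a parameter here; actually running it inside the new root (container, VM, or otherwise) is out of scope.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Deployment:
    """Hypothetical stand-in for an OS tree that could be booted."""
    name: str


def sanity_check(command: Sequence[str], timeout: float = 30.0) -> bool:
    """Return True only if the command starts, finishes in time, and exits 0."""
    try:
        result = subprocess.run(list(command), timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a hang counts as a failed check.
        return False
    return result.returncode == 0


def pick_next_boot(current: Deployment, candidate: Deployment,
                   check_command: Sequence[str]) -> Deployment:
    """Promote the candidate only if its check passes; otherwise keep current."""
    return candidate if sanity_check(check_command) else current


if __name__ == "__main__":
    # Assumption: a trivial "exit 0" command stands in for the real check.
    trivial_check = [sys.executable, "-c", "raise SystemExit(0)"]
    chosen = pick_next_boot(Deployment("current"), Deployment("updated"),
                            trivial_check)
    print("next boot:", chosen.name)
```

The running system is never modified by this flow; a failed check simply leaves the current deployment as the boot target.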

(I was going to write more here about how the real problem is that composes should be 
tested/promoted, but we're already doing that in FCOS by entangling our build 
and test system, and we can take up the discussion of the relationship between 
that and traditional Fedora in the edition discussions.)




___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Working recovery with locked root user (rescue.service)

2020-12-10 Thread Marius Schwarz

On 10.12.20 at 23:56, Chris Murphy wrote:

> There is also the sysroot fails to mount problem. That leaves us in
> the initramfs which is an even more limited environment. For sure
> falling over at boot or during startup is rare, but no matter why it
> often induces panic in even experienced users, in part because it's
> rare.

My 50 cents:

As you need physical access to run a debug shell anyway, you can insert a 
live USB stick, which offers far more help than the initramfs tools.
If you are trying to access a virtual machine, you can usually attach another 
boot image, and even with IPMI modules it's possible to mount a virtual 
disk with repair tools.


Preparation is everything, so don't invest too much in lost causes.

best regards,
Marius Schwarz


Re: Working recovery with locked root user (rescue.service)

2020-12-10 Thread Chris Murphy
On Thu, Dec 10, 2020 at 1:07 PM Benjamin Berg wrote:
>
> Hi,
>
> On Thu, 2020-12-10 at 12:20 -0700, Chris Murphy wrote:
> > On Thu, Dec 10, 2020 at 5:40 AM Benjamin Berg 
> > wrote:
> > > Hi,
> > >
> > > so, the other day we had a major regression in the PAM stack[1]
> > > that,
> > > unfortunately, ended up hitting rawhide and the Fedora 33 testing
> > > (not
> > > stable) repository before being unpushed.
> > >
> > > In this case it was easy to work around as SSH was still working
> > > fine.
> > > But, it seems that rescue mode requires having a root password set,
> > > which we do not always do during the Fedora install.
> > >
> > >
> > > So, I think we should have an obvious way for users to enter
> > > recovery
> > > mode even with a locked root account.
> > >
> > > Currently rescue.service is executing "systemd-sulogin-shell" which
> > > in
> > > turn runs "sulogin" (part of util-linux). A workaround is to
> > > set SYSTEMD_SULOGIN_FORCE=1 in rescue.service, but that just
> > > disables
> > > authentication entirely.
> > >
> > > I suppose to improve this, we would need a kind of "sudologin" that
> > > accepts any user in the "wheel" group. Or maybe some other more
> > > rigid
> > > requirement like configuring the first admin user that was created.
> > >
> > > Anyone has a good idea on how to solve this?
> >
> > I solve it with early debug shell using boot param
> > systemd.debug-shell=1 but that presents a root login on tty9 without
> > needing a password.
>
> Yeah, if you are able to modify the command line and have the
> background, then it is really simple to bypass the authentication.
>
> > I'm under the impression authentication services aren't even available
> > for emergency or rescue targets (?). I also wonder what happens if we
> > move to systemd-homed and whether that can start sooner and provide
> > the ability to use rescue target? Or if it starts late enough that it
> > can't be used for rescue and then also what that means for non-root
> > use of rescue because with systemd-homed, there are no (human) users in
> > /etc at all.
>
> True, systemd-homed could be a problem.
>
> Maybe at the end of the day this is a lost cause?
>
> I mean, if you need to drop into rescue mode, you already need to have
> quite in-depth knowledge. So it could be better to focus on having more
> versatile solutions. Like being able to revert back to a known good
> state of the OS instead of providing a rescue shell.

There is also the "sysroot fails to mount" problem. That leaves us in
the initramfs, which is an even more limited environment. For sure
falling over at boot or during startup is rare, but no matter why it
happens, it often induces panic even in experienced users, in part
because it's rare.

rpm-ostree has a way to mostly solve the problem if the startup
failure is isolated to a particular deployment. But it could still
have the rare case where it falls over in the initramfs. So that's a
hole that would be nice to fix because it's something all Fedora
editions and spins could fall into.

There's a wish list item / idea for a recovery partition from which a
system could be booted. Maybe it's a limited "netinstall" kind of
environment, to keep it space efficient. (While it's in the Fedora
Btrfs tracker, it doesn't mean system root must be Btrfs.)
https://pagure.io/fedora-btrfs/project/issue/23

And also a couple of Btrfs specific snapshot-rollback ideas
https://pagure.io/fedora-btrfs/project/issue/18
https://pagure.io/fedora-btrfs/project/issue/31

A bit more tangentially related: can we make it easy and cheap for
folks to back up consistently, so that a reset is less painful? This is
neat, but it's probably a hard sell to depend on most users opting
in, however good an idea it is to back up regularly.
https://pagure.io/fedora-btrfs/project/issue/12

Boot and startup can fail in ways other than a package regression. We
kinda need to look at all of them and see if it's possible to take a
holistic approach that solves a large chunk of them at once. It's one
reason why I'm not pushing hard for /boot on Btrfs: we don't need
another option just to have another option. There are actually good
reasons to put /boot on Btrfs no matter what the sysroot file system
is, so if there's a way to "standardize" regardless of the sysroot
file system, we'd be better off. But without /boot on Btrfs, we need
some other way to deal with the disconnect on rollback between the
kernels on /boot and the possibly older modules on an older sysroot
snapshot.

I personally am gravitating toward the idea of not updating the
currently running OS (sometimes called transactional system updates).
Instead, we would test the out-of-band updated OS, like in a
container or VM, and only if it passes make it the next active
system at reboot time. There are some complexities there, but
rpm-ostree has already learned a lot of those lessons, so maybe we
wouldn't have to relearn them. This might make it possible to avoid the need for 

Re: Working recovery with locked root user (rescue.service)

2020-12-10 Thread Benjamin Berg
Hi,

On Thu, 2020-12-10 at 12:20 -0700, Chris Murphy wrote:
> On Thu, Dec 10, 2020 at 5:40 AM Benjamin Berg 
> wrote:
> > Hi,
> > 
> > so, the other day we had a major regression in the PAM stack[1]
> > that,
> > unfortunately, ended up hitting rawhide and the Fedora 33 testing
> > (not
> > stable) repository before being unpushed.
> > 
> > In this case it was easy to work around as SSH was still working
> > fine.
> > But, it seems that rescue mode requires having a root password set,
> > which we do not always do during the Fedora install.
> > 
> > 
> > So, I think we should have an obvious way for users to enter
> > recovery
> > mode even with a locked root account.
> > 
> > Currently rescue.service is executing "systemd-sulogin-shell" which
> > in
> > turn runs "sulogin" (part of util-linux). A workaround is to
> > set SYSTEMD_SULOGIN_FORCE=1 in rescue.service, but that just
> > disables
> > authentication entirely.
> > 
> > I suppose to improve this, we would need a kind of "sudologin" that
> > accepts any user in the "wheel" group. Or maybe some other more
> > rigid
> > requirement like configuring the first admin user that was created.
> > 
> > Anyone has a good idea on how to solve this?
> 
> I solve it with early debug shell using boot param
> systemd.debug-shell=1 but that presents a root login on tty9 without
> needing a password.

Yeah, if you are able to modify the command line and have the
background, then it is really simple to bypass the authentication.

> I'm under the impression authentication services aren't even available
> for emergency or rescue targets (?). I also wonder what happens if we
> move to systemd-homed and whether that can start sooner and provide
> the ability to use rescue target? Or if it starts late enough that it
> can't be used for rescue and then also what that means for non-root
> use of rescue because with systemd-homed, there are no (human) users in
> /etc at all.

True, systemd-homed could be a problem.

Maybe at the end of the day this is a lost cause?

I mean, if you need to drop into rescue mode, you already need to have
quite in-depth knowledge. So it could be better to focus on having more
versatile solutions. Like being able to revert back to a known good
state of the OS instead of providing a rescue shell.

Benjamin




Re: Working recovery with locked root user (rescue.service)

2020-12-10 Thread Chris Murphy
On Thu, Dec 10, 2020 at 5:40 AM Benjamin Berg wrote:
>
> Hi,
>
> so, the other day we had a major regression in the PAM stack[1] that,
> unfortunately, ended up hitting rawhide and the Fedora 33 testing (not
> stable) repository before being unpushed.
>
> In this case it was easy to work around as SSH was still working fine.
> But, it seems that rescue mode requires having a root password set,
> which we do not always do during the Fedora install.
>
>
> So, I think we should have an obvious way for users to enter recovery
> mode even with a locked root account.
>
> Currently rescue.service is executing "systemd-sulogin-shell" which in
> turn runs "sulogin" (part of util-linux). A workaround is to
> set SYSTEMD_SULOGIN_FORCE=1 in rescue.service, but that just disables
> authentication entirely.
>
> I suppose to improve this, we would need a kind of "sudologin" that
> accepts any user in the "wheel" group. Or maybe some other more rigid
> requirement like configuring the first admin user that was created.
>
> Anyone has a good idea on how to solve this?

I solve it with early debug shell using boot param
systemd.debug-shell=1 but that presents a root login on tty9 without
needing a password.
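
Because that parameter opens a passwordless root shell, it can be useful to check whether a machine was booted with it. Below is a minimal Python sketch that inspects a kernel command line string. It rests on several assumptions: that dashes and underscores in the parameter name are interchangeable, that a bare parameter without a value counts as enabled, and that simple whitespace splitting is good enough (real parsing also handles quoting). `debug_shell_requested` is a hypothetical helper, not a systemd API.

```python
# Values treated as "off"; anything else (including a bare key) counts as "on".
FALSE_VALUES = {"0", "no", "n", "false", "f", "off"}


def debug_shell_requested(cmdline: str) -> bool:
    """Return True if the command line asks for the early debug shell."""
    requested = False
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Assumption: "systemd.debug-shell" and "systemd.debug_shell" are equivalent.
        if key.replace("-", "_") != "systemd.debug_shell":
            continue
        # If the parameter is repeated, the last occurrence decides.
        requested = (not sep) or value.lower() not in FALSE_VALUES
    return requested
```

On a running system the input would typically be the contents of /proc/cmdline.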

I'm under the impression authentication services aren't even available
for emergency or rescue targets (?). I also wonder what happens if we
move to systemd-homed and whether that can start sooner and provide
the ability to use rescue target? Or if it starts late enough that it
can't be used for rescue and then also what that means for non-root
use of rescue because with systemd-homed, there are no (human) users in
/etc at all.



-- 
Chris Murphy


Re: Working recovery with locked root user (rescue.service)

2020-12-10 Thread Michael Catanzaro

On Thu, Dec 10, 2020 at 1:39 pm, Benjamin Berg wrote:

> I suppose to improve this, we would need a kind of "sudologin" that
> accepts any user in the "wheel" group. Or maybe some other more rigid
> requirement like configuring the first admin user that was created.


I'd say ideally any user in wheel would be able to recover. You would 
need to be able to enter a username at the recovery prompt for this to 
work, of course.
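
To illustrate the policy, here is a minimal Python sketch of the eligibility check such a prompt might perform after reading a username. It is not an existing tool: `group_members`, `password_usable`, and `may_attempt_recovery` are hypothetical names, and the function takes group(5) and shadow(5) formatted text as plain strings. It only decides who may be asked for a password; it does not verify the password itself.

```python
from typing import Set


def group_members(group_text: str, group: str) -> Set[str]:
    """Return the members listed for `group` in group(5)-formatted text."""
    for line in group_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) == 4 and fields[0] == group:
            return {name for name in fields[3].split(",") if name}
    return set()


def password_usable(shadow_text: str, user: str) -> bool:
    """Return True if `user` has a password field that is not locked.

    A field starting with "!" or "*" cannot be used to log in. An empty
    field means "no password"; this sketch refuses that case as well.
    """
    for line in shadow_text.splitlines():
        fields = line.strip().split(":")
        if len(fields) >= 2 and fields[0] == user:
            password = fields[1]
            return bool(password) and not password.startswith(("!", "*"))
    return False


def may_attempt_recovery(user: str, group_text: str, shadow_text: str,
                         admin_group: str = "wheel") -> bool:
    """Policy sketch: listed in the admin group and has a usable password."""
    return (user in group_members(group_text, admin_group)
            and password_usable(shadow_text, user))
```

Two limitations are worth noting. The group file lists only supplementary members, so a user whose primary group is wheel would be missed. The sketch also assumes users are recorded in these files at all, which would not hold for accounts managed elsewhere.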


P.S. The recovery prompt is always English (US) only, which is a shame.
