Re: Working recovery with locked root user (rescue.service)
On Thu, Dec 10, 2020, at 5:56 PM, Chris Murphy wrote:
> I personally am gravitating toward the idea of not updating the
> currently running OS (sometimes called transactional system updates)
> where if we had a way to test the out-of-band updated OS, like in a
> container or VM,

We've been doing that for over 3 years now in rpm-ostree:
https://github.com/coreos/rpm-ostree/pull/892

Yeah, there's obviously *more* we could do than run just /bin/true, including running systemd-in-container, but that escalates quickly in scope.

(I was going to write more here about how the real problem is that composes should be tested/promoted, but we're already doing that in FCOS by entangling our build and test system, and we can take up the discussion of the relationship between that and traditional Fedora in the edition discussions.)

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
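
The "run a trivial command against the new OS before making it the next boot target" idea from this message can be sketched roughly as below. This is only an illustration of the gating logic, not how rpm-ostree implements it: the function names, the string deployment labels, and the stand-in check commands are all hypothetical, and the check runs on the host rather than inside the new root.

```python
import subprocess
import sys


def passes_sanity_check(command):
    """Return True if the command starts and exits with status 0."""
    try:
        result = subprocess.run(command, check=False)
    except OSError:
        # A check that cannot even be started counts as a failure.
        return False
    return result.returncode == 0


def choose_next_deployment(current, candidate, check_command):
    """Pick the candidate only if its sanity check passes (hypothetical helper)."""
    return candidate if passes_sanity_check(check_command) else current


# Stand-ins for "/bin/true" and "/bin/false" so the sketch runs anywhere.
ALWAYS_OK = [sys.executable, "-c", "raise SystemExit(0)"]
ALWAYS_FAIL = [sys.executable, "-c", "raise SystemExit(1)"]
```

A real gate would execute the check inside the new tree (for example in a container), which is where the scope starts to grow, as noted above.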
Re: Working recovery with locked root user (rescue.service)
On 10.12.20 at 23:56, Chris Murphy wrote:
> There is also the sysroot-fails-to-mount problem. That leaves us in
> the initramfs, which is an even more limited environment. For sure,
> falling over at boot or during startup is rare, but no matter why, it
> often induces panic in even experienced users, in part because it's
> rare.

My 50 cents:

As you need physical access to run a debug shell, you can insert a live USB stick, which gives far more help than the initramfs tools. If you are trying to access a virtual machine, you can usually attach another boot image, and even with IPMI modules it's possible to mount a virtual disk with repair tools.

Preparation is everything, so don't invest too much in lost causes.

best regards,
Marius Schwarz
Re: Working recovery with locked root user (rescue.service)
On Thu, Dec 10, 2020 at 1:07 PM Benjamin Berg wrote:
>
> Hi,
>
> On Thu, 2020-12-10 at 12:20 -0700, Chris Murphy wrote:
> > On Thu, Dec 10, 2020 at 5:40 AM Benjamin Berg wrote:
> > > Hi,
> > >
> > > so, the other day we had a major regression in the PAM stack[1]
> > > that, unfortunately, ended up hitting rawhide and the Fedora 33
> > > testing (not stable) repository before being unpushed.
> > >
> > > In this case it was easy to work around as SSH was still working
> > > fine. But, it seems that rescue mode requires having a root
> > > password set, which we do not always do during the Fedora install.
> > >
> > > So, I think we should have an obvious way for users to enter
> > > recovery mode even with a locked root account.
> > >
> > > Currently rescue.service is executing "systemd-sulogin-shell"
> > > which in turn runs "sulogin" (part of util-linux). A workaround is
> > > to set SYSTEMD_SULOGIN_FORCE=1 in rescue.service, but that just
> > > disables authentication entirely.
> > >
> > > I suppose to improve this, we would need a kind of "sudologin"
> > > that accepts any user in the "wheel" group. Or maybe some other
> > > more rigid requirement like configuring the first admin user that
> > > was created.
> > >
> > > Anyone has a good idea on how to solve this?
> >
> > I solve it with early debug shell using boot param
> > systemd.debug-shell=1 but that presents a root login on tty9 without
> > needing a password.
>
> Yeah, if you are able to modify the command line and have the
> background, then it is really simple to bypass the authentication.
>
> > I'm under the impression authentication services aren't even
> > available for emergency or rescue targets (?). I also wonder what
> > happens if we move to systemd-homed and whether that can start sooner
> > and provide the ability to use rescue target? Or if it starts late
> > enough that it can't be used for rescue and then also what that means
> > for non-root use of rescue because with systemd-homed, there are no
> > (human) users in /etc at all.
>
> True, systemd-homed could be a problem.
>
> Maybe at the end of the day this is a lost cause?
>
> I mean, if you need to drop into rescue mode, you already need to have
> quite in-depth knowledge. So it could be better to focus on having more
> versatile solutions. Like being able to revert back to a known good
> state of the OS instead of providing a rescue shell.

There is also the sysroot-fails-to-mount problem. That leaves us in the initramfs, which is an even more limited environment. For sure, falling over at boot or during startup is rare, but no matter why, it often induces panic in even experienced users, in part because it's rare.

rpm-ostree has a way to mostly solve the problem if the startup failure is isolated to a particular deployment. But it could still have the rare case where it falls over in the initramfs. So that's a hole that would be nice to fix, because it's something all Fedora editions and spins could fall into.

There's a wish list item / idea for a recovery partition from which a system could be booted. Maybe it's a limited "netinstall" kind of environment, to keep it space efficient. (While it's in the Fedora Btrfs tracker, it doesn't mean system root must be Btrfs.)
https://pagure.io/fedora-btrfs/project/issue/23

And also a couple of Btrfs-specific snapshot-rollback ideas:
https://pagure.io/fedora-btrfs/project/issue/18
https://pagure.io/fedora-btrfs/project/issue/31

A bit more tangentially related: can we make it easy and cheap for folks to back up consistently so that a reset is less painful? This is neat but probably a hard sell to actually depend on most users opting into, however good of an idea it is to back up regularly.
https://pagure.io/fedora-btrfs/project/issue/12

There are ways boot+startup can fail other than a regression in a package; we kinda need to look at all of them and see if it's possible to take a holistic approach that solves a large chunk of them at once. It's one reason why I'm not pushing hard for /boot on Btrfs, because we don't need another option just to have another option. There are actually good reasons to put /boot on Btrfs no matter what the sysroot file system is, so if there's a way to "standardize" regardless of what that is, the better off we are. But if not /boot on Btrfs, we need some other way to deal with the disconnect on rollback between the kernels on /boot and the possibly older modules on an older sysroot snapshot.

I personally am gravitating toward the idea of not updating the currently running OS (sometimes called transactional system updates) where if we had a way to test the out-of-band updated OS, like in a container or VM, and only if it passes do we make it the next active system at reboot time. There's some complexities there but also rpm-ostree has learned a lot of those lessons that maybe we wouldn't have to relearn. This might make it possible to avoid the need for
Re: Working recovery with locked root user (rescue.service)
Hi,

On Thu, 2020-12-10 at 12:20 -0700, Chris Murphy wrote:
> On Thu, Dec 10, 2020 at 5:40 AM Benjamin Berg wrote:
> > Hi,
> >
> > so, the other day we had a major regression in the PAM stack[1]
> > that, unfortunately, ended up hitting rawhide and the Fedora 33
> > testing (not stable) repository before being unpushed.
> >
> > In this case it was easy to work around as SSH was still working
> > fine. But, it seems that rescue mode requires having a root password
> > set, which we do not always do during the Fedora install.
> >
> > So, I think we should have an obvious way for users to enter
> > recovery mode even with a locked root account.
> >
> > Currently rescue.service is executing "systemd-sulogin-shell" which
> > in turn runs "sulogin" (part of util-linux). A workaround is to set
> > SYSTEMD_SULOGIN_FORCE=1 in rescue.service, but that just disables
> > authentication entirely.
> >
> > I suppose to improve this, we would need a kind of "sudologin" that
> > accepts any user in the "wheel" group. Or maybe some other more
> > rigid requirement like configuring the first admin user that was
> > created.
> >
> > Anyone has a good idea on how to solve this?
>
> I solve it with early debug shell using boot param
> systemd.debug-shell=1 but that presents a root login on tty9 without
> needing a password.

Yeah, if you are able to modify the command line and have the background, then it is really simple to bypass the authentication.

> I'm under the impression authentication services aren't even available
> for emergency or rescue targets (?). I also wonder what happens if we
> move to systemd-homed and whether that can start sooner and provide
> the ability to use rescue target? Or if it starts late enough that it
> can't be used for rescue and then also what that means for non-root
> use of rescue because with systemd-homed, there are no (human) users in
> /etc at all.

True, systemd-homed could be a problem.

Maybe at the end of the day this is a lost cause?

I mean, if you need to drop into rescue mode, you already need to have quite in-depth knowledge. So it could be better to focus on having more versatile solutions. Like being able to revert back to a known good state of the OS instead of providing a rescue shell.

Benjamin
Re: Working recovery with locked root user (rescue.service)
On Thu, Dec 10, 2020 at 5:40 AM Benjamin Berg wrote:
>
> Hi,
>
> so, the other day we had a major regression in the PAM stack[1] that,
> unfortunately, ended up hitting rawhide and the Fedora 33 testing (not
> stable) repository before being unpushed.
>
> In this case it was easy to work around as SSH was still working fine.
> But, it seems that rescue mode requires having a root password set,
> which we do not always do during the Fedora install.
>
> So, I think we should have an obvious way for users to enter recovery
> mode even with a locked root account.
>
> Currently rescue.service is executing "systemd-sulogin-shell" which in
> turn runs "sulogin" (part of util-linux). A workaround is to
> set SYSTEMD_SULOGIN_FORCE=1 in rescue.service, but that just disables
> authentication entirely.
>
> I suppose to improve this, we would need a kind of "sudologin" that
> accepts any user in the "wheel" group. Or maybe some other more rigid
> requirement like configuring the first admin user that was created.
>
> Anyone has a good idea on how to solve this?

I solve it with an early debug shell using the boot param systemd.debug-shell=1, but that presents a root login on tty9 without needing a password.

I'm under the impression authentication services aren't even available for emergency or rescue targets (?). I also wonder what happens if we move to systemd-homed and whether that can start sooner and provide the ability to use the rescue target? Or if it starts late enough that it can't be used for rescue, and then also what that means for non-root use of rescue, because with systemd-homed there are no (human) users in /etc at all.

-- 
Chris Murphy
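
For readers who want to try the SYSTEMD_SULOGIN_FORCE=1 workaround quoted above, one way to apply it without editing the packaged unit is a drop-in file. The following is a minimal sketch that only renders the drop-in text; the drop-in file name `override.conf` and the helper name are assumptions for illustration, and as the thread notes, this disables authentication for rescue mode entirely.

```python
from pathlib import PurePosixPath

# Assumed location for a local drop-in that extends rescue.service;
# the file name "override.conf" is a convention, not a requirement.
DROP_IN_PATH = PurePosixPath(
    "/etc/systemd/system/rescue.service.d/override.conf"
)


def render_drop_in():
    """Return drop-in text that sets SYSTEMD_SULOGIN_FORCE=1 for the unit."""
    return "[Service]\nEnvironment=SYSTEMD_SULOGIN_FORCE=1\n"


if __name__ == "__main__":
    # Print what would be written, and where, without touching the system.
    print(f"# {DROP_IN_PATH}")
    print(render_drop_in(), end="")
```

After placing such a file, `systemctl daemon-reload` is needed for systemd to pick it up.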
Re: Working recovery with locked root user (rescue.service)
On Thu, Dec 10, 2020 at 1:39 pm, Benjamin Berg wrote:
> I suppose to improve this, we would need a kind of "sudologin" that
> accepts any user in the "wheel" group. Or maybe some other more rigid
> requirement like configuring the first admin user that was created.

I'd say ideally any user in wheel would be able to recover. You would need to be able to enter a username at the recovery prompt for this to work, of course.

P.S. The recovery prompt is always English (US) only, which is a shame.
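
The membership test such a "sudologin" would need could look roughly like the sketch below. No such tool exists in the thread; the function names and the sample data are hypothetical, and the sketch only reads supplementary members from /etc/group-style text. It ignores primary group membership, password verification, and users managed by systemd-homed, which the thread points out do not appear in /etc at all.

```python
def wheel_members(group_file_text, group="wheel"):
    """Return the supplementary members of a group from /etc/group-style text."""
    for line in group_file_text.splitlines():
        fields = line.strip().split(":")
        # Format: name:password:gid:member1,member2,...
        if len(fields) == 4 and fields[0] == group:
            return {member for member in fields[3].split(",") if member}
    return set()


def may_enter_recovery(username, group_file_text):
    """Hypothetical policy: any listed member of wheel may recover."""
    return username in wheel_members(group_file_text)


# Made-up sample data for illustration only.
SAMPLE_GROUP_FILE = "root:x:0:\nwheel:x:10:alice,bob\nusers:x:100:\n"
```

With the sample data, `may_enter_recovery("alice", SAMPLE_GROUP_FILE)` is true and `may_enter_recovery("carol", SAMPLE_GROUP_FILE)` is false; the prompt would still have to ask for the username first, as mentioned above.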