bug#52533: guix deploy breaks SSH access with a PAM error
Hello,

Maxim Cournoyer wrote:

> Ludovic Courtès writes:
>
> [...]
>
>>> I'm not sure. The beauty of Shepherd, in my eyes, when compared to
>>> other init systems, is that it is lean and clean. Leveraging what's
>>> already out there (and part of GNU) seems an obvious path to me, as it:
>>>
>>> 1. Means less code to write, document and maintain.
>>> 2. Creates more cohesion between various components of the GNU project.
>>
>> Heheh, Guix was started to address #2 actually. Today, I think #2 is
>> okay but should not be an obstacle.
>
> I personally still think the idea is more than "okay"; I see value in
> it; one of the obvious benefits is documentation; most GNU packages come
> with Texinfo documentation, which makes for a nice, integrated
> experience. I also think that as the system becomes more established
> and integrate more of GNU, more GNU packages maintainers may be
> interested in joining and contributing (reaching some critical mass).

Heheh. :-)

>> As for #1, sure, but Shepherd will need to grow a proper event loop
>> anyway, so socket activation won’t make much of a difference.
>
> If we keep it dumb and use inetd, it wouldn't, right?

It will get that, independent of socket activation.

> From what I understand, systemd uses socket activation as a means to
> chain events, while inetd is typically used to delay a service
> starting to save on resources such as RAM (for services seldom used).
> Is my primitive understanding about right?

Yes. In most cases, it’s about starting services lazily (much like the Hurd’s passive translators, too.)

Thanks,
Ludo’.
bug#52533: guix deploy breaks SSH access with a PAM error
Hi Ludovic!

Ludovic Courtès writes:

[...]

>> I'm not sure. The beauty of Shepherd, in my eyes, when compared to
>> other init systems, is that it is lean and clean. Leveraging what's
>> already out there (and part of GNU) seems an obvious path to me, as it:
>>
>> 1. Means less code to write, document and maintain.
>> 2. Creates more cohesion between various components of the GNU project.
>
> Heheh, Guix was started to address #2 actually. Today, I think #2 is
> okay but should not be an obstacle.

I personally still think the idea is more than "okay"; I see value in it. One of the obvious benefits is documentation: most GNU packages come with Texinfo documentation, which makes for a nice, integrated experience. I also think that as the system becomes more established and integrates more of GNU, more GNU package maintainers may be interested in joining and contributing (reaching some critical mass).

> As for #1, sure, but Shepherd will need to grow a proper event loop
> anyway, so socket activation won’t make much of a difference.

If we keep it dumb and use inetd, it wouldn't, right?

From what I understand, systemd uses socket activation as a means to chain events, while inetd is typically used to delay a service starting to save on resources such as RAM (for services seldom used). Is my primitive understanding about right?

> Also, taking a step back, systemd undoubtedly changed user expectations
> for the better in terms of integration, monitoring, and logging. Having
> the same level of integration in the Shepherd would be a step in that
> direction.

At a heavy cost (complexity -- sheer amount of code). I remember finding out, for example, that the database-backed, compressed logging of systemd would consume more disk space than an uncompressed text log file. That's because each message has multiple keys associated with it that need to be written to disk. It's surprisingly inefficient.

>>> (Basically, it’s a choice we could make right away: do we move all
>>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>>> inetd services, or do we instead extend the Shepherd to support socket
>>> activation? I’m rather in favor of the latter, but if in Guix System we
>>> build an abstraction that can equally well target inetd or a future
>>> Shepherd version, that’s even better.)
>>
>> We could start with just targeting inetd, and build the abstraction
>> later, if the need arises, perhaps? We may never need it.
>
> Yes, so what I had in mind is, in Guix System, something like
> , which would kinda look like
> but be lowered (for now) to an inetd service.

This sounds good to me, if you are confident it can fix the problem at hand.

Thank you,
Maxim
bug#52533: guix deploy breaks SSH access with a PAM error
Hi,

Maxim Cournoyer wrote:

> Ludovic Courtès writes:
>
> [...]
>
>> sshd could also be started via socket activation; ‘sshd’ subprocesses
>> corresponding to existing logins would be unaffected.
>>
>>> Also, it seems to me inetd can already do "socket activation", if this
>>> was somehow useful.
>>
>> Yes, inetd can do that. It would be nicer though to have it all
>> integrated in the Shepherd.
>
> I'm not sure. The beauty of Shepherd, in my eyes, when compared to
> other init systems, is that it is lean and clean. Leveraging what's
> already out there (and part of GNU) seems an obvious path to me, as it:
>
> 1. Means less code to write, document and maintain.
> 2. Creates more cohesion between various components of the GNU project.

Heheh, Guix was started to address #2 actually. Today, I think #2 is okay but should not be an obstacle.

As for #1, sure, but Shepherd will need to grow a proper event loop anyway, so socket activation won’t make much of a difference.

Also, taking a step back, systemd undoubtedly changed user expectations for the better in terms of integration, monitoring, and logging. Having the same level of integration in the Shepherd would be a step in that direction.

>> (Basically, it’s a choice we could make right away: do we move all
>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>> inetd services, or do we instead extend the Shepherd to support socket
>> activation? I’m rather in favor of the latter, but if in Guix System we
>> build an abstraction that can equally well target inetd or a future
>> Shepherd version, that’s even better.)
>
> We could start with just targeting inetd, and build the abstraction
> later, if the need arises, perhaps? We may never need it.

Yes, so what I had in mind is, in Guix System, something like , which would kinda look like but be lowered (for now) to an inetd service.

Thanks,
Ludo’.
bug#52533: guix deploy breaks SSH access with a PAM error
Hi Ludovic,

Ludovic Courtès writes:

[...]

> sshd could also be started via socket activation; ‘sshd’ subprocesses
> corresponding to existing logins would be unaffected.
>
>> Also, it seems to me inetd can already do "socket activation", if this
>> was somehow useful.
>
> Yes, inetd can do that. It would be nicer though to have it all
> integrated in the Shepherd.

I'm not sure. The beauty of Shepherd, in my eyes, when compared to other init systems, is that it is lean and clean. Leveraging what's already out there (and part of GNU) seems an obvious path to me, as it:

1. Means less code to write, document and maintain.
2. Creates more cohesion between various components of the GNU project.

> (Basically, it’s a choice we could make right away: do we move all
> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
> inetd services, or do we instead extend the Shepherd to support socket
> activation? I’m rather in favor of the latter, but if in Guix System we
> build an abstraction that can equally well target inetd or a future
> Shepherd version, that’s even better.)

We could start with just targeting inetd, and build the abstraction later, if the need arises, perhaps? We may never need it.

Thanks,
Maxim
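Editorial note: for readers unfamiliar with inetd-style lazy startup, here is a minimal sketch of what "targeting inetd" could look like for sshd on a conventional system. It is not Guix code and not something proposed in this thread: `inetd_entry` is a hypothetical helper, and `/usr/sbin/sshd` is a placeholder path (on Guix System the binary would live in the store). The sketch only assumes the classic inetd.conf field order (service, socket type, protocol, wait flag, user, program, arguments) and sshd's `-i` option, which tells sshd it is being run from inetd.

```shell
#!/bin/sh
# Hypothetical helper: print one inetd.conf entry that starts PROGRAM
# lazily, on the first connection to SERVICE.  The entry is fixed to
# stream/tcp/nowait, i.e. inetd spawns one process per connection.
# Usage: inetd_entry SERVICE USER PROGRAM ARGV0 [ARGS...]
inetd_entry() {
    service=$1
    user=$2
    program=$3
    shift 3
    # "$*" joins the remaining words (argv[0] and arguments) with spaces.
    printf '%s stream tcp nowait %s %s %s\n' "$service" "$user" "$program" "$*"
}

# Placeholder path; adjust to wherever sshd is installed.
inetd_entry ssh root /usr/sbin/sshd sshd -i
```

With such an entry no sshd process exists until someone connects, so each new login would run whatever sshd the entry points to at that moment, while sessions that are already open keep their existing processes.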
bug#52533: guix deploy breaks SSH access with a PAM error
Hi,

Maxim Cournoyer wrote:

>>> I was just kicked out of my own server due to this PAM/SSH issue. It
>>> happens quite frequently here. Time for a fix :).
>
> Not a meaningful contribution to the discussion, but my workaround is to
> disable PAM; as it is not enabled in OpenSSH by default, perhaps we
> should also leave it off unless requested? What are the advantages of
> having it on?

Consistency: authentication had better work consistently across all system services that depend on it.

[...]

>> The crux of the problem rather is the global /etc/pam.d: it’s valid for
>> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
>> both.
>>
>> FHS distros have a similar problem though; how do they handle it? Do
>> they force services to be restarted when glibc is upgraded, or something
>> along these lines?
>
> I just asked this question in Debian's OFTC channel:
>
> "how does debian handle glibc updates? are services restarted when it
> happens? Or does it postpone updating glibc until the next reboot?"
>
> And got for answer: "there is no magic postponing of updates"; the
> external needrestart [0] program was also mentioned.
>
> Researching some more, it seems this may be handled on Debian by the use
> of postinst scripts (which is an arbitrary shell script run after a
> package is installed); so the libc package of Debian for example
> restarts the postgres service to avoid problems:
>
> [0] https://github.com/liske/needrestart
> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275

Yeah. My recollection is that apt is interactive by default, and it would typically pop up a dialog telling you that services X and Y need to be restarted, and asking whether you want to restart them now.

Compared to what we have (a message at the end telling you that you “may need” to run ‘herd restart X’), the benefit IIRC is that it tells you which services need to be restarted.

[...]

>> We could maybe sidestep the issue altogether with socket-activated
>> services: they’d be started on-demand, so the second scenario above
>> would be unlikely. But getting there is quite a bit of work…
>
> I fail to see how this would be a solution for openssh, which would
> typically already be running unless you've never login ounce since the
> machine was up (or am I missing something?).

sshd could also be started via socket activation; ‘sshd’ subprocesses corresponding to existing logins would be unaffected.

> Also, it seems to me inetd can already do "socket activation", if this
> was somehow useful.

Yes, inetd can do that. It would be nicer though to have it all integrated in the Shepherd.

(Basically, it’s a choice we could make right away: do we move all network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to inetd services, or do we instead extend the Shepherd to support socket activation? I’m rather in favor of the latter, but if in Guix System we build an abstraction that can equally well target inetd or a future Shepherd version, that’s even better.)

Ludo’.
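Editorial note: telling the user which services need a restart requires knowing which running processes predate the glibc upgrade. The following is a rough sketch of one way to check that, assuming the Linux `/proc/PID/maps` format, where the sixth whitespace-separated field is the mapped file. `mapped_libc` and `uses_stale_libc` are hypothetical names; nothing here is part of Guix, the Shepherd, or needrestart, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Print the distinct libc paths found in a /proc/PID/maps-style file.
# Matches names such as libc.so.6 or libc-2.33.so, but not libcrypt etc.
mapped_libc() {
    awk '$6 ~ /\/libc(-[0-9.]+)?\.so/ { print $6 }' "$1" | sort -u
}

# Exit 0 if the maps file shows a libc other than EXPECTED, else exit 1.
# A process with no libc mapping at all is reported as not stale.
# Usage: uses_stale_libc MAPS_FILE EXPECTED_LIBC_PATH
uses_stale_libc() {
    maps=$1
    expected=$2
    for lib in $(mapped_libc "$maps"); do
        [ "$lib" = "$expected" ] || return 0
    done
    return 1
}

# Example (assumption: $new_libc holds the libc path of the new system,
# e.g. as mapped by a freshly started process):
#   uses_stale_libc "/proc/$pid/maps" "$new_libc" && echo "PID $pid predates the upgrade"
```

How the expected libc path is obtained, and how a PID is mapped back to a Shepherd service, are left open here.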
bug#52533: guix deploy breaks SSH access with a PAM error
Hello,

Ludovic Courtès writes:

> Hi,
>
> Mathieu Othacehe skribis:
>
>>> This sounds a lot like this:
>>>
>>> https://issues.guix.gnu.org/32182#1
>>
>> I was just kicked out of my own server due to this PAM/SSH issue. It
>> happens quite frequently here. Time for a fix :).

Not a meaningful contribution to the discussion, but my workaround is to disable PAM; as it is not enabled in OpenSSH by default, perhaps we should also leave it off unless requested? What are the advantages of having it on?

> Note that ‘guix deploy’ now opens a single SSH session, starting from
> 7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the
> problem.
>
>> Regarding the two potential solutions that you proposed in 2018, are
>> they still actual? If yes, I could maybe try to implement the second
>> suggestion: introducing service chain-loading.
>
> Service chain-loading was implemented in the Shepherd a few years ago.
> However, it doesn’t really help; consider these two scenario:
>
> • You do ‘guix system reconfigure && herd restart term-tty1’. In that
> case, all is good: ‘term-tty1’, will run the new ‘mingetty’ process
> (post-glibc upgrade, thanks to service chain-loading) and ‘login’
> will happily load the .so files listed in /etc/pam.d/login (also
> post-glibc upgrade).
>
> • You run ‘guix system reconfigure’ but do not restart ‘term-tty1’,
> ‘sshd’, and all the other services that depend on PAM: these
> pre-glibc upgrade programs will try dlopening the post-glibc upgrade
> PAM plugins, which will break.
>
> The crux of the problem rather is the global /etc/pam.d: it’s valid for
> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
> both.
>
> FHS distros have a similar problem though; how do they handle it? Do
> they force services to be restarted when glibc is upgraded, or something
> along these lines?

I just asked this question in Debian's OFTC channel:

"how does debian handle glibc updates? are services restarted when it happens? Or does it postpone updating glibc until the next reboot?"

And got this answer: "there is no magic postponing of updates"; the external needrestart [0] program was also mentioned.

Researching some more, it seems this may be handled on Debian by the use of postinst scripts (arbitrary shell scripts run after a package is installed); the libc package of Debian, for example, restarts the postgres service to avoid problems [1].

[0] https://github.com/liske/needrestart
[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275

> In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each
> PAM-using Shepherd service (login, sshd, etc.) so that it sets
> PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS
> being configured? Tricky!

Good question, but that seems a good path to pursue; old services would keep using their own old PAM modules, allowing them to continue running unaffected, while new ones would get the updated PAM modules.

> We could maybe sidestep the issue altogether with socket-activated
> services: they’d be started on-demand, so the second scenario above
> would be unlikely. But getting there is quite a bit of work…

I fail to see how this would be a solution for openssh, which would typically already be running unless you've never logged in once since the machine came up (or am I missing something?).

Also, it seems to me inetd can already do "socket activation", if this was somehow useful.

Thanks,
Maxim
bug#52533: guix deploy breaks SSH access with a PAM error
Hi,

Mathieu Othacehe wrote:

>> This sounds a lot like this:
>>
>> https://issues.guix.gnu.org/32182#1
>
> I was just kicked out of my own server due to this PAM/SSH issue. It
> happens quite frequently here. Time for a fix :).

Note that ‘guix deploy’ now opens a single SSH session, starting from 7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the problem.

> Regarding the two potential solutions that you proposed in 2018, are
> they still actual? If yes, I could maybe try to implement the second
> suggestion: introducing service chain-loading.

Service chain-loading was implemented in the Shepherd a few years ago. However, it doesn’t really help; consider these two scenarios:

  • You do ‘guix system reconfigure && herd restart term-tty1’. In that case, all is good: ‘term-tty1’ will run the new ‘mingetty’ process (post-glibc upgrade, thanks to service chain-loading) and ‘login’ will happily load the .so files listed in /etc/pam.d/login (also post-glibc upgrade).

  • You run ‘guix system reconfigure’ but do not restart ‘term-tty1’, ‘sshd’, and all the other services that depend on PAM: these pre-glibc upgrade programs will try dlopening the post-glibc upgrade PAM plugins, which will break.

The crux of the problem rather is the global /etc/pam.d: it’s valid for pre-glibc upgrade programs, or for post-glibc upgrade programs, but not both.

FHS distros have a similar problem though; how do they handle it? Do they force services to be restarted when glibc is upgraded, or something along these lines?

In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each PAM-using Shepherd service (login, sshd, etc.) so that it sets PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS being configured? Tricky!

We could maybe sidestep the issue altogether with socket-activated services: they’d be started on-demand, so the second scenario above would be unlikely. But getting there is quite a bit of work…

Ludo’.
bug#52533: guix deploy breaks SSH access with a PAM error
> Regarding the two potential solutions that you proposed in 2018, are
> they still actual? If yes, I could maybe try to implement the second
> suggestion: introducing service chain-loading.

Oh sorry, I stopped reading the thread at https://issues.guix.gnu.org/32182#1. Looks like the service chain-loading might not be enough, I'll keep digging.

Thanks,
Mathieu
bug#52533: guix deploy breaks SSH access with a PAM error
Hey,

> This sounds a lot like this:
>
> https://issues.guix.gnu.org/32182#1

I was just kicked out of my own server due to this PAM/SSH issue. It happens quite frequently here. Time for a fix :).

Regarding the two potential solutions that you proposed in 2018, are they still relevant? If yes, I could maybe try to implement the second suggestion: introducing service chain-loading.

Thanks,
Mathieu
bug#52533: guix deploy breaks SSH access with a PAM error
Hi,

Maxim Cournoyer wrote:

> Following the big merge of the core-updates-frozen branch into master,
> I've noticed now on two counts the following: running 'guix deploy'
> leaves the remote machine unreachable by SSH. The connection passes
> authentication but then gets closed immediately. /var/log/messages
> reveals the following error:
>
> sshd[29578]: error: PAM: pam_open_session(): Module is unknown
>
>
> The machines updated were running Guix System revisions predating the
> core-updates-frozen merge.

This sounds a lot like this:

  https://issues.guix.gnu.org/32182#1

WDYT?

Ludo’.
bug#52533: [PATCH] bug#52533: guix deploy breaks SSH access with a PAM error
Hello,

I've found a workaround: disabling PAM for the remote machine's ssh-daemon. This is not done as part of 'guix deploy', so it needs to be fiddled with manually; I did it this way:

1. Take note of the command line and sshd_config file:

--8<---cut here---start->8---
ps -eFww | grep sshd
--8<---cut here---end--->8---

2. Copy the sshd_config file from /gnu/store to somewhere writable and edit it so that UsePAM is "no" instead of "yes".

3. Stop the Shepherd service with 'sudo herd stop ssh-daemon'.

4. Start the ssh daemon manually (with sudo) by using the command found in 1. but with the edited config from 2.

Then you should be able to 'guix deploy' successfully.

Reading 'man sshd_config', it says the default for UsePAM is no. Considering this, and the issue reported here that it caused, perhaps we should disable it by default in Guix?

What do others think?

Thank you,
Maxim
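Editorial note: step 2 of the workaround above can be scripted. This is a minimal sketch, assuming the generated sshd_config contains a line of the form "UsePAM yes"; `disable_pam` is a hypothetical helper name and the file names are up to you. Steps 3 and 4 remain manual, using the command line noted in step 1.

```shell
#!/bin/sh
# Hypothetical helper: write a copy of an sshd_config with PAM disabled.
# Usage: disable_pam SOURCE DEST
disable_pam() {
    src=$1
    dest=$2
    if grep -Eq '^[[:space:]]*UsePAM[[:space:]]' "$src"; then
        # Rewrite the existing UsePAM line; all other lines are copied as-is.
        sed -E 's/^[[:space:]]*UsePAM[[:space:]]+.*$/UsePAM no/' "$src" > "$dest"
    else
        # No UsePAM line: put one at the top so that it cannot end up
        # inside a trailing Match block.
        { printf 'UsePAM no\n'; cat "$src"; } > "$dest"
    fi
}

# Example (SOURCE is the store path reported by 'ps' in step 1):
#   disable_pam /gnu/store/...-sshd_config /tmp/sshd_config.nopam
```

The edited copy is then the file to pass to sshd in step 4 in place of the original store path.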
bug#52533: guix deploy breaks SSH access with a PAM error
Hello Guix!

Following the big merge of the core-updates-frozen branch into master, I've now noticed the following on two occasions: running 'guix deploy' leaves the remote machine unreachable by SSH. The connection passes authentication but then gets closed immediately. /var/log/messages reveals the following error:

--8<---cut here---start->8---
sshd[29578]: error: PAM: pam_open_session(): Module is unknown
--8<---cut here---end--->8---

The machines updated were running Guix System revisions predating the core-updates-frozen merge.

The 'guix deploy' command doesn't succeed because SSH starts failing at 99% completion or similar; the bootloader configuration is not updated, so rebooting boots into the same old system generation (and SSH works again):

--8<---cut here---start->8---
guix deploy: deploying to x200...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
substitute: updating substitutes from 'http://127.0.0.1:8181'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivations will be built:
  /gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv
  /gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv
  /gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv
  /gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv
building /gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv...
building /gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv...
building /gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv...
building /gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv...
guix deploy: sending 5 store items (0 MiB) to 'x200.local'...
guix deploy: error: failed to deploy x200: failed to start 'guix repl' on 'x200.local'

$ guix deploy ~/stow/guix/machines/x200.scm --no-offload
The following 1 machine will be deployed:
  x200

guix deploy: deploying to x200...
guix deploy: error: failed to deploy x200: remote command '/run/setuid-programs/sudo -n -- guix repl -t machine' failed with status 254

$ ssh x200
Last login: Wed Dec 15 23:28:02 2021 from 192.168.10.15
Connection to x200.local closed.
--8<---cut here---end--->8---

This is obviously embarrassing in scenarios where the SSH connection is the main way to reach the remote machine.

Ideas?

Thank you,
Maxim