bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-18 Thread Ludovic Courtès

Hello,

Maxim Cournoyer  skribis:

> Ludovic Courtès  writes:
>
> [...]
>
>>> I'm not sure.  The beauty of Shepherd, in my eyes, when compared to
>>> other init systems, is that it is lean and clean.  Leveraging what's
>>> already out there (and part of GNU) seems an obvious path to me, as it:
>>>
>>> 1. Means less code to write, document and maintain.
>>> 2. Creates more cohesion between various components of the GNU project.
>>
>> Heheh, Guix was started to address #2 actually.  Today, I think #2 is
>> okay but should not be an obstacle.
>
> I personally still think the idea is more than "okay"; I see value in
> it; one of the obvious benefits is documentation; most GNU packages come
> with Texinfo documentation, which makes for a nice, integrated
> experience.  I also think that as the system becomes more established
> and integrates more of GNU, more GNU package maintainers may be
> interested in joining and contributing (reaching some critical mass).

Heheh.  :-)

>> As for #1, sure, but Shepherd will need to grow a proper event loop
>> anyway, so socket activation won’t make much of a difference.
>
> If we keep it dumb and use inetd, it wouldn't, right?

The Shepherd will get a proper event loop either way, independent of
socket activation.

> From what I understand, systemd uses socket activation as a means to
> chain events, while inetd is typically used to delay a service
> starting to save on resources such as RAM (for services seldom used).
> Is my primitive understanding about right?

Yes.  In most cases, it’s about starting services lazily (much like the
Hurd’s passive translators, too.)

Thanks,
Ludo’.

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-17 Thread Maxim Cournoyer

Hi Ludovic!

Ludovic Courtès  writes:

[...]

>> I'm not sure.  The beauty of Shepherd, in my eyes, when compared to
>> other init systems, is that it is lean and clean.  Leveraging what's
>> already out there (and part of GNU) seems an obvious path to me, as it:
>>
>> 1. Means less code to write, document and maintain.
>> 2. Creates more cohesion between various components of the GNU project.
>
> Heheh, Guix was started to address #2 actually.  Today, I think #2 is
> okay but should not be an obstacle.

I personally still think the idea is more than "okay"; I see value in
it; one of the obvious benefits is documentation; most GNU packages come
with Texinfo documentation, which makes for a nice, integrated
experience.  I also think that as the system becomes more established
and integrates more of GNU, more GNU package maintainers may be
interested in joining and contributing (reaching some critical mass).

> As for #1, sure, but Shepherd will need to grow a proper event loop
> anyway, so socket activation won’t make much of a difference.

If we keep it dumb and use inetd, it wouldn't, right?  From what I
understand, systemd uses socket activation as a means to chain events,
while inetd is typically used to delay a service starting to save on
resources such as RAM (for services seldom used).  Is my primitive
understanding about right?

> Also, taking a step back, systemd undoubtedly changed user expectations
> for the better in terms of integration, monitoring, and logging.  Having
> the same level of integration in the Shepherd would be a step in that
> direction.

At a heavy cost (complexity -- sheer amount of code).  I remember
finding out, for example, that the database-backed, compressed logging
of systemd would consume more disk space than an uncompressed text log
file.  That's because each message has multiple keys associated with
it that need to be written to disk.  It's surprisingly inefficient.

>>> (Basically, it’s a choice we could make right away: do we move all
>>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>>> inetd services, or do we instead extend the Shepherd to support socket
>>> activation?  I’m rather in favor of the latter, but if in Guix System we
>>> build an abstraction that can equally well target inetd or a future
>>> Shepherd version, that’s even better.)
>>
>> We could start with just targeting inetd, and build the abstraction
>> later, if the need arises, perhaps?  We may never need it.
>
> Yes, so what I had in mind is, in Guix System, something like
> , which would kinda look like
>  but be lowered (for now) to an inetd service.

This sounds good to me, if you are confident it can fix the problem at
hand.

Thank you,

Maxim

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-17 Thread Ludovic Courtès

Hi,

Maxim Cournoyer  skribis:

> Ludovic Courtès  writes:
>
> [...]
>
>> sshd could also be started via socket activation; ‘sshd’ subprocesses
>> corresponding to existing logins would be unaffected.
>>
>>> Also, it seems to me inetd can already do "socket activation", if this
>>> was somehow useful.
>>
>> Yes, inetd can do that.  It would be nicer though to have it all
>> integrated in the Shepherd.
>
> I'm not sure.  The beauty of Shepherd, in my eyes, when compared to
> other init systems, is that it is lean and clean.  Leveraging what's
> already out there (and part of GNU) seems an obvious path to me, as it:
>
> 1. Means less code to write, document and maintain.
> 2. Creates more cohesion between various components of the GNU project.

Heheh, Guix was started to address #2 actually.  Today, I think #2 is
okay but should not be an obstacle.

As for #1, sure, but Shepherd will need to grow a proper event loop
anyway, so socket activation won’t make much of a difference.

Also, taking a step back, systemd undoubtedly changed user expectations
for the better in terms of integration, monitoring, and logging.  Having
the same level of integration in the Shepherd would be a step in that
direction.

>> (Basically, it’s a choice we could make right away: do we move all
>> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
>> inetd services, or do we instead extend the Shepherd to support socket
>> activation?  I’m rather in favor of the latter, but if in Guix System we
>> build an abstraction that can equally well target inetd or a future
>> Shepherd version, that’s even better.)
>
> We could start with just targeting inetd, and build the abstraction
> later, if the need arises, perhaps?  We may never need it.

Yes, so what I had in mind is, in Guix System, something like
, which would kinda look like
 but be lowered (for now) to an inetd service.

Thanks,
Ludo’.

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-17 Thread Maxim Cournoyer

Hi Ludovic,

Ludovic Courtès  writes:

[...]

> sshd could also be started via socket activation; ‘sshd’ subprocesses
> corresponding to existing logins would be unaffected.
>
>> Also, it seems to me inetd can already do "socket activation", if this
>> was somehow useful.
>
> Yes, inetd can do that.  It would be nicer though to have it all
> integrated in the Shepherd.

I'm not sure.  The beauty of Shepherd, in my eyes, when compared to
other init systems, is that it is lean and clean.  Leveraging what's
already out there (and part of GNU) seems an obvious path to me, as it:

1. Means less code to write, document and maintain.
2. Creates more cohesion between various components of the GNU project.

> (Basically, it’s a choice we could make right away: do we move all
> network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
> inetd services, or do we instead extend the Shepherd to support socket
> activation?  I’m rather in favor of the latter, but if in Guix System we
> build an abstraction that can equally well target inetd or a future
> Shepherd version, that’s even better.)

We could start with just targeting inetd, and build the abstraction
later, if the need arises, perhaps?  We may never need it.

Thanks,

Maxim

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-17 Thread Ludovic Courtès

Hi,

Maxim Cournoyer  skribis:

>>> I was just kicked out of my own server due to this PAM/SSH issue. It
>>> happens quite frequently here. Time for a fix :).
>
> Not a meaningful contribution to the discussion, but my workaround is to
> disable PAM; as it is not enabled in OpenSSH by default, perhaps we
> should also leave it off unless requested?  What are the advantages of
> having it on?

Consistency: authentication should work consistently across all
system services that depend on it.

[...]

>> The crux of the problem rather is the global /etc/pam.d: it’s valid for
>> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
>> both.
>>
>> FHS distros have a similar problem though; how do they handle it?  Do
>> they force services to be restarted when glibc is upgraded, or something
>> along these lines?
>
> I just asked this question in Debian's OFTC channel:
>
> "how does debian handle glibc updates?  are services restarted when it
> happens?  Or does it postpone updating glibc until the next reboot?"
>
> And got this answer: "there is no magic postponing of updates"; the
> external needrestart [0] program was also mentioned.
>
> Researching some more, it seems this may be handled on Debian by the use
> of postinst scripts (arbitrary shell scripts run after a package is
> installed); the libc package of Debian, for example, restarts the
> postgres service to avoid problems [1].
>
> [0]  https://github.com/liske/needrestart
> [1]  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275

Yeah.  My recollection is that apt is interactive by default, and it
would typically pop up a dialog telling you that services X and Y need
to be restarted, and asking whether you want to restart them now.

Compared to what we have (a message at the end telling you that you
“may need” to run ‘herd restart X’), the benefit IIRC is that it
tells you which services need to be restarted.
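
For illustration, here is a minimal Python sketch of one common way tools on
FHS distros can tell which processes still use a replaced library: they look
for mappings marked "(deleted)" in /proc/PID/maps.  The sample text below is
made up, and this is not a description of how needrestart or apt is actually
implemented.  On Guix System old store items are not deleted on upgrade, so
this exact check would presumably not apply as-is.

```python
def deleted_mappings(maps_text: str) -> list[str]:
    """Return paths of mapped files that were deleted or replaced on disk."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, then optional pathname.
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].endswith(" (deleted)"):
            path = fields[5][: -len(" (deleted)")]
            if path not in paths:
                paths.append(path)
    return paths


# Made-up excerpt in the format of /proc/PID/maps.
SAMPLE_MAPS = """\
7f1c2a000000-7f1c2a1c0000 r-xp 00000000 08:01 131 /lib/x86_64-linux-gnu/libc-2.31.so (deleted)
7f1c2a400000-7f1c2a420000 r--p 00000000 08:01 140 /usr/sbin/sshd
7ffd1b000000-7ffd1b021000 rw-p 00000000 00:00 0 [stack]
"""

stale = deleted_mappings(SAMPLE_MAPS)
print(stale)
```

A process with a non-empty result is a candidate for a restart; mapping that
process back to a service name is left out of the sketch.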

[...]

>> We could maybe sidestep the issue altogether with socket-activated
>> services: they’d be started on-demand, so the second scenario above
>> would be unlikely.  But getting there is quite a bit of work…
>
> I fail to see how this would be a solution for openssh, which would
> typically already be running unless you've never logged in once since
> the machine was up (or am I missing something?).

sshd could also be started via socket activation; ‘sshd’ subprocesses
corresponding to existing logins would be unaffected.

> Also, it seems to me inetd can already do "socket activation", if this
> was somehow useful.

Yes, inetd can do that.  It would be nicer though to have it all
integrated in the Shepherd.

(Basically, it’s a choice we could make right away: do we move all
network daemons, plus things like guix-daemon, dbus-daemon, etc. etc. to
inetd services, or do we instead extend the Shepherd to support socket
activation?  I’m rather in favor of the latter, but if in Guix System we
build an abstraction that can equally well target inetd or a future
Shepherd version, that’s even better.)

Ludo’.

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-13 Thread Maxim Cournoyer

Hello,

Ludovic Courtès  writes:

> Hi,
>
> Mathieu Othacehe  skribis:
>
>>> This sounds a lot like this:
>>>
>>>   https://issues.guix.gnu.org/32182#1
>>
>> I was just kicked out of my own server due to this PAM/SSH issue. It
>> happens quite frequently here. Time for a fix :).

Not a meaningful contribution to the discussion, but my workaround is to
disable PAM; as it is not enabled in OpenSSH by default, perhaps we
should also leave it off unless requested?  What are the advantages of
having it on?

> Note that ‘guix deploy’ now opens a single SSH session, starting from
> 7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the
> problem.
>
>> Regarding the two potential solutions that you proposed in 2018, are
>> they still relevant? If yes, I could maybe try to implement the second
>> suggestion: introducing service chain-loading.
>
> Service chain-loading was implemented in the Shepherd a few years ago.
> However, it doesn’t really help; consider these two scenarios:
>
>   • You do ‘guix system reconfigure && herd restart term-tty1’.  In that
>     case, all is good: ‘term-tty1’ will run the new ‘mingetty’ process
>     (post-glibc upgrade, thanks to service chain-loading) and ‘login’
>     will happily load the .so files listed in /etc/pam.d/login (also
>     post-glibc upgrade).
>
>   • You run ‘guix system reconfigure’ but do not restart ‘term-tty1’,
>     ‘sshd’, and all the other services that depend on PAM: these
>     pre-glibc upgrade programs will try dlopening the post-glibc upgrade
>     PAM plugins, which will break.
>
> The crux of the problem rather is the global /etc/pam.d: it’s valid for
> pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
> both.
>
> FHS distros have a similar problem though; how do they handle it?  Do
> they force services to be restarted when glibc is upgraded, or something
> along these lines?

I just asked this question in Debian's OFTC channel:

"how does debian handle glibc updates?  are services restarted when it
happens?  Or does it postpone updating glibc until the next reboot?"

And got this answer: "there is no magic postponing of updates"; the
external needrestart [0] program was also mentioned.

Researching some more, it seems this may be handled on Debian by the use
of postinst scripts (arbitrary shell scripts run after a package is
installed); the libc package of Debian, for example, restarts the
postgres service to avoid problems [1].

[0]  https://github.com/liske/needrestart
[1]  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=710275

> In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each
> PAM-using Shepherd service (login, sshd, etc.) so that it sets
> PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS
> being configured?  Tricky!

Good question, but that seems a good path to pursue; old services would
be using their own old pam modules, allowing them to continue running
unimpacted, while new ones would get the updated pam modules.
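
To make the idea concrete, here is a small Python sketch of the lookup being
discussed.  It is purely hypothetical: PAM_DIRECTORY is the variable name
floated in this thread, libpam is not known to honor it, and the store paths
are placeholders.

```python
import os

DEFAULT_PAM_DIRECTORY = "/etc/pam.d"


def pam_directory(environ=os.environ) -> str:
    """Hypothetical lookup: prefer $PAM_DIRECTORY, else the global default."""
    return environ.get("PAM_DIRECTORY", DEFAULT_PAM_DIRECTORY)


# Environment captured when each service was started (placeholder paths).
old_sshd_env = {"PAM_DIRECTORY": "/gnu/store/OLD-pam.d"}
new_sshd_env = {"PAM_DIRECTORY": "/gnu/store/NEW-pam.d"}

old_dir = pam_directory(old_sshd_env)   # service started before the upgrade
new_dir = pam_directory(new_sshd_env)   # service started after the upgrade
fallback = pam_directory({})            # variable unset: global directory
print(old_dir, new_dir, fallback)
```

Since the value is fixed in each process environment at start time, a later
reconfigure would not change what an already-running service sees.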

> We could maybe sidestep the issue altogether with socket-activated
> services: they’d be started on-demand, so the second scenario above
> would be unlikely.  But getting there is quite a bit of work…

I fail to see how this would be a solution for openssh, which would
typically already be running unless you've never logged in once since
the machine was up (or am I missing something?).  Also, it seems to me
inetd can already do "socket activation", if this was somehow useful.

Thanks,

Maxim

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-13 Thread Ludovic Courtès

Hi,

Mathieu Othacehe  skribis:

>> This sounds a lot like this:
>>
>>   https://issues.guix.gnu.org/32182#1
>
> I was just kicked out of my own server due to this PAM/SSH issue. It
> happens quite frequently here. Time for a fix :).

Note that ‘guix deploy’ now opens a single SSH session, starting from
7f20e59a13a6acc3331e04185b8f1ed2538dcd0a, which might help mitigate the
problem.

> Regarding the two potential solutions that you proposed in 2018, are
> they still relevant? If yes, I could maybe try to implement the second
> suggestion: introducing service chain-loading.

Service chain-loading was implemented in the Shepherd a few years ago.
However, it doesn’t really help; consider these two scenarios:

  • You do ‘guix system reconfigure && herd restart term-tty1’.  In that
    case, all is good: ‘term-tty1’ will run the new ‘mingetty’ process
    (post-glibc upgrade, thanks to service chain-loading) and ‘login’
    will happily load the .so files listed in /etc/pam.d/login (also
    post-glibc upgrade).

  • You run ‘guix system reconfigure’ but do not restart ‘term-tty1’,
    ‘sshd’, and all the other services that depend on PAM: these
    pre-glibc upgrade programs will try dlopening the post-glibc upgrade
    PAM plugins, which will break.

The crux of the problem rather is the global /etc/pam.d: it’s valid for
pre-glibc upgrade programs, or for post-glibc upgrade programs, but not
both.

FHS distros have a similar problem though; how do they handle it?  Do
they force services to be restarted when glibc is upgraded, or something
along these lines?

In our case, suppose libpam honors $PAM_DIRECTORY; we could tweak each
PAM-using Shepherd service (login, sshd, etc.) so that it sets
PAM_DIRECTORY… but how would we get the PAM_DIRECTORY value for the OS
being configured?  Tricky!

We could maybe sidestep the issue altogether with socket-activated
services: they’d be started on-demand, so the second scenario above
would be unlikely.  But getting there is quite a bit of work…
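
As a rough illustration of the on-demand model, here is a toy Python sketch
in the style of inetd: a supervisor owns the listening socket and spawns a
fresh handler process only when a client connects, with the connection as the
handler's stdin and stdout.  It assumes a POSIX system, the handler is a
stand-in one-liner, and none of this reflects how the Shepherd or inetd is
actually implemented.

```python
import socket
import subprocess
import sys

# Stand-in handler program: reads one line from stdin and replies on stdout.
HANDLER = "import sys; line = sys.stdin.readline(); sys.stdout.write('echo: ' + line)"


def serve_one(listener: socket.socket) -> int:
    """Accept one connection and spawn a fresh handler process for it."""
    conn, _addr = listener.accept()
    with conn:
        proc = subprocess.Popen([sys.executable, "-c", HANDLER],
                                stdin=conn.fileno(), stdout=conn.fileno())
        return proc.wait()


# The supervisor owns the listening socket; no handler process exists yet.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: let the kernel pick a free port
listener.listen(1)

client = socket.create_connection(listener.getsockname())
client.sendall(b"hello\n")
status = serve_one(listener)      # the handler is started only now
reply = client.recv(1024)
client.close()
listener.close()
print(status, reply)
```

Because each connection gets a newly started process, that process would run
whatever program is current at connection time rather than one started
before an upgrade.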

Ludo’.

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-13 Thread Mathieu Othacehe



> Regarding the two potential solutions that you proposed in 2018, are
> they still relevant? If yes, I could maybe try to implement the second
> suggestion: introducing service chain-loading.

Oh sorry, I stopped reading the thread at
https://issues.guix.gnu.org/32182#1. Looks like the service
chain-loading might not be enough, I'll keep digging.

Thanks,

Mathieu

bug#52533: guix deploy breaks SSH access with a PAM error

2022-01-13 Thread Mathieu Othacehe



Hey,

> This sounds a lot like this:
>
>   https://issues.guix.gnu.org/32182#1

I was just kicked out of my own server due to this PAM/SSH issue. It
happens quite frequently here. Time for a fix :).

Regarding the two potential solutions that you proposed in 2018, are
they still relevant? If yes, I could maybe try to implement the second
suggestion: introducing service chain-loading.

Thanks,

Mathieu

bug#52533: guix deploy breaks SSH access with a PAM error

2021-12-16 Thread Ludovic Courtès

Hi,

Maxim Cournoyer  skribis:

> Following the big merge of the core-updates-frozen branch into master,
> I've now noticed the following on two occasions: running 'guix deploy'
> leaves the remote machine unreachable by SSH.  The connection passes
> authentication but then gets closed immediately.  /var/log/messages
> reveals the following error:
>
> sshd[29578]:  error: PAM: pam_open_session(): Module is unknown
>
>
> The machines updated were running Guix System revisions predating the
> core-updates-frozen merge.

This sounds a lot like this:

  https://issues.guix.gnu.org/32182#1

WDYT?

Ludo’.

bug#52533: [PATCH] bug#52533: guix deploy breaks SSH access with a PAM error

2021-12-15 Thread Maxim Cournoyer

Hello,

I've found a workaround: disabling PAM for the remote machine's
ssh-daemon.  This is not done as part of 'guix deploy', so it needs to be
done manually; I did it this way:

1. Take note of the command line and sshd_config file:

--8<---cut here---start->8---
ps -eFww | grep sshd
--8<---cut here---end--->8---

2. Copy the sshd_config file from /gnu/store to somewhere writable and
edit it so that UsePAM is "no" instead of "yes".

3. Stop the Shepherd service with 'sudo herd stop ssh-daemon'

4. Start the ssh daemon manually (with sudo) by using the command found
in 1. but with the edited config from 2.

Then you should be able to 'guix deploy' successfully.
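
The edit in step 2 can be sketched in Python as follows.  This is only an
illustration on a made-up sample; the real file lives under /gnu/store and
its exact contents are not shown in this report.

```python
import re


def disable_pam(config_text: str) -> str:
    """Return sshd_config text with any 'UsePAM yes' line set to 'UsePAM no'."""
    # sshd_config keywords are case-insensitive; match the whole line.
    return re.sub(r"(?im)^[ \t]*UsePAM[ \t]+yes[ \t]*$", "UsePAM no", config_text)


# Made-up sample standing in for the copied sshd_config.
sample = "Port 22\nUsePAM yes\nPermitRootLogin no\n"
edited = disable_pam(sample)
print(edited)
```

The result would then be written to the writable copy that the manually
started sshd from step 4 is pointed at.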

According to 'man sshd_config', the default for UsePAM is no.
Considering this, and the issue reported here that it causes, perhaps we
should disable it by default in Guix?

What do others think?

Thank you,

Maxim

bug#52533: guix deploy breaks SSH access with a PAM error

2021-12-15 Thread Maxim Cournoyer

Hello Guix!

Following the big merge of the core-updates-frozen branch into master,
I've now noticed the following on two occasions: running 'guix deploy'
leaves the remote machine unreachable by SSH.  The connection passes
authentication but then gets closed immediately.  /var/log/messages
reveals the following error:

--8<---cut here---start->8---
sshd[29578]:  error: PAM: pam_open_session(): Module is unknown
--8<---cut here---end--->8---

The machines updated were running Guix System revisions predating the
core-updates-frozen merge.

The 'guix deploy' command doesn't succeed due to SSH starting to fail at
99% completion or similar; the bootloader configuration is not updated
so rebooting boots into the same old system generation (and SSH works
again):

--8<---cut here---start->8---
guix deploy: deploying to x200...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
guix deploy: sending 0 store items (0 MiB) to 'x200.local'...
substitute: updating substitutes from 'http://127.0.0.1:8181'... 100.0%
substitute: updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
The following derivations will be built:
   /gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv
   /gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv
   /gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv
   /gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv

building /gnu/store/ypyaf6ib1w5nc4kr0xgjm4par407cnzk-provenance.drv...
building /gnu/store/vgadszcfklbhr7d8yl8jprzipjy6b0vj-system.drv...
building /gnu/store/y1mgddpa2qkrmc01knpdam917b60yxlq-switch-to-system.scm.drv...
building /gnu/store/049wr939gjpgl3471wrk8b1waqgswrdi-remote-exp.scm.drv...
guix deploy: sending 5 store items (0 MiB) to 'x200.local'...
guix deploy: error: failed to deploy x200: failed to start 'guix repl' on 
'x200.local'

$ guix deploy ~/stow/guix/machines/x200.scm --no-offload
The following 1 machine will be deployed:
  x200

guix deploy: deploying to x200...
guix deploy: error: failed to deploy x200: remote command
'/run/setuid-programs/sudo -n -- guix repl -t machine' failed with
status 254

$ ssh x200
Last login: Wed Dec 15 23:28:02 2021 from 192.168.10.15
Connection to x200.local closed.
--8<---cut here---end--->8---

This is obviously embarrassing in scenarios where the SSH connection is
the main way to reach the remote machine.

Ideas?

Thank you,

Maxim
