Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-18 Thread Chris Hofstaedtler
* Thomas Goirand  [210218 17:03]:
> On 2/18/21 12:59 PM, Timo Weingärtner wrote:
> > 17.02.21 21:42 Chris Hofstaedtler:
> >> Services that use After=network-online.target are generally broken,
> >> please do not introduce that.
> > 
> > Seconded. Just consider a node where one link is down on boot and you would 
> > have to wait such a long time until you can examine the problem via ssh.
> 
> I still don't understand this part. If network isn't online, how can I
> ssh to my server anyways?

That's exactly the problem description: everyone and every service
has a different idea of "network is online".

For ssh, I'm also happy once a "fe80::x" IPv6 address is ready. I don't
need to wait for IPv4 DHCP or a default gateway or whatever.

Chris



Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-18 Thread Thomas Goirand
On 2/18/21 12:59 PM, Timo Weingärtner wrote:
> 17.02.21 21:42 Chris Hofstaedtler:
>> Services that use After=network-online.target are generally broken,
>> please do not introduce that.
> 
> Seconded. Just consider a node where one link is down on boot and you would 
> have to wait such a long time until you can examine the problem via ssh.

I still don't understand this part. If network isn't online, how can I
ssh to my server anyways?

Cheers,

Thomas Goirand (zigo)



Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-18 Thread Timo Weingärtner
Hello,

17.02.21 21:42 Chris Hofstaedtler:
> * Thomas Goirand  [210217 20:38]:
> > # cat /etc/systemd/system/ssh.service.d/override.conf
> > [Unit]
> > After=network-online.target auditd.service
> > 
> > But IMO, this is very wrong to mandate doing this, and not having ssh
> > connectivity after a reboot, is kind of a grave problem.
> > 
> > So, could you hard-wire this in the openssh-server package directly, so
> > Debian users can avoid such an override? Indeed After=network.target
> > doesn't tell you that network is ready. After=network-online.target does,
> > and that's IMO what the ssh daemon should be using.
> 
> But if you do this, you'll end up delaying start of sshd for up to
> 120 seconds in error cases. And even then, you might not get what you
> want (if you read systemd-networkd-wait-online.service(8)
> carefully).
> 
> Services that use After=network-online.target are generally broken,
> please do not introduce that.

Seconded. Just consider a node where one link is down on boot and you would 
have to wait such a long time until you can examine the problem via ssh.

> As discussed already, IP_FREEBIND is a thing. The system-wide sysctl
> is a common workaround, especially for "bgp-on-the-host" setups, for
> all sorts of servers/daemons.

That should work; systemd-sysctl.service is ordered before ssh.

Another option is in #965132 [1] (ssh@.socket), but then the fix for
#946180 [2] and #934663 [3] (RuntimeDirectoryPreserve=yes for
ssh*.service) is also needed.
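
On the socket-activation route, systemd can also set IP_FREEBIND on a single socket via FreeBind= in the [Socket] section, which avoids the system-wide sysctl. A minimal sketch, assuming sshd is started through the packaged ssh.socket rather than ssh.service, and reusing the 10.56.17.7 address from the report; the function name and the drop-in file name below are made up, and this is not necessarily what #965132 proposes:

```shell
#!/bin/sh
# Sketch only: prints a drop-in for ssh.socket that binds the example
# address from the report with IP_FREEBIND set on that one socket.
# Assumption: sshd is socket-activated through ssh.socket.
print_ssh_socket_freebind() {
    cat <<'EOF'
[Socket]
# An empty assignment clears the ListenStream= entries set by the unit.
ListenStream=
ListenStream=10.56.17.7:22
# Allow binding before the address exists on any interface.
FreeBind=yes
EOF
}

print_ssh_socket_freebind
```

As root, the output could be written to a drop-in such as /etc/systemd/system/ssh.socket.d/freebind.conf (after creating that directory), followed by "systemctl daemon-reload". ssh.service would have to be disabled in favour of ssh.socket for this to have any effect.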


Regards,
Timo

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=965132
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=946180
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=934663



Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-17 Thread Thomas Goirand
Hi Colin,

Thanks a lot for taking the time to provide very valuable information.
It's helping me a lot.

On 2/17/21 12:11 PM, Colin Watson wrote:
> On Wed, Feb 17, 2021 at 11:46:57AM +0100, Thomas Goirand wrote:
>> On 2/17/21 10:14 AM, Colin Watson wrote:
>>> Are you using ListenAddress in sshd_config?
>>
>> Yes, with the same IP as above, in order to make sure ssh isn't
>> listening on a public IP (which would be a security concern for us).
> 
> Oh, that's vital information for this bug

I'm surprised ...

> using ListenAddress changes
> the constraints on sshd startup, somewhat as described in README.Debian.

I've read it, and the only part of README.Debian that talks about
something related is about the removal of the if-up hook. I don't see
any startup constraint changes described there. What did I miss?

The launchpad bug entry about the ifupdown hook removal specifically
discusses the fact that setups like mine would be affected. Indeed, I am.
So I wonder what's advised...

> In that case I think this is at least arguably a case of needing to keep
> your configuration in sync, isn't it?  You've made a change to
> sshd_config, so you need to change other parts of the system to support
> that change.

I'm not convinced that using a custom ListenAddress implies repairing
the boot process, no.

By default, sshd_config has this:

#Port 22
#AddressFamily any
#ListenAddress 0.0.0.0
#ListenAddress ::

Having these commented out is an invitation to un-comment and use them
with custom values. Basically, what you wrote above is that doing this
breaks sshd. Hopefully, you'll agree that this isn't what one would
expect! :)

If that's really the case and one should do either the systemd ordering
hack I'm doing, or the net.ipv4.ip_nonlocal_bind sysctl tweak, then IMO
it'd be worth either:

1/ removing ListenAddress from the example sshd_config
2/ adding comments just above the directive, explaining what we're
discussing in this bug entry.

>> Maybe setting-up net.ipv4.ip_nonlocal_bind=1 (in sysctl.conf) would fix
>> my issue, no?
> 
> That's the system-wide version of IP_FREEBIND.  OpenSSH upstream seems
> to have decided not to support IP_FREEBIND
> (https://bugzilla.mindrot.org/show_bug.cgi?id=2512)

If I understand this bug entry correctly, upstream seems to suggest
doing what I did:
1/ ListenAddress
2/ Add an override so that sshd starts After=network-online.target

However, it's looking like you're saying one shouldn't do 2/ at all?
Could you explain why? I've been using After=network-online.target in
most daemons I maintain, and now I'm wondering if that's wrong...
Then if I shouldn't do After=network-online.target, do you believe that
the sysctl hack is better?

> but the sysctl should work if you're OK with it being system-wide.

I'd very much prefer if this could be a per-socket thing, but I already
do this because that's how I set up haproxy that binds on a VIP (which
can move from one host to another). Though now, I wonder if there's an
option in HAProxy so it could use IP_FREEBIND on its own. Which would
then lead me to say I would prefer to not use a system-wide thing...

> I'd also recommend at least considering other approaches to implementing
> your security policy that avoid the ordering complexities of
> ListenAddress, since there are other ways to prevent incoming
> connections on public IP addresses.  Approaches I can think of include:
> 
>  * Reject connections to port 22 at the firewall level (perhaps a
>    firewall on the local host).

Considering the interaction with OpenStack Neutron, this could
potentially be hard to maintain: Neutron is doing a lot of iptables
stuff on its own, and I prefer if I don't touch that at all, either on
compute nodes or network nodes. So the option to use ListenAddress
looked way nicer for my use case.

>  * It might be worth experimenting with Match LocalAddress in
>    sshd_config.  I haven't played with that much myself, and it's
>    poorly-documented, but I *think* that might allow you to reject any
>    connections that aren't directed to appropriate addresses.

From my experience, it's best to not expose the ssh port at all if
possible, as brute-forcing the port may lead to brokenness.

On 2/17/21 9:42 PM, Chris Hofstaedtler wrote:
> But if you do this, you'll end up delaying start of sshd for up to
> 120 seconds in error cases. And even then, you might not get what you
> want (if you read systemd-networkd-wait-online.service(8)
> carefully).

This talks about networkd. Unless things have changed, I don't think
Debian is using this by default (yet?).

> Services that use After=network-online.target are generally broken,
> please do not introduce that.

Can you explain why?

> As discussed already, IP_FREEBIND is a thing.

As per the bug entry Colin pointed out, it looks like it isn't a thing
in sshd at least...

> The system-wide sysctl
> is a common workaround, especially for "bgp-on-the-host" setups, for
> all sorts of servers/daemons.


Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-17 Thread Chris Hofstaedtler
* Thomas Goirand  [210217 20:38]:
> # cat /etc/systemd/system/ssh.service.d/override.conf 
> [Unit]
> After=network-online.target auditd.service
> 
> But IMO, this is very wrong to mandate doing this, and not having ssh
> connectivity after a reboot, is kind of a grave problem.
> 
> So, could you hard-wire this in the openssh-server package directly, so Debian
> users can avoid such an override? Indeed After=network.target doesn't tell you
> that network is ready. After=network-online.target does, and that's IMO what
> the ssh daemon should be using.

But if you do this, you'll end up delaying start of sshd for up to
120 seconds in error cases. And even then, you might not get what you
want (if you read systemd-networkd-wait-online.service(8)
carefully).

Services that use After=network-online.target are generally broken,
please do not introduce that.

As discussed already, IP_FREEBIND is a thing. The system-wide sysctl
is a common workaround, especially for "bgp-on-the-host" setups, for
all sorts of servers/daemons.

Chris



Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-17 Thread Colin Watson
On Wed, Feb 17, 2021 at 11:46:57AM +0100, Thomas Goirand wrote:
> On 2/17/21 10:14 AM, Colin Watson wrote:
> > On Wed, Feb 17, 2021 at 09:36:15AM +0100, Thomas Goirand wrote:
> >> This means that, until FRR is fully up and running, with the BGP session
> >> established, the server IP (10.x.x.x/32 bound to the loopback interface)
> >> isn't set yet on the server, and the ssh daemon cannot bind on the IP (as
> >> it's not there yet).
> > 
> > Are you using ListenAddress in sshd_config?
> 
> Yes, with the same IP as above, in order to make sure ssh isn't
> listening on a public IP (which would be a security concern for us).

Oh, that's vital information for this bug - using ListenAddress changes
the constraints on sshd startup, somewhat as described in README.Debian.
In that case I think this is at least arguably a case of needing to keep
your configuration in sync, isn't it?  You've made a change to
sshd_config, so you need to change other parts of the system to support
that change.

I'd be happy to try to clarify documentation once we work out what
works.

> > See also
> > https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/, which
> > among other things (in general the tone of that page is that a
> > well-written service should not use After=network-online.target) says:
> > 
> >   "If you write a server: listen on [::], [::1], 0.0.0.0 and 127.0.0.1
> >   only.  These pseudo-addresses are unconditionally available."
> > 
> > That's what sshd does in its default configuration.  If it doesn't work,
> > the systemd documentation suggests that something else is not fulfilling
> > its end of a contract somewhere.
> 
> Maybe setting-up net.ipv4.ip_nonlocal_bind=1 (in sysctl.conf) would fix
> my issue, no?

That's the system-wide version of IP_FREEBIND.  OpenSSH upstream seems
to have decided not to support IP_FREEBIND
(https://bugzilla.mindrot.org/show_bug.cgi?id=2512), but the sysctl
should work if you're OK with it being system-wide.
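
For the record, a minimal sketch of making that sysctl persistent. The function name and the file name 90-nonlocal-bind.conf are only examples, and this covers IPv4 only (IPv6 has its own net.ipv6.ip_nonlocal_bind knob):

```shell
#!/bin/sh
# Sketch only: prints a sysctl.d fragment that enables non-local bind
# system-wide, so a daemon can bind an address before it is configured.
print_nonlocal_bind_conf() {
    cat <<'EOF'
# Allow bind() to IPv4 addresses not (yet) configured on this host.
net.ipv4.ip_nonlocal_bind = 1
EOF
}

print_nonlocal_bind_conf
```

As root, the output would go into something like /etc/sysctl.d/90-nonlocal-bind.conf and be applied with "sysctl --system"; "sysctl net.ipv4.ip_nonlocal_bind" then shows the current value.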

I'd also recommend at least considering other approaches to implementing
your security policy that avoid the ordering complexities of
ListenAddress, since there are other ways to prevent incoming
connections on public IP addresses.  Approaches I can think of include:

 * Reject connections to port 22 at the firewall level (perhaps a
   firewall on the local host).

 * It might be worth experimenting with Match LocalAddress in
   sshd_config.  I haven't played with that much myself, and it's
   poorly-documented, but I *think* that might allow you to reject any
   connections that aren't directed to appropriate addresses.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-17 Thread Thomas Goirand
On 2/17/21 10:14 AM, Colin Watson wrote:
> Control: severity -1 important
> 
> On Wed, Feb 17, 2021 at 09:36:15AM +0100, Thomas Goirand wrote:
>> Package: openssh-server
>> Version: 1:8.4p1-4
>> Severity: grave
> 
> No.  It may yet need to be sorted out before release, but this is a rare
> situation and I'm not having it being release-critical right now,
> especially because it's not clear that it's openssh-server's problem.

Let's not discuss the severity: let's try to fix the issue instead.

>> On a Sid/Testing system, currently we have in 
>> /lib/systemd/system/ssh.service:
>>
>> After=network.target auditd.service
>>
>> While this isn't a problem in most installations, it didn't work under our
>> setup, because we use "bgp-to-the-host" networking. In this setup, we need
>> FRR (the BGP routing daemon which is a fork of Quagga, if you didn't know)
>> to provide network connectivity to the server. Our configuration is
>> something like this:
>>
>> # cat /etc/frr/frr.conf
>> [...]
>> !
>> int lo
>>  ip address 10.56.17.7/32
>> !
>> [...]
>>
>> This means that, until FRR is fully up and running, with the BGP session
>> established, the server IP (10.x.x.x/32 bound to the loopback interface)
>> isn't set yet on the server, and the ssh daemon cannot bind on the IP (as
>> it's not there yet).
> 
> Are you using ListenAddress in sshd_config?

Yes, with the same IP as above, in order to make sure ssh isn't
listening on a public IP (which would be a security concern for us).

> Normally sshd doesn't
> require network interfaces to be online - it just binds to 0.0.0.0 or
> [::] and that's enough for it to be bound to interfaces as they come up.
> 
> If lo has to be up for this to work (which is surprising to me, but
> maybe), then I think there's a decent argument for frr to be part of
> network.target on such systems.

The interface is up before FRR starts, though the IP on the loopback
interface is only added once FRR has established a working BGP session
with its peers (here, the 2 switches the machine is connected to).

> See also
> https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/, which
> among other things (in general the tone of that page is that a
> well-written service should not use After=network-online.target) says:
> 
>   "If you write a server: listen on [::], [::1], 0.0.0.0 and 127.0.0.1
>   only.  These pseudo-addresses are unconditionally available."
> 
> That's what sshd does in its default configuration.  If it doesn't work,
> the systemd documentation suggests that something else is not fulfilling
> its end of a contract somewhere.

Maybe setting-up net.ipv4.ip_nonlocal_bind=1 (in sysctl.conf) would fix
my issue, no?

Your thoughts?
Cheers,

Thomas Goirand (zigo)



Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-17 Thread Colin Watson
Control: severity -1 important

On Wed, Feb 17, 2021 at 09:36:15AM +0100, Thomas Goirand wrote:
> Package: openssh-server
> Version: 1:8.4p1-4
> Severity: grave

No.  It may yet need to be sorted out before release, but this is a rare
situation and I'm not having it being release-critical right now,
especially because it's not clear that it's openssh-server's problem.

> On a Sid/Testing system, currently we have in /lib/systemd/system/ssh.service:
> 
> After=network.target auditd.service
> 
> While this isn't a problem in most installations, it didn't work under our
> setup, because we use "bgp-to-the-host" networking. In this setup, we need
> FRR (the BGP routing daemon which is a fork of Quagga, if you didn't know)
> to provide network connectivity to the server. Our configuration is
> something like this:
> 
> # cat /etc/frr/frr.conf
> [...]
> !
> int lo
>  ip address 10.56.17.7/32
> !
> [...]
> 
> This means that, until FRR is fully up and running, with the BGP session
> established, the server IP (10.x.x.x/32 bound to the loopback interface) isn't
> set yet on the server, and the ssh daemon cannot bind on the IP (as it's not
> there yet).

Are you using ListenAddress in sshd_config?  Normally sshd doesn't
require network interfaces to be online - it just binds to 0.0.0.0 or
[::] and that's enough for it to be bound to interfaces as they come up.

If lo has to be up for this to work (which is surprising to me, but
maybe), then I think there's a decent argument for frr to be part of
network.target on such systems.

> Our fix was pretty simple:
> 
> # cat /etc/systemd/system/ssh.service.d/override.conf 
> [Unit]
> After=network-online.target auditd.service
> 
> But IMO, this is very wrong to mandate doing this, and not having ssh
> connectivity after a reboot, is kind of a grave problem.
> 
> So, could you hard-wire this in the openssh-server package directly, so Debian
> users can avoid such an override? Indeed After=network.target doesn't tell you
> that network is ready. After=network-online.target does, and that's IMO what
> the ssh daemon should be using.

I don't agree that network-online.target is appropriate in general.  From
its documentation:

   network-online.target
   Units that strictly require a configured network connection
   should pull in network-online.target (via a Wants= type
   dependency) and order themselves after it. This target unit
   is intended to pull in a service that delays further
   execution until the network is sufficiently set up. What
   precisely this requires is left to the implementation of
   the network managing service.

   Note the distinction between this unit and network.target.
   This unit is an active unit (i.e. pulled in by the consumer
   rather than the provider of this functionality) and pulls
   in a service which possibly adds substantial delays to
   further execution. In contrast, network.target is a passive
   unit (i.e. pulled in by the provider of the functionality,
   rather than the consumer) that usually does not delay
   execution much. Usually, network.target is part of the boot
   of most systems, while network-online.target is not, except
   when at least one unit requires it. Also see Running
   Services After the Network is up[1] for more information.

sshd does not in general require a configured network connection just to
start up, and making it do so would be a pretty significant change to
the unit dependency graph on most systems; it would make "is not [part
of the boot process]" above typically untrue, for one thing.
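
To spell out what the quoted documentation asks of a unit that really does need this: After= only orders, so a local override needs Wants= as well, otherwise the target may never be pulled in and the ordering has no effect. A sketch of such a site-local drop-in, using the same override path as in the report (the function name is made up, and this is not something I'm proposing for the package):

```shell
#!/bin/sh
# Sketch only: prints a local override for ssh.service that both pulls in
# network-online.target (Wants=) and orders sshd after it (After=).
print_ssh_online_override() {
    cat <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target auditd.service
EOF
}

print_ssh_online_override
```

As root, the output would replace /etc/systemd/system/ssh.service.d/override.conf, followed by "systemctl daemon-reload". What "online" then means still depends on whichever wait-online service the network manager provides.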

See also
https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/, which
among other things (in general the tone of that page is that a
well-written service should not use After=network-online.target) says:

  "If you write a server: listen on [::], [::1], 0.0.0.0 and 127.0.0.1
  only.  These pseudo-addresses are unconditionally available."

That's what sshd does in its default configuration.  If it doesn't work,
the systemd documentation suggests that something else is not fulfilling
its end of a contract somewhere.

-- 
Colin Watson (he/him)  [cjwat...@debian.org]



Bug#982950: ssh.service starts sshd before network is online: please switch to After=network-online.target instead of just After=network.target

2021-02-17 Thread Thomas Goirand
Package: openssh-server
Version: 1:8.4p1-4
Severity: grave

Hi there,

On a Sid/Testing system, currently we have in /lib/systemd/system/ssh.service:

After=network.target auditd.service

While this isn't a problem in most installations, it didn't work under our
setup, because we use "bgp-to-the-host" networking. In this setup, we need
FRR (the BGP routing daemon which is a fork of Quagga, if you didn't know)
to provide network connectivity to the server. Our configuration is
something like this:

# cat /etc/frr/frr.conf
[...]
!
int lo
 ip address 10.56.17.7/32
!
[...]

This means that, until FRR is fully up and running, with the BGP session
established, the server IP (10.x.x.x/32 bound to the loopback interface) isn't
set yet on the server, and the ssh daemon cannot bind on the IP (as it's not
there yet).

Our fix was pretty simple:

# cat /etc/systemd/system/ssh.service.d/override.conf 
[Unit]
After=network-online.target auditd.service

But IMO, it is very wrong to mandate doing this, and not having ssh
connectivity after a reboot is kind of a grave problem.

So, could you hard-wire this in the openssh-server package directly, so Debian
users can avoid such an override? Indeed After=network.target doesn't tell you
that network is ready. After=network-online.target does, and that's IMO what
the ssh daemon should be using.

Thanks for maintaining openssh in Debian,
Cheers,

Thomas Goirand (zigo)