Bug#964596: cloud.debian.org: Debian 10 EC2: IPv4 address suddenly flushed

Noah Meyerhans Thu, 09 Jul 2020 12:25:04 -0700

On Thu, Jul 09, 2020 at 12:09:44PM +0200, Martin Olsson wrote:
>    Severity: major


JFYI, "major" is not a valid severity.  The complete list of severities
is at https://www.debian.org/Bugs/Developer#severities.  In the case of
this bug, since the behavior is only triggered by customization of
kernel parameters, we should leave it at the BTS default of "normal".

>    Install a Debian 9 machine using the official Debian 9 AMI.
> 
>    During the hardening of the machine, disable IPv6 completely:
>    # cat /etc/sysctl.d/disable_ipv6.conf
>    net.ipv6.conf.all.disable_ipv6 = 1
>    net.ipv6.conf.default.disable_ipv6 = 1
>    net.ipv6.conf.eth0.disable_ipv6 = 1
>    net.ipv6.conf.lo.disable_ipv6 = 1
> 
>    This hardened Debian 9 server works perfectly for a year.

I think there is more to it.  When I launch a Debian 9 with those sysctl
values set, the network is not fully configured and the instance boots
to systemd's "degraded" status.  Journalctl shows:

Jul 09 18:34:11 ip-10-0-0-149 systemd[1]: Started ifup for eth0.
Jul 09 18:34:11 ip-10-0-0-149 systemd[1]: Starting Raise network interfaces...
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Internet Systems Consortium DHCP Client 4.3.5
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Internet Systems Consortium DHCP Client 4.3.5
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Copyright 2004-2016 Internet Systems Consortium.
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Copyright 2004-2016 Internet Systems Consortium.
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: All rights reserved.
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: All rights reserved.
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: For info, please visit https://www.isc.org/software/dhcp/
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: For info, please visit https://www.isc.org/software/dhcp/
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]:
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Listening on LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Listening on LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Sending on   LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Sending on   Socket/fallback
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: DHCPREQUEST of 10.0.0.149 on eth0 to 255.255.255.255 port 67
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Sending on   LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Sending on   Socket/fallback
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: DHCPACK of 10.0.0.149 from 10.0.0.1
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: DHCPREQUEST of 10.0.0.149 on eth0 to 255.255.255.255 port 67
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: DHCPACK of 10.0.0.149 from 10.0.0.1
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: bound to 10.0.0.149 -- renewal in 1560 seconds.
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: bound to 10.0.0.149 -- renewal in 1560 seconds.
Jul 09 18:34:13 ip-10-0-0-149 ifup[364]: ifup: waiting for lock on /run/network/ifstate.eth0
Jul 09 18:34:17 ip-10-0-0-149 sh[258]: Could not get a link-local address
Jul 09 18:34:17 ip-10-0-0-149 sh[258]: ifup: failed to bring up eth0
Jul 09 18:34:17 ip-10-0-0-149 systemd[1]: ifup@eth0.service: Main process exited, code=exited, status=1/FAILURE
...
Jul 09 18:34:23 ip-10-0-0-149 systemd[1]: Failed to start Raise network interfaces.
Jul 09 18:34:23 ip-10-0-0-149 systemd[1]: networking.service: Unit entered failed state.
Jul 09 18:34:23 ip-10-0-0-149 systemd[1]: networking.service: Failed with result 'exit-code'.

And:
admin@ip-10-0-0-149:~$ systemctl is-system-running 
degraded

So I think that regardless of what happens when the instance is upgraded
to Debian 10, the system is unhealthy even when running Debian 9 when
modified in the way you've described.

There are a couple of ways that you can disable IPv6 without breaking
things.  You could modify /usr/local/sbin/inet6-ifup-helper to exit with
a '0' status unconditionally, or you could avoid running it altogether.
To do that, remove all the lines containing 'inet6' from
/etc/network/interfaces.  This should ensure that the network is fully
configured and that the system recognizes it as such.  In my testing, the
upgrade to buster after performing these changes is successful and there
are no residual issues.
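
As a rough sketch of the second option (removing the 'inet6' lines),
something like the following should work.  The function name
strip_inet6 is made up for illustration, and the sketch assumes that
every line mentioning 'inet6' in the file is safe to drop, so review
the result before rebooting.  It keeps a copy of the original file
with a .bak suffix.

```shell
# Sketch only: strip_inet6 is a hypothetical helper, not part of the AMI.
# It deletes every line containing "inet6" from the given ifupdown
# configuration file and leaves the original alongside it as FILE.bak.
strip_inet6() {
    file="$1"
    # Preserve mode and timestamps on the backup copy.
    cp -p "$file" "$file.bak" || return 1
    # Rewrite the file from the backup, dropping the inet6 lines.
    sed '/inet6/d' "$file.bak" > "$file"
}
```

Run as root, the call would be: strip_inet6 /etc/network/interfaces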

I think it's reasonable to add a check in
/usr/local/sbin/inet6-ifup-helper for future revisions of the stretch
AMI to exit successfully if IPv6 is disabled on $IFACE.
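
A minimal sketch of what such a check might look like, assuming the
helper is a shell script and that reading the per-interface
disable_ipv6 sysctl under /proc is sufficient.  The function name
ipv6_disabled and the PROC_SYS_NET override (there only so the logic
can be exercised against a fake directory tree) are illustrative and
not taken from the actual helper.

```shell
# Sketch only: ipv6_disabled and PROC_SYS_NET are hypothetical names.
# Returns 0 if the disable_ipv6 sysctl for the given interface reads "1",
# and non-zero otherwise (including when the sysctl file is missing).
ipv6_disabled() {
    iface="$1"
    sysctl_file="${PROC_SYS_NET:-/proc/sys/net}/ipv6/conf/$iface/disable_ipv6"
    [ -r "$sysctl_file" ] && [ "$(cat "$sysctl_file")" = "1" ]
}

# Near the top of the helper, the check could then be used like this:
#   if ipv6_disabled "$IFACE"; then
#       exit 0
#   fi
```

Note that this treats a missing sysctl file as "not disabled", so the
helper's existing behavior would be unchanged in that case.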

>    A reset of the EC2 brings the access back, only to be lost again 1h later.
> 
>    (unfortunately, neither dhclient nor the cloud-init scripts syslogged any
>    error, so it was pretty hard to figure out what was wrong)

Try journalctl

>    It turns out to be the IPv6 hardening that generates problems for
>    dhclient/ifup.
> 
>    I believe the problem lies in /sbin/dhclient-script :
>            if [ -n "$old_ip_address" ] &&
>               [ "$old_ip_address" != "$new_ip_address" ]; then
>                # leased IP has changed => flush it
>                ip -4 addr flush dev ${interface} label ${interface}
>            fi
> 
>    My guess is that when dhclient fails to set an IPv6 IP, the above code
>    flushes the current IPv4 configured on the machine, making it lose all
>    network connectivity.

No, there are actually two separate dhclient processes involved; one
handles IPv4 and the other v6.  In the case you're describing, the IPv6
dhclient is never actually invoked.

>    This makes me think that the cloud-init package for Debian 10 does
>    something wrong.
> 
>    Somewhat related bug: #846583

Cloud-init isn't actually involved in setting up networking in EC2.  In
Debian 9, the configuration is static and built into the AMI.  In
Debian 10, the equivalent interface configuration is generated on demand
by udev.

noah
