On Thu, Jul 09, 2020 at 12:09:44PM +0200, Martin Olsson wrote:
> Severity: major

JFYI, "major" is not a valid severity. The complete list of severities
is at https://www.debian.org/Bugs/Developer#severities

In the case of this bug, since the behavior is only triggered by
customization of kernel parameters, we should leave it at the BTS
default of "normal".

> Install a Debian 9 machine using the official Debian 9 AMI.
>
> During the hardening of the machine, disable IPv6 completely:
> # cat /etc/sysctl.d/disable_ipv6.conf
> net.ipv6.conf.all.disable_ipv6 = 1
> net.ipv6.conf.default.disable_ipv6 = 1
> net.ipv6.conf.eth0.disable_ipv6 = 1
> net.ipv6.conf.lo.disable_ipv6 = 1
>
> This hardened Debian 9 server works perfectly for a year.

I think there is more to it. When I launch a Debian 9 instance with
those sysctl values set, the network is not fully configured and the
instance boots to systemd's "degraded" status. Journalctl shows:

Jul 09 18:34:11 ip-10-0-0-149 systemd[1]: Started ifup for eth0.
Jul 09 18:34:11 ip-10-0-0-149 systemd[1]: Starting Raise network interfaces...
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Internet Systems Consortium DHCP Client 4.3.5
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Internet Systems Consortium DHCP Client 4.3.5
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Copyright 2004-2016 Internet Systems Consortium.
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Copyright 2004-2016 Internet Systems Consortium.
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: All rights reserved.
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: All rights reserved.
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: For info, please visit https://www.isc.org/software/dhcp/
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: For info, please visit https://www.isc.org/software/dhcp/
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]:
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Listening on LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Listening on LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Sending on LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: Sending on Socket/fallback
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: DHCPREQUEST of 10.0.0.149 on eth0 to 255.255.255.255 port 67
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Sending on LPF/eth0/02:e7:21:78:ad:4a
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: Sending on Socket/fallback
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: DHCPACK of 10.0.0.149 from 10.0.0.1
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: DHCPREQUEST of 10.0.0.149 on eth0 to 255.255.255.255 port 67
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: DHCPACK of 10.0.0.149 from 10.0.0.1
Jul 09 18:34:11 ip-10-0-0-149 dhclient[269]: bound to 10.0.0.149 -- renewal in 1560 seconds.
Jul 09 18:34:11 ip-10-0-0-149 sh[258]: bound to 10.0.0.149 -- renewal in 1560 seconds.
Jul 09 18:34:13 ip-10-0-0-149 ifup[364]: ifup: waiting for lock on /run/network/ifstate.eth0
Jul 09 18:34:17 ip-10-0-0-149 sh[258]: Could not get a link-local address
Jul 09 18:34:17 ip-10-0-0-149 sh[258]: ifup: failed to bring up eth0
Jul 09 18:34:17 ip-10-0-0-149 systemd[1]: ifup@eth0.service: Main process exited, code=exited, status=1/FAILURE
...
Jul 09 18:34:23 ip-10-0-0-149 systemd[1]: Failed to start Raise network interfaces.
Jul 09 18:34:23 ip-10-0-0-149 systemd[1]: networking.service: Unit entered failed state.
Jul 09 18:34:23 ip-10-0-0-149 systemd[1]: networking.service: Failed with result 'exit-code'.

And:

admin@ip-10-0-0-149:~$ systemctl is-system-running
degraded

So I think that regardless of what happens when the instance is upgraded
to Debian 10, the system is already unhealthy under Debian 9 when
modified in the way you've described.

There are a couple of ways that you can disable IPv6 without breaking
things. You could modify /usr/local/sbin/inet6-ifup-helper to exit with
a '0' status unconditionally, or you could avoid running it altogether.
To do that, remove all the lines containing 'inet6' from
/etc/network/interfaces. This should ensure that the network is fully
configured and that the system recognizes it as such.

In my testing, the upgrade to buster after performing these changes is
successful and there are no residual issues.

I think it's reasonable to add a check in
/usr/local/sbin/inet6-ifup-helper for future revisions of the stretch
AMI to exit successfully if IPv6 is disabled on $IFACE.

> A reset of the EC2 brings the access back, only to be lost again 1h later.
>
> (unfortunately, neither dhclient nor the cloud-init scripts syslogged any
> error, so it was pretty hard to figure out what was wrong)

Try journalctl

> It turns out to be the IPv6 hardening that generates problems for
> dhclient/ifup.
>
> I believe the problem lies in /sbin/dhclient-script :
> if [ -n "$old_ip_address" ] &&
>    [ "$old_ip_address" != "$new_ip_address" ]; then
>     # leased IP has changed => flush it
>     ip -4 addr flush dev ${interface} label ${interface}
> fi
>
> My guess is that when dhclient fails to set an IPv6 IP, the above code
> flushes the current IPv4 configured on the machine, making it lose all
> network connectivity.

No, there are actually two separate dhclient processes involved; one
handles IPv4 and the other v6. In the case you're describing, the IPv6
dhclient is never actually invoked.

> This makes me think that the cloud-init package for Debian 10 does
> something wrong.
>
> Somewhat related bug: #846583

Cloud-init isn't actually involved in setting up networking in EC2. In
Debian 9, the configuration is static and built into the AMI. In Debian
10, the equivalent interface configuration is generated on demand by
udev.

noah
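
For illustration only, the check proposed above for
/usr/local/sbin/inet6-ifup-helper (exit successfully if IPv6 is disabled
on $IFACE) might look roughly like the sketch below. The contents of the
real helper are not shown in this message, so this is a standalone
function rather than a patch. The names ipv6_disabled and SYSCTL_ROOT are
hypothetical; SYSCTL_ROOT exists only so the function can be exercised
against a scratch directory instead of /proc/sys. Treating a missing
sysctl file as "disabled" is also an assumption.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual inet6-ifup-helper code.
# Returns 0 (true) if IPv6 is disabled on the given interface.
ipv6_disabled() {
    iface="$1"
    # SYSCTL_ROOT is an assumed override for testing; defaults to /proc/sys.
    root="${SYSCTL_ROOT:-/proc/sys}"
    f="$root/net/ipv6/conf/$iface/disable_ipv6"

    # Assumption: if the sysctl file is absent or unreadable, there is no
    # usable IPv6 on this interface, so report it as disabled.
    [ -r "$f" ] || return 0

    # The kernel exposes "1" when IPv6 is disabled on the interface.
    [ "$(cat "$f")" = "1" ]
}

# Intended use near the top of the helper (ifupdown exports IFACE):
#   if ipv6_disabled "$IFACE"; then
#       exit 0
#   fi
```

With a guard like this, ifup would no longer fail on "Could not get a
link-local address" when the interface has disable_ipv6 = 1, since the
helper would return success before attempting any IPv6 setup.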