Package: cloud.debian.org
Severity: serious
After spawning a VM, it takes a long time to get networking (output from the console):

cloud-init[281]: Cloud-init v. 20.2 running 'init-local' at Wed, 28 Jul 2021 07:49:23 +0000. Up 2.98 seconds.
Started Initial cloud-init job (pre-networking).
Reached target Network (Pre).
Starting Raise network interfaces...
A start job is running for Raise network interfaces (6s / 5min 1s)
A start job is running for Raise network interfaces (7s / 5min 1s)
A start job is running for Raise network interfaces (7s / 5min 1s)
[...]
A start job is running for Raise network interfaces (5min 1s / 5min 1s)
Failed to start Raise network interfaces.

A systemctl status networking.service shows:

Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
Active: failed (Result: timeout) since Wed 2021-07-28 07:54:23 UTC; 52min ago

This is specific to the Debian image. We've compared with Ubuntu 21.04.

Ubuntu:

- Initial boot:
2021-07-28T11:58:50.836457+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6
2021-07-28T11:58:50.836724+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREPLY(tap67fa8c3f-8d) <redacted>::3ba 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6 host-<redacted>--3ba

- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:63:54:8c,tag:dhcpv6,host-<redacted>--3ba.dc3-a.pub1.infomaniak.cloud.,[<redacted>::3ba]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627559930 3042863103 <redacted>::3ba /host-<redacted>--3ba 00:02:00:00:ab:11:11:16:f0:97:0e:c5:c9:b6

Then we do "openstack server rebuild" and get the same result.
Debian:

- Initial boot:
2021-07-28T11:59:15.838131+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T11:59:15.838369+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPADVERTISE(tap67fa8c3f-8d) <redacted>::143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da host-<redacted>--143
2021-07-28T11:59:16.795826+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREQUEST(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T11:59:16.796177+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPREPLY(tap67fa8c3f-8d) <redacted>::143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da host-<redacted>--143

- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:f1:a9:da,tag:dhcpv6,host-<redacted>--143.dc3-a.pub1.infomaniak.cloud.,[<redacted>::143]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627481056 1056025050 <redacted>::143 host-2001-1600-10-100--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da

Then I do the same "openstack server rebuild" and get:

- Initial boot:
2021-07-28T12:26:38.804683+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da
2021-07-28T12:26:38.805023+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPADVERTISE(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da no addresses available

- Server side:
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/host:fa:16:3e:f1:a9:da,tag:dhcpv6,host-<redacted>--143.dc3-a.pub1.infomaniak.cloud.,[<redacted>::143]
/var/lib/neutron/dhcp/dcf25c41-9057-4bc2-8475-a2e3c5d8c662/leases:1627481056 1056025050 2001:1600:10:100::143 host-<redacted>--143 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da

We see here that DHCPv6 fails because the DUID sent by the distro after the rebuild isn't the same as the one sent on the initial build of the VM:

2021-07-28T11:59:15.838131+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da
2021-07-28T12:26:38.804683+00:00 pub1-network-3 dnsmasq-dhcp[3765807]: DHCPSOLICIT(tap67fa8c3f-8d) 00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da

The server kept the initial DHCPv6 lease for the first DUID, so it refuses the request that comes with a new one. We see in the startup logs that the image creates a new DUID:

Jul 28 11:59:17 debianv6 sh[376]: Created duid "\000\001\000\001(\224\003\021\372\026>\361\251\332".
...
Jul 28 12:33:06 debianv6 sh[376]: Created duid "\000\001\000\001(\224\011{\372\026>\361\251\332".

Note that to convert it to the colon-separated form used in the dnsmasq logs, we can do:

$ printf "\000\001\000\001(\224\011{\372\026>\361\251\332" | hexdump -e '14/1 "%02x " "\n"' | sed 's/ /:/g'
00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da

So, to fix this problem, we need to fix the Debian image. Reading the dhclient doc, we can see an interesting option:

-D LL or LLT
    Override the default when selecting the type of DUID to use. By
    default, DHCPv6 dhclient creates an identifier based on the
    link-layer address (DUID-LL) if it is running in stateless mode
    (with -S, not requesting an address), or it creates an identifier
    based on the link-layer address plus a timestamp (DUID-LLT) if it
    is running in stateful mode (without -S, requesting an address).
    When DHCPv4 is configured to use a DUID using -i option the default
    is to use a DUID-LLT. -D overrides these default, with a value of
    either LL or LLT.

So, it looks like the Debian image is using the link-layer (MAC) address plus a timestamp (DUID-LLT), and the timestamp is what seems to be problematic here, since it changes on every rebuild. We need to make it use the link-layer address only (DUID-LL), so that a "server rebuild" results in a VM with IPv6 connectivity. Note that Ubuntu isn't using dhclient, which is probably why it's not affected.

cloud-init populates /etc/network/interfaces.d/50-cloud-init this way:

root@zigo-test-server:/etc# cat /etc/network/interfaces.d/50-cloud-init
# This file is generated from information provided by the datasource.  Changes
# to it will not persist across an instance reboot.
# To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
auto lo
iface lo inet loopback
    dns-nameservers 83.166.143.51 83.166.143.52 2001:1600:0:aaaa::53:5 2001:1600:0:aaaa::53:6

auto ens3
iface ens3 inet dhcp
    accept_ra 1
    mtu 1500

# control-alias ens3
iface ens3 inet6 dhcp
    post-up route add -A inet6 default gw 2001:1600:10:100::1 || true
    pre-down route del -A inet6 default gw 2001:1600:10:100::1 || true

In ifupdown, in https://bugs.debian.org/799257, someone suggested using the -D LLT option; it then went away in version 0.8.2 because of https://bugs.debian.org/806964

So here, we probably need to get ifupdown to use the -D LL option explicitly, but I'm not sure how to do this... Does ifupdown even have an option for forcing that? It doesn't seem to be the case. :/

Any help or comment would be welcome.

Cheers,

Thomas Goirand (zigo)
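P.S. As a cross-check of the DUID comparison above, here is a minimal Python sketch that splits the DUIDs logged by dnsmasq into their fields, following the DUID layouts described in RFC 8415 (type 1 is DUID-LLT, type 2 is DUID-EN, type 3 is DUID-LL). It is not part of dhclient, ifupdown or cloud-init; parse_duid is just a helper name made up for this illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

# DUID-LLT timestamps count seconds since 2000-01-01 00:00:00 UTC (RFC 8415).
DUID_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def parse_duid(text):
    """Decode a colon-separated DUID string (hypothetical helper)."""
    raw = bytes.fromhex(text.replace(":", ""))
    (duid_type,) = struct.unpack("!H", raw[:2])
    if duid_type == 1:
        # DUID-LLT: hardware type, timestamp, then the link-layer address.
        hw_type, seconds = struct.unpack("!HI", raw[2:8])
        return {
            "type": "LLT",
            "hw_type": hw_type,
            "time": DUID_EPOCH + timedelta(seconds=seconds),
            "lladdr": raw[8:].hex(":"),
        }
    if duid_type == 2:
        # DUID-EN: enterprise number, then a vendor-defined identifier.
        (enterprise,) = struct.unpack("!I", raw[2:6])
        return {"type": "EN", "enterprise": enterprise,
                "identifier": raw[6:].hex(":")}
    if duid_type == 3:
        # DUID-LL: hardware type, then the link-layer address (no timestamp).
        (hw_type,) = struct.unpack("!H", raw[2:4])
        return {"type": "LL", "hw_type": hw_type, "lladdr": raw[4:].hex(":")}
    return {"type": duid_type, "raw": raw.hex(":")}


if __name__ == "__main__":
    # The two DUIDs sent by the Debian image, before and after the rebuild.
    for duid in ("00:01:00:01:28:94:03:11:fa:16:3e:f1:a9:da",
                 "00:01:00:01:28:94:09:7b:fa:16:3e:f1:a9:da"):
        print(parse_duid(duid))
```

Both Debian DUIDs decode as DUID-LLT with the same link-layer address, fa:16:3e:f1:a9:da, and with timestamps of 11:59:13 and 12:26:35 UTC on 2021-07-28. So only the timestamp field changed across the rebuild, which is consistent with the analysis above: a DUID-LL has no timestamp field and would stay the same as long as the MAC address does.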