Thanks for the explanation, Dan!  I was off down a wrong path, I
appreciate the correction.

I've just downloaded the Azure image from cloud-images.u.c and it
includes this in `/etc/netplan/90-hotplug-azure.yaml`:

# This netplan yaml is delivered in Azure cloud images to support
# attaching and detaching nics after the instance first boot.
# Cloud-init otherwise handles initial boot network configuration in
# /etc/netplan/50-cloud-init.yaml
network:
    version: 2
    ethernets:
        ephemeral:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: '!eth0'
            optional: true
        hotpluggedeth0:
            dhcp4: true
            match:
                driver: hv_netvsc
                name: 'eth0'

This file is not present in a booted system, because cloud-init removes
it during boot:

2020-11-09 18:12:09,306 - handlers.py[DEBUG]: start: 
azure-ds/maybe_remove_ubuntu_network_config_scripts: 
maybe_remove_ubuntu_network_config_scripts
2020-11-09 18:12:09,307 - DataSourceAzure.py[INFO]: Removing Ubuntu extended 
network scripts because cloud-init updates Azure network configuration on the 
following event: System boot.
2020-11-09 18:12:09,307 - util.py[DEBUG]: Attempting to remove 
/etc/netplan/90-hotplug-azure.yaml
2020-11-09 18:12:09,307 - handlers.py[DEBUG]: finish: 
azure-ds/maybe_remove_ubuntu_network_config_scripts: SUCCESS: 
maybe_remove_ubuntu_network_config_scripts

It does this before the regular cloud-init network configuration is
written, or `netplan generate` is called:

2020-11-09 18:12:09,465 - util.py[DEBUG]: Writing to 
/etc/netplan/50-cloud-init.yaml - wb: [644] 603 bytes
2020-11-09 18:12:09,466 - subp.py[DEBUG]: Running command ['netplan', 
'generate'] with allowed return codes [0] (shell=False, capture=True)

cloud-init also runs a couple of udevadm commands right after `netplan
generate`:

2020-11-09 18:12:09,813 - subp.py[DEBUG]: Running command ['udevadm', 
'test-builtin', 'net_setup_link', '/sys/class/net/eth0'] with allowed return 
codes [0] (shell=False, capture=True)
2020-11-09 18:12:09,828 - subp.py[DEBUG]: Running command ['udevadm', 
'test-builtin', 'net_setup_link', '/sys/class/net/lo'] with allowed return 
codes [0] (shell=False, capture=True)

This all happens before systemd-networkd starts:

Nov 09 18:12:09.956027 focal-1604945439 systemd[1]: Starting Network
Service...

So: I'm not really sure what's going on here.  I've tried restoring `90
-hotplug-azure.yaml` and removing `50-cloud-init.yaml`; that doesn't
cause the issue to reproduce on a subsequent boot.

One thing worth noting, that could lead to unexpected state: cloud-init
performs a DHCP on this interface (in order to be able to fetch the
network configuration it is going to apply).  It does this in a sandbox
(i.e. it doesn't use system configuration for it), but potentially that
could mean that there's (kernel?) state for that interface which
{udev,network}d interpret in a way that leads to this issue?

** Changed in: cloud-init (Ubuntu)
       Status: Incomplete => New

** Changed in: systemd (Ubuntu)
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Touch seeded packages, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1902960

Title:
  Upgrade from 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to break DNS
  resolution in some cases

Status in cloud-init package in Ubuntu:
  New
Status in systemd package in Ubuntu:
  Incomplete

Bug description:
  The systemd upgrade 245.4-4ubuntu3.3 to 245.4-4ubuntu3.2 appears to
  have broken DNS resolution across much of our Azure fleet earlier
  today.  We ended up mitigating this by forcing reboots on the
  associated instances, no combination of networkctl reload,
  reconfigure, systemctl daemon-reexec, systemctl daemon-reload, netplan
  generate, netplan apply would get resolvectl to have a DNS server
  again.  The main symptom appears to have been systemd-networkd
  believing it wasn't managing the eth0 interfaces:

  ubuntu@machine-1:~$ sudo networkctl
  IDX LINK TYPE     OPERATIONAL SETUP
    1 lo   loopback carrier     unmanaged
    2 eth0 ether    routable    unmanaged
                                                                            2 
links listed.

  Which eventually made them lose their DNS resolvers:

  ubuntu@machine-1:~$ sudo resolvectl dns
  Global:
  Link 2 (eth0):

  After rebooting, we see this behaving properly:

  ubuntu@machine-1:~$ sudo networkctl list
  IDX LINK TYPE     OPERATIONAL SETUP
    1 lo   loopback carrier     unmanaged
    2 eth0 ether    routable    configured

  2 links listed.
  ubuntu@machine-1:~$ sudo resolvectl dns
  Global:
  Link 2 (eth0): 168.63.129.16

  This appears to be specifically linked to the upgrade, i.e. we were able to 
provoke the issue by upgrading the systemd package, so I suspect it's part of 
the packaging in the upgrade process.
  ---
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu27.10
  Architecture: amd64
  CasperMD5CheckResult: skip
  DistroRelease: Ubuntu 20.04
  Lspci-vt:
   -[0000:00]-+-00.0  Intel Corporation 440BX/ZX/DX - 82443BX/ZX/DX Host bridge 
(AGP disabled)
              +-07.0  Intel Corporation 82371AB/EB/MB PIIX4 ISA
              +-07.1  Intel Corporation 82371AB/EB/MB PIIX4 IDE
              +-07.3  Intel Corporation 82371AB/EB/MB PIIX4 ACPI
              \-08.0  Microsoft Corporation Hyper-V virtual VGA
  Lsusb: Error: command ['lsusb'] failed with exit code 1:
  Lsusb-t:

  Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
  MachineType: Microsoft Corporation Virtual Machine
  Package: systemd 245.4-4ubuntu3.3
  PackageArchitecture: amd64
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
   LANG=C.UTF-8
   SHELL=/bin/bash
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.4.0-1031-azure 
root=PARTUUID=2e08bba3-68b4-4a16-af3b-47b73bd138a9 ro console=tty1 
console=ttyS0 earlyprintk=ttyS0 panic=-1
  ProcVersionSignature: Ubuntu 5.4.0-1031.32-azure 5.4.65
  Tags:  focal uec-images
  Uname: Linux 5.4.0-1031-azure x86_64
  UpgradeStatus: No upgrade log present (probably fresh install)
  UserGroups: N/A
  _MarkForUpload: True
  dmi.bios.date: 12/07/2018
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 090008
  dmi.board.name: Virtual Machine
  dmi.board.vendor: Microsoft Corporation
  dmi.board.version: 7.0
  dmi.chassis.asset.tag: 7783-7084-3265-9085-8269-3286-77
  dmi.chassis.type: 3
  dmi.chassis.vendor: Microsoft Corporation
  dmi.chassis.version: 7.0
  dmi.modalias: 
dmi:bvnAmericanMegatrendsInc.:bvr090008:bd12/07/2018:svnMicrosoftCorporation:pnVirtualMachine:pvr7.0:rvnMicrosoftCorporation:rnVirtualMachine:rvr7.0:cvnMicrosoftCorporation:ct3:cvr7.0:
  dmi.product.name: Virtual Machine
  dmi.product.uuid: 4412ad79-83fa-f845-b7c2-6f30dd4f1950
  dmi.product.version: 7.0
  dmi.sys.vendor: Microsoft Corporation

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1902960/+subscriptions

-- 
Mailing list: https://launchpad.net/~touch-packages
Post to     : touch-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~touch-packages
More help   : https://help.launchpad.net/ListHelp

Reply via email to