Package: ifupdown
Version: 0.8.36
Severity: important
Tags: patch

Hi!
Systemd has a class of boot-time races which can result in a deadlock while ifupdown-pre.service is waiting for udevadm settle. In most of the cases where that occurs, ifupdown is an innocent victim of the interactions between other things with poorly specified or insufficient dependency and ordering relationships - but when those get trapped on either side of ifupdown (reasonably enough) waiting for the initial set of network devices to become available, people get locked out of their remote machines: udevadm settle times out, ifupdown-pre 'fails', and networking.service is then simply not started.

There seem to have been many instances, and many permutations, of people affected by this class of systemd race-to-deadlock bugs. They can be intermittent and very hard to get to the bottom of, and in almost all of the reported cases I've found so far, people just gave up trying to diagnose them and masked ifupdown-pre.service as a workaround. But in almost all of those cases that's the wrong kludge, as nothing had actually failed about waiting for the network devices to become available, and nothing would have subsequently prevented networking.service from starting successfully ...

So I'd like to suggest a much better workaround, which should be the default in ifupdown instead: simply change

diff --git a/debian/networking.service b/debian/networking.service
index 593172b..b645409 100644
--- a/debian/networking.service
+++ b/debian/networking.service
@@ -2,7 +2,7 @@
 Description=Raise network interfaces
 Documentation=man:interfaces(5)
 DefaultDependencies=no
-Requires=ifupdown-pre.service
+Wants=ifupdown-pre.service
 Wants=network.target
 After=local-fs.target network-pre.target apparmor.service systemd-sysctl.service systemd-modules-load.service ifupdown-pre.service
 Before=network.target shutdown.target network-online.target

With this, networking.service will still wait for ifupdown-pre to complete, either normally or via systemd's "bug fixing" timeout when other services deadlock around it. In either case networking.service will then independently either succeed (in the probable case where the network devices were not part of the race that deadlocked), or fail to bring up only the network devices that were affected by that problem. But it will be *much* less likely for people to get locked out of remote access to fix the real problem when the next dist-upgrade brings some change to the set of unit files on their system which introduces this race in a way their machine will lose (which is how I hit this on Buster to Bullseye upgrades).

As a side note to all that, the TimeoutSec=180 in ifupdown-pre is a bit misleading, as udevadm settle will itself time out after 120 seconds unless its --timeout option tells it to do otherwise.
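If the two are meant to agree, a drop-in along these lines would align them (just a sketch, untested here, and assuming the stock unit doesn't already pass --timeout to udevadm settle itself):

# /etc/systemd/system/ifupdown-pre.service.d/timeout.conf
# Sketch: make the unit's TimeoutSec match udevadm settle's own
# default limit of 120 seconds, so the two cannot silently disagree.
[Service]
TimeoutSec=120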
Cheers,
Ron


As a postscript for anyone who might be interested, here are the details of the particular race instance that first bit me and got me digging into this:

The BitBabbler package has udev rules and configuration for assigning hardware RNG devices directly to VM instances instead of to the host. It does this with a call to virsh, which in normal use (or prior to Bullseye) will 'immediately' either:

 - succeed,
 - fail because the desired VM is not active,
 - or fail because libvirtd has not yet started (or is not running) and its communication socket is not present.

In no case was that operation ever expected to block for any extended duration, nor does it have any reason to.

But in Bullseye, libvirt changed from managing its control socket itself to using a "socket activation" unit, which is created (AIUI on the naive advice of systemd advocates) very early in the boot process - long before it would be able to start the service, since the service's own dependencies are not yet satisfied, and those are not applied transitively to the .socket unit which would be requesting the (as yet unstartable) service.

So now we have a race: the kernel, or a udev cold-plug trigger for an already attached BitBabbler, triggers a call to virsh which races with the creation of the libvirtd.socket. If the socket unit has not yet created the socket, the call fails (as expected) and everything runs normally, with the device being attached to the VM later when it is eventually started. But if the socket unit has already created a zombie socket, virsh will send its request to it and then wait for a response which is never going to come, because starting libvirtd is trapped on the other side of network.target being reached. And then ifupdown-pre innocently stumbles into this crime scene, because calling udevadm settle at this point will in turn block until the call to virsh completes - and even though the network device events have probably been processed normally, probably before this whole chain of events even started, we now have a Mexican standoff that has brought the whole show to a halt until systemd pulls its timeout trigger, and everyone loses in the resulting carnage.

The problem is fixable, but it requires fixes and mitigations in many different places (at least while the systemd folk continue to insist that "starting sockets as early as possible magically resolves all dependencies" and don't make the dependencies of the service units that sockets ultimately want to start be automatically transitive). As long as there are zombie sockets for things to block on, these sorts of circular races will continue to exist. No amount of "deprecating" the use of udevadm settle, or other workarounds for deadlocking, will actually change that; they just sweep the problem under a different rug that someone will eventually lift again.

ifupdown can make itself more resilient to this by using Wants to wait for ifupdown-pre, rather than refusing to even try to start in this case as it does when Requires is used.

I've tried to narrow the window for this race by testing earlier (in the BitBabbler udev rule) for the presence of the libvirtd control socket, instead of waiting until virsh gets to the point where it does. That alone can't fix the problem, but it makes it harder and rarer to lose on the slower machines where this was first seen.
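For illustration, that guard amounts to something like this (a sketch, not the actual rule: the socket path is Debian's default for libvirtd, and $VM_NAME and $DEVICE_XML stand in for the real configuration):

#!/bin/sh
# Sketch of the early test: if the libvirtd control socket does not
# exist yet, virsh could only fail anyway, so bail out now instead of
# risking a connection to a zombie socket created a moment later.
# The device will be attached when the VM is eventually started.
if [ ! -S /run/libvirt/libvirt-sock ]; then
    exit 0
fi
virsh attach-device "$VM_NAME" "$DEVICE_XML"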
And the next bug I'll file is for libvirt to defer the creation of its .socket until the daemon's dependencies can be met, so that the time for which this could block becomes finite instead of an indefinite deadlock. That will actually fix this particular permutation of participants - aside from *very* slow machines still timing out, which will always be a problem as long as systemd relies on timeouts to resolve design and implementation bugs ...

But until we've found and worked through all the possible permutations of things that can create this situation, having ifupdown assume that a timeout failure of ifupdown-pre is unlikely to mean networking.service will also fail after that 2 minute delay will give people the best chance of still being able to access affected machines until it can be traced and debugged in their particular case.

I have tested the patch above, prior to taking further actions to prevent the race entirely, and after waiting for the timeout to fire, the network does come up normally on all the machines I've seen subjected to this problem after a Bullseye upgrade.
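And for completeness, the libvirt-side deferral I have in mind would look roughly like this drop-in (a sketch only; which of libvirtd.service's dependencies actually need mirroring onto the socket unit is for that bug report to sort out):

# /etc/systemd/system/libvirtd.socket.d/defer-socket.conf
# Sketch: order the socket unit after what the daemon itself needs,
# so the listening socket is not created until libvirtd could
# actually be started to answer it.
[Unit]
After=network.target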