On Mon, 22 Aug 2022 at 21:07:57 +0200, Paul Gevers wrote:
> paul@mulciber ~ $ sudo lxc-start test && sudo lxc-attach test -- sh -ec "if
> [ -d /run/systemd/system ]; then echo systemd ; exit 0 ; else echo unknown ;
> exit 0 ; fi" ; echo $? && sudo lxc-stop test
> [sudo] password for paul:
> unknown
> 0
> paul@mulciber ~ $ sudo lxc-start test && sudo lxc-attach test -- sh -ec "if
> [ -d /run/systemd/system ]; then echo systemd ; exit 0 ; else echo unknown ;
> exit 0 ; fi" ; echo $? && sudo lxc-stop test
> systemd
> 0

A theory for what might be going on here: as systemd starts up, it
creates /run/systemd/system *almost* immediately; but if lxc-attach
happens fast enough, then it can finish running the shell command before
systemd has had a chance to create /run/systemd/system. So wait_booted()
would think we're running sysvinit or some other non-systemd init,
and run lib/await-sysv-boot instead of systemctl.
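
To illustrate that race, here is a rough Python sketch of a detection step that gives systemd a short grace period before concluding that the init system is something else. The name `detect_init` and its `root`, `grace` and `interval` parameters are hypothetical; this is not what wait_booted() does today, only one way the window could be narrowed.

```python
import os
import time


def detect_init(root="/", grace=5.0, interval=0.1):
    """Return "systemd" if <root>/run/systemd/system shows up within `grace`
    seconds, otherwise "unknown". Hypothetical helper, not autopkgtest code."""
    marker = os.path.join(root, "run/systemd/system")
    deadline = time.monotonic() + grace
    while True:
        if os.path.isdir(marker):
            return "systemd"
        if time.monotonic() >= deadline:
            return "unknown"
        time.sleep(interval)
```

The cost of this approach is that a container that really runs a non-systemd init would always pay the full `grace` delay before being classified.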

However, that seems unlikely to be the root cause for the original bug you
reported. Even if we mis-detect systemd as sysvinit, lib/await-sysv-boot
is basically doing the same as the old implementation, polling until
runlevel(8) indicates a suitable runlevel, but with the polling loop
happening in the container instead of on the host. And I'd expect that
to mostly work, either on systemd or sysvinit? The old implementation
always just did the equivalent of this anyway, and systemd does have a
working runlevel(8) (its use is discouraged, but it's there).
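
For reference, the kind of polling loop described above could look roughly like the following Python sketch. It assumes runlevel(8) is on the PATH and prints either "unknown" or a "previous current" pair such as "N 5"; treating runlevels 2 to 5 as "suitable" is my assumption, and `parse_runlevel` and `wait_for_runlevel` are illustrative names, not the actual contents of lib/await-sysv-boot.

```python
import subprocess
import time


def parse_runlevel(output):
    """Return the current runlevel from runlevel(8) output such as "N 5",
    or None if it is not a numeric runlevel yet (for example "unknown")."""
    fields = output.split()
    if len(fields) == 2 and fields[1].isdigit():
        return int(fields[1])
    return None


def wait_for_runlevel(timeout=60.0, interval=1.0, wanted=range(2, 6)):
    """Poll runlevel(8) until it reports a runlevel in `wanted`.
    Returns True on success, False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        proc = subprocess.run(["runlevel"], capture_output=True, text=True)
        if parse_runlevel(proc.stdout) in wanted:
            return True
        time.sleep(interval)
    return False
```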

The one part of lib/await-sysv-boot that is not systemd-friendly is that
it unconditionally waits for /etc/init.d/rc to finish, and never waits
for network-online.target, even if the container is really running systemd.

On podman with systemd, we'd ideally be waiting for boot by using
`NOTIFY_SOCKET=... podman run --sdnotify=container ...` - but that requires
knowing in advance that the container is going to boot with systemd (or
in principle some other init system that implements the sd_notify()
protocol), and in general we don't know that in advance. Also, lxc doesn't
implement that protocol, as far as I'm aware.
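
As a sketch of what the host side of the sd_notify() protocol involves: the supervisor binds an AF_UNIX datagram socket, exports its path as NOTIFY_SOCKET, and waits for a datagram containing a READY=1 line. The helper names below are hypothetical and nothing here is taken from autopkgtest or podman; it only shows the receiving end under those assumptions.

```python
import socket
import time


def open_notify_socket(path):
    """Bind an AF_UNIX datagram socket at `path` to receive notifications."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.bind(path)
    return sock


def wait_for_ready(sock, timeout=60.0):
    """Return True once a datagram with a READY=1 line arrives,
    False if `timeout` seconds pass without one."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        sock.settimeout(remaining)
        try:
            data = sock.recv(4096)
        except socket.timeout:
            return False
        # Each datagram carries newline-separated VAR=VALUE assignments.
        if "READY=1" in data.decode("utf-8", "replace").splitlines():
            return True
```

The path passed to `open_notify_socket` is what would be exported as NOTIFY_SOCKET to the container manager before starting the container.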

From the original bug report:

> autopkgtest-virt-lxc [21:45:43]: ERROR: Waiting for boot to finish failed
> autopkgtest-virt-lxc [21:45:43]: ERROR:     Failed to connect to bus: No such file or directory

I think that error message is systemd failing to connect to the D-Bus
system bus when we do the systemctl command to wait for boot to finish.
This is a different race: we have successfully identified the container
as running systemd as its init system, but there's no system bus socket
yet, so systemctl fails.

In the follow-up, we're hitting a timeout instead:

>   File "/usr/share/autopkgtest/lib/VirtSubproc.py", line 262, in wait_booted
>     (rc, err, _) = execute_timeout(

This is a 60 second timeout waiting for lib/await-sysv-boot to finish,
and it looks as though lib/await-sysv-boot has not produced any output
on its stdout/stderr (I should probably make it more verbose, since its
output is only shown if it fails).
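
To make a silent timeout like this easier to diagnose, the caller can keep whatever the child printed before it was killed. A minimal Python sketch, assuming a POSIX host where subprocess attaches the partial output to the TimeoutExpired exception; `run_with_timeout` is a hypothetical stand-in and not the real execute_timeout from VirtSubproc.py:

```python
import subprocess


def run_with_timeout(argv, timeout):
    """Run argv and return (returncode, stdout, stderr, timed_out).
    On timeout the child is killed, returncode is None, and stdout/stderr
    hold whatever output had been collected up to that point."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        # exc.stdout / exc.stderr may be None if nothing was read yet.
        return None, exc.stdout or b"", exc.stderr or b"", True
    return proc.returncode, proc.stdout, proc.stderr, False
```

With something like this, a script that prints a progress line on each polling iteration would leave a trail showing how far it got before the timeout.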

    smcv
