Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
On Thu, Jan 12, 2023 at 03:31:25PM +, Lewis Gaul wrote:
> Could you suggest commands to run to do this?

# systemd-analyze set-log-level debug
# logger MARK-BEGIN
# ...whatever restart commands
# ...wait for the failure
# logger MARK-END
# systemd-analyze set-log-level info
# journalctl -b | sed -n '/MARK-BEGIN/,/MARK-END/p'

> Should we be suspicious of the host systemd version and/or the fact that
> the host is in 'legacy' mode while the container (based on the systemd
> version being higher) is in 'hybrid' mode? Maybe we should try telling the
> container systemd to run in 'legacy' mode somehow?

I'd be wary of the legacy@host and {hybrid,unified}@container combo. Also,
the old versions on the host could mean that the cgroup setup may be buggy.
(I only have capacity to look into the recent code, but the debug logs above
may show something obvious.)

Ideally, you should tell both host and container to run in the unified
mode ;-)

Michal
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
Another data point: I can reproduce this on an Ubuntu 18.04 host, which has
systemd v237 in *hybrid* cgroup mode (assuming I've understood the definition
of hybrid, as per my previous email). So it's looking like it might be an
interoperation issue between the host and container systemd, introduced
somewhere between v239 and v245 of the host systemd when the container is
running v245 (also seen with v244 and v249).

Thanks,
Lewis

On Thu, 12 Jan 2023 at 15:31, Lewis Gaul wrote:
> Hey Michal,
>
> Thanks for the reply.
>
> > I'd suggest looking at debug level logs from the host's systemd around
> > the time of the container restart.
>
> Could you suggest commands to run to do this?
>
> > What is the host's systemd version and cgroup mode
> > (legacy, hybrid, unified)? (I'm not sure what the distros in your
> > original message referred to.)
>
> The issue has been seen on CentOS 8.2 and 8.4 host distros, but not on
> Ubuntu 20.04. The former has systemd v239 and appears to be in 'legacy'
> cgroup mode (no /sys/fs/cgroup/unified cgroup2 mount), whereas the latter
> has systemd v245 and is in what I believe you'd refer to as 'hybrid' mode
> (with the /sys/fs/cgroup/unified cgroup2 mount).
>
> Should we be suspicious of the host systemd version and/or the fact that
> the host is in 'legacy' mode while the container (based on the systemd
> version being higher) is in 'hybrid' mode? Maybe we should try telling the
> container systemd to run in 'legacy' mode somehow?
>
> Thanks,
> Lewis
>
> On Thu, 12 Jan 2023 at 13:12, Michal Koutný wrote:
>> Hello.
>>
>> On Tue, Jan 10, 2023 at 03:28:04PM +, Lewis Gaul <lewis.g...@gmail.com> wrote:
>> > I can confirm that the container has permissions since executing a 'mkdir'
>> > in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
>> > container succeeds after the restart, so I have no idea why systemd is not
>> > creating the 'init.scope/' dir.
>>
>> It looks like it could also be a race/deferred impact from the host's systemd.
>>
>> > I notice that inside the container's systemd cgroup mount
>> > 'system.slice/' does exist, but 'user.slice/' also does not (both
>> > exist on normal boot). Is there any way I can find systemd logs that
>> > might indicate why the cgroup dir creation is failing?
>>
>> I'd suggest looking at debug level logs from the host's systemd around
>> the time of the container restart.
>>
>> > I could raise this with the podman team, but it seems more in the systemd
>> > area given it's a systemd warning and I would expect systemd to be creating
>> > this cgroup dir?
>>
>> What is the host's systemd version and cgroup mode
>> (legacy, hybrid, unified)? (I'm not sure what the distros in your
>> original message referred to.)
>>
>> Thanks,
>> Michal
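As an aside, the 'legacy'/'hybrid'/'unified' classification discussed above can be checked mechanically from the mount table. The helper below is a hypothetical sketch (the function name and structure are mine, not anything shipped by systemd): it classifies a file in /proc/mounts format using the same criteria as in the thread, i.e. cgroup2 mounted at /sys/fs/cgroup means unified, cgroup2 at /sys/fs/cgroup/unified means hybrid, and only v1 'cgroup' mounts means legacy.

```shell
# Hypothetical helper: classify the cgroup setup from a file in
# /proc/mounts format (fields: source mountpoint fstype opts dump pass).
detect_cgroup_mode() {
    mounts="$1"
    if awk '$2 == "/sys/fs/cgroup" && $3 == "cgroup2" { f=1 } END { exit !f }' "$mounts"; then
        echo unified    # cgroup2 is the only hierarchy
    elif awk '$2 == "/sys/fs/cgroup/unified" && $3 == "cgroup2" { f=1 } END { exit !f }' "$mounts"; then
        echo hybrid     # cgroup2 mounted alongside the v1 controllers
    elif awk '$3 == "cgroup" { f=1 } END { exit !f }' "$mounts"; then
        echo legacy     # v1 hierarchies only
    else
        echo none
    fi
}

# Usage: detect_cgroup_mode /proc/mounts
```

Run once on the host and once inside the container to compare the two setups.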
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
Hey Michal,

Thanks for the reply.

> I'd suggest looking at debug level logs from the host's systemd around
> the time of the container restart.

Could you suggest commands to run to do this?

> What is the host's systemd version and cgroup mode
> (legacy, hybrid, unified)? (I'm not sure what the distros in your original
> message referred to.)

The issue has been seen on CentOS 8.2 and 8.4 host distros, but not on
Ubuntu 20.04. The former has systemd v239 and appears to be in 'legacy'
cgroup mode (no /sys/fs/cgroup/unified cgroup2 mount), whereas the latter
has systemd v245 and is in what I believe you'd refer to as 'hybrid' mode
(with the /sys/fs/cgroup/unified cgroup2 mount).

Should we be suspicious of the host systemd version and/or the fact that
the host is in 'legacy' mode while the container (based on the systemd
version being higher) is in 'hybrid' mode? Maybe we should try telling the
container systemd to run in 'legacy' mode somehow?

Thanks,
Lewis

On Thu, 12 Jan 2023 at 13:12, Michal Koutný wrote:
> Hello.
>
> On Tue, Jan 10, 2023 at 03:28:04PM +, Lewis Gaul wrote:
> > I can confirm that the container has permissions since executing a 'mkdir'
> > in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
> > container succeeds after the restart, so I have no idea why systemd is not
> > creating the 'init.scope/' dir.
>
> It looks like it could also be a race/deferred impact from the host's systemd.
>
> > I notice that inside the container's systemd cgroup mount
> > 'system.slice/' does exist, but 'user.slice/' also does not (both
> > exist on normal boot). Is there any way I can find systemd logs that
> > might indicate why the cgroup dir creation is failing?
>
> I'd suggest looking at debug level logs from the host's systemd around
> the time of the container restart.
>
> > I could raise this with the podman team, but it seems more in the systemd
> > area given it's a systemd warning and I would expect systemd to be creating
> > this cgroup dir?
>
> What is the host's systemd version and cgroup mode
> (legacy, hybrid, unified)? (I'm not sure what the distros in your original
> message referred to.)
>
> Thanks,
> Michal
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
Hello.

On Tue, Jan 10, 2023 at 03:28:04PM +, Lewis Gaul wrote:
> I can confirm that the container has permissions since executing a 'mkdir'
> in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
> container succeeds after the restart, so I have no idea why systemd is not
> creating the 'init.scope/' dir.

It looks like it could also be a race/deferred impact from the host's systemd.

> I notice that inside the container's systemd cgroup mount
> 'system.slice/' does exist, but 'user.slice/' also does not (both
> exist on normal boot). Is there any way I can find systemd logs that
> might indicate why the cgroup dir creation is failing?

I'd suggest looking at debug level logs from the host's systemd around
the time of the container restart.

> I could raise this with the podman team, but it seems more in the systemd
> area given it's a systemd warning and I would expect systemd to be creating
> this cgroup dir?

What is the host's systemd version and cgroup mode
(legacy, hybrid, unified)? (I'm not sure what the distros in your original
message referred to.)

Thanks,
Michal
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
I omitted one piece of information about running with --cgroupns=private,
thinking it was unrelated, but actually it appears it may be related (and
perhaps highlights a variant of the issue that is seen on first boot, not
only on container restart). Again (and what makes me think it's related),
I can reproduce this on a CentOS host but not on Ubuntu (still with SELinux
in 'permissive' mode).

[root@localhost ~]# podman run -it --name ubuntu --privileged --cgroupns private ubuntu-systemd
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTI)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to .
Couldn't move remaining userspace processes, ignoring: Input/output error
Failed to create compat systemd cgroup /system.slice: No such file or directory
Failed to create compat systemd cgroup /system.slice/system-getty.slice: No such file or directory
[  OK  ] Created slice system-getty.slice.
Failed to create compat systemd cgroup /system.slice/system-modprobe.slice: No such file or directory
[  OK  ] Created slice system-modprobe.slice.
Failed to create compat systemd cgroup /user.slice: No such file or directory
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.

This first warning is coming from one of the same areas of code I linked in
my first email:
https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2967.
I see the same thing with '--cap-add sys_admin' instead of '--privileged',
and again with both docker and podman.

Thanks,
Lewis

On Tue, 10 Jan 2023 at 15:28, Lewis Gaul wrote:
> I'm aware of the higher level of collaboration between podman and systemd
> compared to docker, hence primarily raising this issue from a podman angle.
>
> In privileged mode all mounts are read-write, so yes the container has
> write access to the cgroup filesystem. (Podman also ensures write access
> to the systemd cgroup subsystem mount in non-privileged mode by default.)
>
> On first boot PID 1 can be found in
> /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/init.scope/cgroup.procs,
> whereas when the container restarts the 'init.scope/' directory does not
> exist and PID 1 is instead found in the parent (container root) cgroup
> /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/cgroup.procs
> (also reflected by /proc/1/cgroup). This is strange because systemd must
> be the one to create this cgroup dir on the initial boot, so I'm not sure
> why it wouldn't on subsequent boots.
>
> I can confirm that the container has permissions since executing a 'mkdir'
> in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
> container succeeds after the restart, so I have no idea why systemd is not
> creating the 'init.scope/' dir. I notice that inside the container's
> systemd cgroup mount 'system.slice/' does exist, but 'user.slice/' also
> does not (both exist on normal boot). Is there any way I can find systemd
> logs that might indicate why the cgroup dir creation is failing?
>
> One final datapoint: the same is seen when using a private cgroup
> namespace (via 'podman run --cgroupns=private'), although then the error
> is, as expected, "Failed to attach 1 to compat systemd cgroup
> /init.scope: No such file or directory".
>
> I could raise this with the podman team, but it seems more in the systemd
> area given it's a systemd warning and I would expect systemd to be
> creating this cgroup dir?
>
> Thanks,
> Lewis
>
> On Tue, 10 Jan 2023 at 14:48, Lennart Poettering wrote:
>> On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:
>>
>> > Following 'setenforce 0' I still see the same issue (I was also
>> > suspecting SELinux!).
>> >
>> > A few additional data points:
>> > - this was not seen when using systemd v230 inside the container
>> > - this is also seen on CentOS 8.4
>> > - this is seen under docker even if the container's cgroup driver is
>> >   changed from 'cgroupfs' to 'systemd'
>>
>> docker is garbage. They are hostile towards running systemd inside
>> containers.
>>
>> podman upstream is a lot friendlier, and apparently what everyone in OCI
>> is going towards these days.
>>
>> I have not much experience with podman though, and in particular not
>> old versions. Next step would probably be to look at what precisely
>> causes the permission issue, via strace.
>>
>> but did you make sure your container actually gets write access to the
>> cgroup trees?
>>
>> anyway, i'd recommend asking the podman community for help about this.
>>
>> Lennart
>>
>> --
>> Lennart Poettering, Berlin
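For anyone trying to reproduce the "Failed to create compat systemd cgroup" state above, a quick way to record which of the expected compat directories actually exist is a loop like the following. This is a hypothetical sketch (the helper name and the directory list are mine, taken from the slices mentioned in this thread):

```shell
# Hypothetical helper: report which of the expected compat cgroup
# directories exist under a given systemd v1 hierarchy root
# (e.g. /sys/fs/cgroup/systemd inside the container, or the
# machine.slice/libpod-<id>.scope dir when using the host's cgroupns).
check_compat_dirs() {
    root="$1"
    for d in init.scope system.slice user.slice; do
        if [ -d "$root/$d" ]; then
            echo "$d: present"
        else
            echo "$d: missing"
        fi
    done
}

# Usage: check_compat_dirs /sys/fs/cgroup/systemd
```

Running this right after a normal boot and again after a restart should show exactly which directories systemd failed to (re)create.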
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
I'm aware of the higher level of collaboration between podman and systemd
compared to docker, hence primarily raising this issue from a podman angle.

In privileged mode all mounts are read-write, so yes the container has write
access to the cgroup filesystem. (Podman also ensures write access to the
systemd cgroup subsystem mount in non-privileged mode by default.)

On first boot PID 1 can be found in
/sys/fs/cgroup/systemd/machine.slice/libpod-.scope/init.scope/cgroup.procs,
whereas when the container restarts the 'init.scope/' directory does not
exist and PID 1 is instead found in the parent (container root) cgroup
/sys/fs/cgroup/systemd/machine.slice/libpod-.scope/cgroup.procs
(also reflected by /proc/1/cgroup). This is strange because systemd must be
the one to create this cgroup dir on the initial boot, so I'm not sure why
it wouldn't on subsequent boots.

I can confirm that the container has permissions since executing a 'mkdir'
in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
container succeeds after the restart, so I have no idea why systemd is not
creating the 'init.scope/' dir. I notice that inside the container's
systemd cgroup mount 'system.slice/' does exist, but 'user.slice/' also
does not (both exist on normal boot). Is there any way I can find systemd
logs that might indicate why the cgroup dir creation is failing?

One final datapoint: the same is seen when using a private cgroup namespace
(via 'podman run --cgroupns=private'), although then the error is, as
expected, "Failed to attach 1 to compat systemd cgroup /init.scope: No such
file or directory".

I could raise this with the podman team, but it seems more in the systemd
area given it's a systemd warning and I would expect systemd to be creating
this cgroup dir?

Thanks,
Lewis

On Tue, 10 Jan 2023 at 14:48, Lennart Poettering wrote:
> On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:
>
> > Following 'setenforce 0' I still see the same issue (I was also
> > suspecting SELinux!).
> >
> > A few additional data points:
> > - this was not seen when using systemd v230 inside the container
> > - this is also seen on CentOS 8.4
> > - this is seen under docker even if the container's cgroup driver is
> >   changed from 'cgroupfs' to 'systemd'
>
> docker is garbage. They are hostile towards running systemd inside
> containers.
>
> podman upstream is a lot friendlier, and apparently what everyone in OCI
> is going towards these days.
>
> I have not much experience with podman though, and in particular not
> old versions. Next step would probably be to look at what precisely
> causes the permission issue, via strace.
>
> but did you make sure your container actually gets write access to the
> cgroup trees?
>
> anyway, i'd recommend asking the podman community for help about this.
>
> Lennart
>
> --
> Lennart Poettering, Berlin
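The /proc/1/cgroup observation above can be checked with a one-liner. This is a hypothetical helper of my own naming: given a file in /proc/&lt;pid&gt;/cgroup format (colon-separated lines like "1:name=systemd:/machine.slice/libpod-abc.scope"), it prints the process's path in the named systemd v1 hierarchy, which should end in "/init.scope" on a healthy boot:

```shell
# Hypothetical helper: extract a process's path in the named systemd
# v1 hierarchy from a file in /proc/<pid>/cgroup format.
systemd_v1_cgroup() {
    awk -F: '$2 == "name=systemd" { print $3 }' "$1"
}

# Usage (inside the container): systemd_v1_cgroup /proc/1/cgroup
```

Comparing the output on first boot vs. after `podman restart` would pin down exactly which cgroup PID 1 ended up in.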
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:

> Following 'setenforce 0' I still see the same issue (I was also suspecting
> SELinux!).
>
> A few additional data points:
> - this was not seen when using systemd v230 inside the container
> - this is also seen on CentOS 8.4
> - this is seen under docker even if the container's cgroup driver is
>   changed from 'cgroupfs' to 'systemd'

docker is garbage. They are hostile towards running systemd inside
containers.

podman upstream is a lot friendlier, and apparently what everyone in OCI
is going towards these days.

I have not much experience with podman though, and in particular not
old versions. Next step would probably be to look at what precisely
causes the permission issue, via strace.

but did you make sure your container actually gets write access to the
cgroup trees?

anyway, i'd recommend asking the podman community for help about this.

Lennart

--
Lennart Poettering, Berlin
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
Following 'setenforce 0' I still see the same issue (I was also suspecting
SELinux!).

A few additional data points:
- this was not seen when using systemd v230 inside the container
- this is also seen on CentOS 8.4
- this is seen under docker even if the container's cgroup driver is
  changed from 'cgroupfs' to 'systemd'

Thanks,
Lewis

On Tue, 10 Jan 2023 at 11:12, Lennart Poettering wrote:
> On Mo, 09.01.23 19:45, Lewis Gaul (lewis.g...@gmail.com) wrote:
>
> > Hi all,
> >
> > I've come across an issue when restarting a systemd container, which I'm
> > seeing on a CentOS 8.2 VM but not able to reproduce on an Ubuntu 20.04 VM
> > (both cgroups v1).
>
> selinux?
>
> Lennart
>
> --
> Lennart Poettering, Berlin
Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
On Mo, 09.01.23 19:45, Lewis Gaul (lewis.g...@gmail.com) wrote:

> Hi all,
>
> I've come across an issue when restarting a systemd container, which I'm
> seeing on a CentOS 8.2 VM but not able to reproduce on an Ubuntu 20.04 VM
> (both cgroups v1).

selinux?

Lennart

--
Lennart Poettering, Berlin
[systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup
Hi all,

I've come across an issue when restarting a systemd container, which I'm
seeing on a CentOS 8.2 VM but not able to reproduce on an Ubuntu 20.04 VM
(both cgroups v1). The failure looks as follows, hitting the warning
condition at
https://github.com/systemd/systemd/blob/v245/src/shared/cgroup-setup.c#L279:

[root@localhost ubuntu-systemd]# podman run -it --privileged --name ubuntu --detach ubuntu-systemd
5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313
[root@localhost ubuntu-systemd]# podman restart ubuntu
5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313
[root@localhost ubuntu-systemd]# podman logs ubuntu | grep -B6 -A2 'Set hostname'
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <5e4ab2a36681>.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Created slice system-modprobe.slice.
--
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <5e4ab2a36681>.
Failed to attach 1 to compat systemd cgroup /machine.slice/libpod-5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313.scope/init.scope: No such file or directory
[  OK  ] Created slice system-getty.slice.

If using docker instead of podman (still on CentOS 8.2) the container
actually exits after the restart (when hitting the code at
https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2972):

[root@localhost ubuntu-systemd]# docker logs ubuntu | grep -C5 'Set hostname'
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <523caa1f03e9>.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Created slice system-modprobe.slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
--
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <523caa1f03e9>.
Failed to attach 1 to compat systemd cgroup /system.slice/docker-523caa1f03e9c96a6a12a55fb07df995c6e4b3a27e18585cbeda869b943ae728.scope/init.scope: No such file or directory
Failed to open pin file: No such file or directory
Failed to allocate manager object: No such file or directory
[!!] Failed to allocate manager object.
Exiting PID 1...

Does anyone know what might be causing this? Is it a systemd bug? I can
copy the info into a GitHub issue if that's helpful.

Thanks,
Lewis