Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-12 Thread Michal Koutný
On Thu, Jan 12, 2023 at 03:31:25PM +, Lewis Gaul  
wrote:
> Could you suggest commands to run to do this?

# systemd-analyze set-log-level debug
# logger MARK-BEGIN
# ...whatever restart commands
# ...wait for the failure
# logger MARK-END
# systemd-analyze set-log-level info
# journalctl -b | sed -n '/MARK-BEGIN/,/MARK-END/p'
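
If it's easy to grab, the container's own console output over the same
window can also be useful to correlate with the host journal, e.g.
(assuming the container is named 'ubuntu' as in your repro):

# podman logs -t ubuntu > container-console.log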

> Should we be suspicious of the host systemd version and/or the fact that
> the host is in 'legacy' mode while the container (based on the systemd
> version being higher) is in 'hybrid' mode? Maybe we should try telling the
> container systemd to run in 'legacy' mode somehow?

I'd be wary of the legacy@host and {hybrid,unified}@container combo.
Also, the old systemd version on the host could mean the cgroup setup
itself is buggy.
(I only have capacity to look into the recent code but the debug logs
above may show something obvious.)

Ideally, you should tell both host and container to run in the unified
mode ;-)
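
For the host side that's a kernel command line switch, roughly (a sketch
for a grubby/CentOS-style setup; adjust for your boot loader):

# grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"
# reboot
# stat -fc %T /sys/fs/cgroup/   # cgroup2fs => unified

With a cgroup2-only host, podman mounts a cgroup2 hierarchy into the
container and the container's systemd should come up unified as well.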

Michal




Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-12 Thread Lewis Gaul
Another data point: I can reproduce on an Ubuntu 18.04 host, which has systemd
v237 in *hybrid* cgroup mode (assuming I've understood the definition of
hybrid, as per my previous email). So it's looking like it might be an
interoperation issue between host and container systemd that goes away
somewhere between v239 and v245 on the host side, when the container is
running v245 (also seen with container v244 and v249).

Thanks,
Lewis

On Thu, 12 Jan 2023 at 15:31, Lewis Gaul  wrote:

> Hey Michal,
>
> Thanks for the reply.
>
> > I'd suggest looking at debug level logs from the host's systemd around
> the time of the container restart.
>
> Could you suggest commands to run to do this?
>
> > What is the host's systemd version and cgroup mode
> (legacy,hybrid,unified)? (I'm not sure what the distros in your original
> message referred to.)
>
> The issue has been seen on CentOS 8.2 and 8.4 host distros, but not seen on
> Ubuntu 20.04. The former has systemd v239 and appears to be in 'legacy'
> cgroup mode (no /sys/fs/cgroup/unified cgroup2 mount), whereas the latter
> has systemd v245 and is in what I believe you'd refer to as 'hybrid' mode
> (with the /sys/fs/cgroup/unified cgroup2 mount).
>
> Should we be suspicious of the host systemd version and/or the fact that
> the host is in 'legacy' mode while the container (based on the systemd
> version being higher) is in 'hybrid' mode? Maybe we should try telling the
> container systemd to run in 'legacy' mode somehow?
>
> Thanks,
> Lewis
>
> On Thu, 12 Jan 2023 at 13:12, Michal Koutný  wrote:
>
>> Hello.
>>
>> On Tue, Jan 10, 2023 at 03:28:04PM +, Lewis Gaul <
>> lewis.g...@gmail.com> wrote:
>> > I can confirm that the container has permissions since executing a
>> 'mkdir'
>> > in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside
>> the
>> > container succeeds after the restart, so I have no idea why systemd is
>> not
>> > creating the 'init.scope/' dir.
>>
>> It looks like it could also be a race/deferred impact from host's systemd.
>>
>> > I notice that inside the container's systemd cgroup mount
>> > 'system.slice/' does exist, but 'user.slice/' also does not (both
>> > exist on normal boot). Is there any way I can find systemd logs that
>> > might indicate why the cgroup dir creation is failing?
>>
>> I'd suggest looking at debug level logs from the host's systemd around
>> the time of the container restart.
>>
>>
>> > I could raise this with the podman team, but it seems more in the
>> systemd
>> > area given it's a systemd warning and I would expect systemd to be
>> creating
>> > this cgroup dir?
>>
>> What is the host's systemd version and cgroup mode
>> (legacy,hybrid,unified)? (I'm not sure what the distros in your original
>> message referred to.)
>>
>>
>> Thanks,
>> Michal
>>
>


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-12 Thread Lewis Gaul
Hey Michal,

Thanks for the reply.

> I'd suggest looking at debug level logs from the host's systemd around the
time of the container restart.

Could you suggest commands to run to do this?

> What is the host's systemd version and cgroup mode
(legacy,hybrid,unified)? (I'm not sure what the distros in your original
message referred to.)

The issue has been seen on CentOS 8.2 and 8.4 host distros, but not seen on
Ubuntu 20.04. The former has systemd v239 and appears to be in 'legacy'
cgroup mode (no /sys/fs/cgroup/unified cgroup2 mount), whereas the latter
has systemd v245 and is in what I believe you'd refer to as 'hybrid' mode
(with the /sys/fs/cgroup/unified cgroup2 mount).
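
For reference, the checks I'm basing that on (a rough sketch; the exact
mount layout can vary by distro):

# stat -fc %T /sys/fs/cgroup/         # tmpfs => cgroup v1 (legacy or hybrid), cgroup2fs => unified
# mountpoint /sys/fs/cgroup/unified   # mounted => hybrid, not mounted => legacy
# systemctl --version | head -n1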

Should we be suspicious of the host systemd version and/or the fact that
the host is in 'legacy' mode while the container (based on the systemd
version being higher) is in 'hybrid' mode? Maybe we should try telling the
container systemd to run in 'legacy' mode somehow?

Thanks,
Lewis

On Thu, 12 Jan 2023 at 13:12, Michal Koutný  wrote:

> Hello.
>
> On Tue, Jan 10, 2023 at 03:28:04PM +, Lewis Gaul 
> wrote:
> > I can confirm that the container has permissions since executing a
> 'mkdir'
> > in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
> > container succeeds after the restart, so I have no idea why systemd is
> not
> > creating the 'init.scope/' dir.
>
> It looks like it could also be a race/deferred impact from host's systemd.
>
> > I notice that inside the container's systemd cgroup mount
> > 'system.slice/' does exist, but 'user.slice/' also does not (both
> > exist on normal boot). Is there any way I can find systemd logs that
> > might indicate why the cgroup dir creation is failing?
>
> I'd suggest looking at debug level logs from the host's systemd around
> the time of the container restart.
>
>
> > I could raise this with the podman team, but it seems more in the systemd
> > area given it's a systemd warning and I would expect systemd to be
> creating
> > this cgroup dir?
>
> What is the host's systemd version and cgroup mode
> (legacy,hybrid,unified)? (I'm not sure what the distros in your original
> message referred to.)
>
>
> Thanks,
> Michal
>


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-12 Thread Michal Koutný
Hello.

On Tue, Jan 10, 2023 at 03:28:04PM +, Lewis Gaul  
wrote:
> I can confirm that the container has permissions since executing a 'mkdir'
> in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
> container succeeds after the restart, so I have no idea why systemd is not
> creating the 'init.scope/' dir.

It looks like it could also be a race/deferred impact from host's systemd.

> I notice that inside the container's systemd cgroup mount
> 'system.slice/' does exist, but 'user.slice/' also does not (both
> exist on normal boot). Is there any way I can find systemd logs that
> might indicate why the cgroup dir creation is failing?

I'd suggest looking at debug level logs from the host's systemd around
the time of the container restart.


> I could raise this with the podman team, but it seems more in the systemd
> area given it's a systemd warning and I would expect systemd to be creating
> this cgroup dir?

What is the host's systemd version and cgroup mode
(legacy,hybrid,unified)? (I'm not sure what the distros in your original
message referred to.)


Thanks,
Michal




Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-10 Thread Lewis Gaul
I omitted one piece of information about running with --cgroupns=private,
thinking it was unrelated, but it actually appears to be related (and
perhaps highlights a variant of the issue that is seen on first boot, not
only on container restart). Again (and this is what makes me think it's
related), I can reproduce this on a CentOS host but not on Ubuntu (still
with SELinux in 'permissive' mode).

[root@localhost ~]# podman run -it --name ubuntu --privileged --cgroupns private ubuntu-systemd
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX
+IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL
+XZ +LZ4 +SECCOMP +BLKID +ELFUTI)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to .


Couldn't move remaining userspace processes, ignoring: Input/output error
Failed to create compat systemd cgroup /system.slice: No such file or directory
Failed to create compat systemd cgroup /system.slice/system-getty.slice: No such file or directory
[  OK  ] Created slice system-getty.slice.
Failed to create compat systemd cgroup /system.slice/system-modprobe.slice: No such file or directory
[  OK  ] Created slice system-modprobe.slice.
Failed to create compat systemd cgroup /user.slice: No such file or directory
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.

This first warning is coming from one of the same areas of code I linked in
my first email:
https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2967.

I see the same thing with '--cap-add sys_admin' instead of '--privileged',
and again seen with both docker and podman.

Thanks,
Lewis

On Tue, 10 Jan 2023 at 15:28, Lewis Gaul  wrote:

> I'm aware of the higher level of collaboration between podman and systemd
> compared to docker, hence primarily raising this issue from a podman angle.
>
> In privileged mode all mounts are read-write, so yes the container has
> write access to the cgroup filesystem. (Podman also ensures write access to
> the systemd cgroup subsystem mount in non-privileged mode by default).
>
> On first boot PID 1 can be found in
> /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/init.scope/cgroup.procs,
> whereas when the container restarts the 'init.scope/' directory does not
> exist and PID 1 is instead found in the parent (container root) cgroup
> /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/cgroup.procs
> (also reflected by /proc/1/cgroup). This is strange because systemd must be
> the one to create this cgroup dir in the initial boot, so I'm not sure why
> it wouldn't on subsequent boot?
>
> I can confirm that the container has permissions since executing a 'mkdir'
> in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
> container succeeds after the restart, so I have no idea why systemd is not
> creating the 'init.scope/' dir. I notice that inside the container's
> systemd cgroup mount 'system.slice/' does exist, but 'user.slice/' also
> does not (both exist on normal boot). Is there any way I can find systemd
> logs that might indicate why the cgroup dir creation is failing?
>
> One final data point: the same is seen when using a private cgroup
> namespace (via 'podman run --cgroupns=private'), although the error is
> then, as expected, "Failed to attach 1 to compat systemd cgroup
> /init.scope: No such file or directory".
>
> I could raise this with the podman team, but it seems more in the systemd
> area given it's a systemd warning and I would expect systemd to be creating
> this cgroup dir?
>
> Thanks,
> Lewis
>
> On Tue, 10 Jan 2023 at 14:48, Lennart Poettering 
> wrote:
>
>> On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:
>>
>> > Following 'setenforce 0' I still see the same issue (I was also
>> suspecting
>> > SELinux!).
>> >
>> > A few additional data points:
>> > - this was not seen when using systemd v230 inside the container
>> > - this is also seen on CentOS 8.4
>> > - this is seen under docker even if the container's cgroup driver is
>> > changed from 'cgroupfs' to 'systemd'
>>
>> docker is garbage. They are hostile towards running systemd inside
>> containers.
>>
>> podman upstream is a lot friendlier, and apparently what everyone in OCI
>> is going towards these days.
>>
>> I have not much experience with podman though, and in particular not
>> old versions. Next step would probably be to look at what precisely
>> causes the permission issue, via strace.
>>
>> but did you make sure your container actually gets write access to the
>> cgroup trees?
>>
>> anyway, i'd recommend asking the podman community for help about this.
>>
>> Lennart
>>
>> --
>> Lennart Poettering, Berlin
>>
>


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-10 Thread Lewis Gaul
I'm aware of the higher level of collaboration between podman and systemd
compared to docker, hence primarily raising this issue from a podman angle.

In privileged mode all mounts are read-write, so yes the container has
write access to the cgroup filesystem. (Podman also ensures write access to
the systemd cgroup subsystem mount in non-privileged mode by default).
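
A quick way to double-check that from inside the container (a sketch,
assuming findmnt is available in the image; 'rw' in OPTIONS confirms the
mount is writable):

# podman exec ubuntu findmnt -o TARGET,OPTIONS /sys/fs/cgroup/systemd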

On first boot PID 1 can be found in
/sys/fs/cgroup/systemd/machine.slice/libpod-.scope/init.scope/cgroup.procs,
whereas when the container restarts the 'init.scope/' directory does not
exist and PID 1 is instead found in the parent (container root) cgroup
/sys/fs/cgroup/systemd/machine.slice/libpod-.scope/cgroup.procs
(also reflected by /proc/1/cgroup). This is strange because systemd must be
the one to create this cgroup dir in the initial boot, so I'm not sure why
it wouldn't on subsequent boot?

I can confirm that the container has permissions since executing a 'mkdir'
in /sys/fs/cgroup/systemd/machine.slice/libpod-.scope/ inside the
container succeeds after the restart, so I have no idea why systemd is not
creating the 'init.scope/' dir. I notice that inside the container's
systemd cgroup mount 'system.slice/' does exist, but 'user.slice/' also
does not (both exist on normal boot). Is there any way I can find systemd
logs that might indicate why the cgroup dir creation is failing?
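
For completeness, roughly the checks behind the above (a sketch, with
<ctr-id> standing in for the real container ID):

# podman exec ubuntu cat /proc/1/cgroup     # which cgroup PID 1 ended up in
# ls /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/     # from the host: no init.scope/ after restart
# podman exec ubuntu mkdir /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/init.scope     # succeeds, so not permissions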

One final data point: the same is seen when using a private cgroup namespace
(via 'podman run --cgroupns=private'), although the error is then, as
expected, "Failed to attach 1 to compat systemd cgroup /init.scope: No such
file or directory".

I could raise this with the podman team, but it seems more in the systemd
area given it's a systemd warning and I would expect systemd to be creating
this cgroup dir?

Thanks,
Lewis

On Tue, 10 Jan 2023 at 14:48, Lennart Poettering 
wrote:

> On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:
>
> > Following 'setenforce 0' I still see the same issue (I was also
> suspecting
> > SELinux!).
> >
> > A few additional data points:
> > - this was not seen when using systemd v230 inside the container
> > - this is also seen on CentOS 8.4
> > - this is seen under docker even if the container's cgroup driver is
> > changed from 'cgroupfs' to 'systemd'
>
> docker is garbage. They are hostile towards running systemd inside
> containers.
>
> podman upstream is a lot friendlier, and apparently what everyone in OCI
> is going towards these days.
>
> I have not much experience with podman though, and in particular not
> old versions. Next step would probably be to look at what precisely
> causes the permission issue, via strace.
>
> but did you make sure your container actually gets write access to the
> cgroup trees?
>
> anyway, i'd recommend asking the podman community for help about this.
>
> Lennart
>
> --
> Lennart Poettering, Berlin
>


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-10 Thread Lennart Poettering
On Di, 10.01.23 13:18, Lewis Gaul (lewis.g...@gmail.com) wrote:

> Following 'setenforce 0' I still see the same issue (I was also suspecting
> SELinux!).
>
> A few additional data points:
> - this was not seen when using systemd v230 inside the container
> - this is also seen on CentOS 8.4
> - this is seen under docker even if the container's cgroup driver is
> changed from 'cgroupfs' to 'systemd'

docker is garbage. They are hostile towards running systemd inside
containers.

podman upstream is a lot friendlier, and apparently what everyone in OCI
is going towards these days.

I have not much experience with podman though, and in particular not
old versions. Next step would probably be to look at what precisely
causes the permission issue, via strace.
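
E.g. roughly (a sketch, not a polished recipe: attaching from the host
right after triggering the restart, so the earliest syscalls may be
missed; 'ubuntu' is the container name from the repro):

# podman restart ubuntu &
# strace -f -e trace=mkdir,mkdirat -o /tmp/pid1.strace \
    -p "$(podman inspect -f '{{.State.Pid}}' ubuntu)"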

but did you make sure your container actually gets write access to the
cgroup trees?

anyway, i'd recommend asking the podman community for help about this.

Lennart

--
Lennart Poettering, Berlin


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-10 Thread Lewis Gaul
Following 'setenforce 0' I still see the same issue (I was also suspecting
SELinux!).

A few additional data points:
- this was not seen when using systemd v230 inside the container
- this is also seen on CentOS 8.4
- this is seen under docker even if the container's cgroup driver is
changed from 'cgroupfs' to 'systemd'
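
(For reference, the usual way to switch the docker cgroup driver — a
sketch, not necessarily exactly what was done here — is in
/etc/docker/daemon.json:

  { "exec-opts": ["native.cgroupdriver=systemd"] }

followed by:

# systemctl restart docker
)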

Thanks,
Lewis

On Tue, 10 Jan 2023 at 11:12, Lennart Poettering 
wrote:

> On Mo, 09.01.23 19:45, Lewis Gaul (lewis.g...@gmail.com) wrote:
>
> > Hi all,
> >
> > I've come across an issue when restarting a systemd container, which I'm
> > seeing on a CentOS 8.2 VM but not able to reproduce on an Ubuntu 20.04 VM
> > (both cgroups v1).
>
> selinux?
>
> Lennart
>
> --
> Lennart Poettering, Berlin
>


Re: [systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-10 Thread Lennart Poettering
On Mo, 09.01.23 19:45, Lewis Gaul (lewis.g...@gmail.com) wrote:

> Hi all,
>
> I've come across an issue when restarting a systemd container, which I'm
> seeing on a CentOS 8.2 VM but not able to reproduce on an Ubuntu 20.04 VM
> (both cgroups v1).

selinux?

Lennart

--
Lennart Poettering, Berlin


[systemd-devel] Container restart issue: Failed to attach 1 to compat systemd cgroup

2023-01-09 Thread Lewis Gaul
Hi all,

I've come across an issue when restarting a systemd container, which I'm
seeing on a CentOS 8.2 VM but not able to reproduce on an Ubuntu 20.04 VM
(both cgroups v1).

The failure looks as follows, hitting the warning condition at
https://github.com/systemd/systemd/blob/v245/src/shared/cgroup-setup.c#L279:

[root@localhost ubuntu-systemd]# podman run -it --privileged --name ubuntu
--detach ubuntu-systemd
5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313

[root@localhost ubuntu-systemd]# podman restart ubuntu
5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313

[root@localhost ubuntu-systemd]# podman logs ubuntu | grep -B6 -A2 'Set
hostname'
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX
+IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL
+XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2
default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <5e4ab2a36681>.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Created slice system-modprobe.slice.
--
systemd 245.4-4ubuntu3.19 running in system mode. (+PAM +AUDIT +SELINUX
+IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL
+XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2
default-hierarchy=hybrid)
Detected virtualization podman.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <5e4ab2a36681>.

Failed to attach 1 to compat systemd cgroup /machine.slice/libpod-5e4ab2a36681c092f4ef937cf03b25a8d3d7b2fa530559bf4dac4079c84d0313.scope/init.scope: No such file or directory
[  OK  ] Created slice system-getty.slice.


If using docker instead of podman (still on CentOS 8.2) the container
actually exits after restart (when hitting the code at
https://github.com/systemd/systemd/blob/v245/src/core/cgroup.c#L2972):

[root@localhost ubuntu-systemd]# docker logs ubuntu | grep -C5 'Set
hostname'
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <523caa1f03e9>.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Created slice system-modprobe.slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
--
Detected virtualization docker.
Detected architecture x86-64.

Welcome to Ubuntu 20.04.5 LTS!

Set hostname to <523caa1f03e9>.

Failed to attach 1 to compat systemd cgroup /system.slice/docker-523caa1f03e9c96a6a12a55fb07df995c6e4b3a27e18585cbeda869b943ae728.scope/init.scope: No such file or directory
Failed to open pin file: No such file or directory
Failed to allocate manager object: No such file or directory
[!!] Failed to allocate manager object.
Exiting PID 1...


Does anyone know what might be causing this? Is it a systemd bug? I can
copy the info into a GitHub issue if that's helpful.

Thanks,
Lewis