Hey folks,

I've upgraded this system to buster and it seems that either the new
kernel (4.19.0-8-amd64) or the new lxc version (1:3.1.0+really3.0.3-8)
has fixed this problem: I can once again re-exec systemd in containers,
even with lxc.cap.drop = sys_admin enabled.
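
For reference, the capability drop in question is just this line in the
container's config (the exact config path varies per setup):

        # e.g. /var/lib/lxc/<name>/config
        lxc.cap.drop = sys_admin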

I guess this issue could be closed? Feel free to do so if you think it
is appropriate.


Anyway, below is some more info I collected a long time ago but never
got around to cleaning up and sending. I'm including it here in case it
is useful for anyone else running into the same issue.

Gr.

Matthijs

== Old debugging info below ==

When running systemd with debug loglevel (in /etc/systemd/system.conf),
I see the following on boot (from the console logfile, since journald
isn't running at that point yet):

        Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd.
        Release agent already installed.
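
For reference, the debug loglevel was enabled with roughly this in
/etc/systemd/system.conf:

        [Manager]
        LogLevel=debug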

When reexecuting systemd, I get the following (from journalctl):

        Using cgroup controller name=systemd. File system hierarchy is at /sys/fs/cgroup/systemd/../...
        Release agent already installed.
        Failed to create /../../init.scope control group: Operation not permitted
        Failed to allocate manager object: Operation not permitted
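
The re-exec here is just a plain daemon-reexec run from inside the
container, e.g.:

        $ sudo systemctl daemon-reexec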


The ../../init.scope is, I think, based on this file:

        $ cat /proc/1/cgroup
        10:freezer:/
        9:pids:/../../init.scope
        8:net_cls,net_prio:/
        7:devices:/../../init.scope
        6:blkio:/../../init.scope
        5:memory:/../../init.scope
        4:perf_event:/
        3:cpu,cpuacct:/../../init.scope
        2:cpuset:/
        1:name=systemd:/../../init.scope

This is how it looks before and after the re-exec.
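
Each line in that file is hierarchy-id:controllers:path, so the path
systemd cares about can be pulled out with something like:

        $ grep name=systemd /proc/1/cgroup | cut -d: -f3
        /../../init.scope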

I'm not sure what this file looks like when systemd first starts in the
container. Given the "File system hierarchy is at /sys/fs/cgroup/systemd"
log message, I suspect the ../../ is not there yet at that point, or
maybe systemd simply does not read it on initial startup?

On the host, the file looks like this:

        $ cat /proc/1/cgroup
        10:freezer:/
        9:pids:/init.scope
        8:net_cls,net_prio:/
        7:devices:/init.scope
        6:blkio:/init.scope
        5:memory:/init.scope
        4:perf_event:/
        3:cpu,cpuacct:/init.scope
        2:cpuset:/
        1:name=systemd:/init.scope

When I look up the container's pid 1 on the host, it looks like this:

        matthijs@tika:/etc/lxc$ cat /proc/1755/cgroup
        10:freezer:/lxc/template
        9:pids:/init.scope
        8:net_cls,net_prio:/lxc/template
        7:devices:/init.scope
        6:blkio:/init.scope
        5:memory:/init.scope
        4:perf_event:/lxc/template
        3:cpu,cpuacct:/init.scope
        2:cpuset:/lxc/template
        1:name=systemd:/init.scope
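
The container's init pid used above can be looked up on the host with
lxc-info, something like:

        matthijs@tika:/etc/lxc$ sudo lxc-info -n template -p
        PID:            1755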


When I start the container *with* CAP_SYS_ADMIN, the file inside the
container looks different:

        matthijs@template:~$ cat /proc/1/cgroup | grep systemd
        1:name=systemd:/init.scope

When I look up the container's pid 1 on the host, it looks like this:

        matthijs@tika:/etc/lxc$ sudo cat /proc/507/cgroup | grep systemd
        1:name=systemd:/lxc/template/init.scope
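
A quick way to double-check which of the two cases a given container is
in is to decode pid 1's effective capability set from inside it,
something like:

        # capsh is in the libcap2-bin package
        $ capsh --decode=$(awk '/CapEff/ {print $2}' /proc/1/status) | grep -o sys_admin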

== New debug info ==

After the upgrade to buster, it seems that the scopes are now correct.
Inside the container *without* CAP_SYS_ADMIN, I now get:

        $ cat /proc/1/cgroup | grep systemd
        1:name=systemd:/init.scope
