Bug#890824: Container: unsets cgroup memory limit on user login
On Mon, 29 Mar 2021 07:49:24 +0200 Maximilian Philipps wrote (message reproduced in full below):
> Hi Michael, I currently can't test that. [...]

Any updates here? Ideally, if you run bullseye and still encounter the problem, install systemd v250 from bullseye-backports. If the problem persists, file it upstream at https://github.com/systemd/systemd/issues/ and report back with the issue number.

Regards, Michael
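Michael's request to retest on bullseye hinges on whether the unified (cgroupv2) hierarchy is actually in effect. A quick diagnostic sketch (the filesystem-type values `cgroup2fs`/`tmpfs` are standard; the helper name is ours):

```shell
#!/bin/sh
# Map the filesystem type mounted at /sys/fs/cgroup to the hierarchy mode:
#   "cgroup2fs" => unified (cgroupv2, the bullseye default)
#   "tmpfs"     => legacy or hybrid (cgroupv1 controller mounts under a tmpfs)
hierarchy_from_fstype() {
  case "$1" in
    cgroup2fs) echo "unified" ;;
    tmpfs)     echo "legacy-or-hybrid" ;;
    *)         echo "unknown" ;;
  esac
}

# On a live system:  hierarchy_from_fstype "$(stat -fc %T /sys/fs/cgroup/)"
hierarchy_from_fstype cgroup2fs   # prints "unified"
```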
Bug#890824: Container: unsets cgroup memory limit on user login
Hi Michael, I currently can't test that. Given that bullseye isn't released yet, I don't have a test environment here. When bullseye is released I will try to test it again; for the time being I have moved all libvirt-lxc containers to lxc. Regards, Maximilian Philipps
Bug#890824: Container: unsets cgroup memory limit on user login
Hi Maximilian, can you please check whether you can still reproduce the issue on bullseye, where cgroupv2 (i.e. the unified hierarchy) is the default? Regards, Michael

On 25.10.2019 at 16:35, Maximilian Philipps wrote (message reproduced in full below):
> Hi, I can now reliably trigger the 8 exabyte issue. [...]
Bug#890824: Container: unsets cgroup memory limit on user login
Hi, I can now reliably trigger the 8 exabyte issue. When I start a libvirt-lxc container, libvirt sets the memory limit. This can be seen with:

cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope/memory.limit_in_bytes
2147483648

If I now call systemctl daemon-reload on the host, the memory limit jumps to 9223372036854771712. I can prevent this by setting MemoryMax for the scope on the host:

systemctl set-property --runtime "machine-lxc\x2d27166\x2dhost.domain.tld.scope" MemoryMax=2147483648

I need to know the pid used in the machine name and can therefore really only set it at runtime. However, this isn't enough to prevent the 8 exabyte issue. For some reason, when I do a systemctl daemon-reload on the host, systemd also changes the cgroup membership of some processes. Prior to reloading there were 3 processes directly in the machine-lxc...scope: a /usr/lib/libvirt/libvirt_lxc process, the /sbin/init process of the container, and another process that I can't find in /proc/. Maybe a pid from within the container? After reloading, only the /sbin/init process remains in the scope; the libvirt_lxc process gets kicked back to the libvirtd.service cgroup, and the "ghost" task disappears.
Before reload:
11:blkio:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
10:freezer:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
9:perf_event:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
8:pids:/system.slice/libvirtd.service
7:cpu,cpuacct:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
6:rdma:/
5:devices:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
4:memory:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
2:cpuset:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
1:name=systemd:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
0::/system.slice/libvirtd.service

After reload:
11:blkio:/system.slice/libvirtd.service
10:freezer:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
9:perf_event:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
8:pids:/system.slice/libvirtd.service
7:cpu,cpuacct:/system.slice/libvirtd.service
6:rdma:/
5:devices:/system.slice/libvirtd.service
4:memory:/system.slice/libvirtd.service
3:net_cls,net_prio:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
2:cpuset:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
1:name=systemd:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
0::/system.slice/libvirtd.service
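The workaround is runtime-only because the machine name embeds the libvirt_lxc pid ("lxc-27166-host.domain.tld" above), and systemd escapes each "-" in such names as "\x2d" when forming the scope name. A sketch of deriving the scope name once the pid is known (this hand-rolls only the hyphen escaping; for arbitrary names, systemd-escape(1) is the proper tool):

```shell
#!/bin/sh
# Build the runtime scope name for a libvirt-lxc machine so MemoryMax can
# be re-applied. systemd escapes "-" in unit names as "\x2d"; the sed call
# below performs exactly that substitution.
escape_scope() {
  printf 'machine-%s.scope\n' "$(printf '%s' "$1" | sed 's/-/\\x2d/g')"
}

scope=$(escape_scope "lxc-27166-host.domain.tld")
printf '%s\n' "$scope"   # prints machine-lxc\x2d27166\x2dhost.domain.tld.scope
# Then, on the host:
#   systemctl set-property --runtime "$scope" MemoryMax=2147483648
```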
Bug#890824: Container: unsets cgroup memory limit on user login
Hi, after digging a bit more, it appears that after the update from stretch to buster we are running a mix of cgroupv1 and cgroupv2: /sys/fs/cgroup/ is still a tmpfs, and /sys/fs/cgroup/unified/ exists but has no controllers. So apparently systemd uses the controllers from v1 together with the v2 hierarchy? Can anyone confirm that memory resource management works at all on buster?
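The mixed setup described above is systemd's "hybrid" mode: v1 controller mounts under a tmpfs, plus an empty cgroup2 mount at /sys/fs/cgroup/unified used only for process tracking. A sketch that classifies the mode from a mount table (the function name is ours; on a live system feed it /proc/self/mounts):

```shell
#!/bin/sh
# Classify the cgroup setup from mount-table lines ("src target fstype opts")
# read on stdin: cgroup2 at the root => unified; v1 controllers plus a
# cgroup2 mount at /unified => hybrid; only v1 controllers => legacy.
classify_cgroups() {
  local v2_root=no v1=no v2_unified=no
  while read -r _src target fstype _rest; do
    case "$fstype:$target" in
      cgroup2:/sys/fs/cgroup)         v2_root=yes ;;
      cgroup2:/sys/fs/cgroup/unified) v2_unified=yes ;;
      cgroup:*)                       v1=yes ;;
    esac
  done
  if [ "$v2_root" = yes ]; then echo unified
  elif [ "$v1" = yes ] && [ "$v2_unified" = yes ]; then echo hybrid
  elif [ "$v1" = yes ]; then echo legacy
  else echo unknown; fi
}

# Live usage:  classify_cgroups < /proc/self/mounts
# Demonstration with a buster-like mount table:
printf '%s\n' \
  'tmpfs /sys/fs/cgroup tmpfs rw' \
  'cgroup /sys/fs/cgroup/memory cgroup rw,memory' \
  'cgroup2 /sys/fs/cgroup/unified cgroup2 rw' | classify_cgroups
# prints "hybrid"
```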
Bug#890824: Container: unsets cgroup memory limit on user login
I recently updated one of the hosts, and the containers running on it, from stretch to buster. With buster's systemd 241-7~deb10u1 the issue still exists. I have tried working around it by setting a memory limit on the -.slice from within the container, but this is fairly unreliable.
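The in-container workaround mentioned above could be made persistent as a drop-in for the guest's root slice; a sketch, assuming the guest's systemd honors drop-ins for -.slice (the file name and the 2G value are illustrative, and per the report this approach remains unreliable):

```ini
# /etc/systemd/system/-.slice.d/10-memorymax.conf (inside the container)
[Slice]
MemoryMax=2G
```

After creating the file, `systemctl daemon-reload` inside the guest should re-apply the limit.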
Bug#890824: Container: unsets cgroup memory limit on user login
Would you mind testing with systemd v239 from unstable/testing and eventually raising this upstream at https://github.com/systemd/systemd? To be honest, I'm not sure what the expected behaviour is in that regard, or whether this is just a configuration issue.

-- Why is it that all of the instruments seeking intelligent life in the universe are pointed away from Earth?
Bug#890824: Container: unsets cgroup memory limit on user login
On 02/19/2018 02:07 PM, Maximilian Philipps wrote (exchange reproduced in full below):
> No, the host still sees the 255 GB. The systemd in the guest resets the limits for the container when someone logs in. [...]

On second thought, maybe you assumed that the cgroup namespace is unshared? This is not the case: cgroup namespaces are fairly new and, as far as I know, not supported by libvirt-lxc.
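The shared-namespace claim can be checked by comparing cgroup-namespace identifiers: if the container's init and the host's PID 1 link to the same namespace inode, no unsharing happened. A sketch (the inode values shown are illustrative):

```shell
#!/bin/sh
# Compare two cgroup-namespace identifiers, as produced by
# "readlink /proc/<pid>/ns/cgroup"; identical strings mean the two
# processes share one cgroup namespace.
same_cgroup_ns() {
  if [ "$1" = "$2" ]; then echo shared; else echo unshared; fi
}

# Live usage (on the host; the container init pid comes from e.g. libvirt):
#   same_cgroup_ns "$(readlink /proc/1/ns/cgroup)" \
#                  "$(readlink /proc/<container-init-pid>/ns/cgroup)"
same_cgroup_ns 'cgroup:[4026531835]' 'cgroup:[4026531835]'   # prints "shared"
```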
Bug#890824: Container: unsets cgroup memory limit on user login
On 02/19/2018 01:50 PM, Michael Biebl wrote (message reproduced in full below):
> I'm confused: Are you saying that systemd inside the guest (i.e. running systemd v232) resets the memory limits on the host (running v215)?

No, the host still sees the 255 GB. The systemd in the guest resets the limits for the container when someone logs in. In terms of the cgroup hierarchy, /sys/fs/cgroup/memory/memory.limit_in_bytes is always 9223372036854771712, which appears to be treated as "no restriction" on the host. However, the memory.limit_in_bytes within the machine scope does change.
Bug#890824: Container: unsets cgroup memory limit on user login
On 19.02.2018 at 13:09, Maximilian Philipps wrote (bug report reproduced in full below):
> Package: systemd
> Version: 232-25+deb9u1
> Severity: important
> [...]

I'm confused: Are you saying that systemd inside the guest (i.e. running systemd v232) resets the memory limits on the host (running v215)?
Bug#890824: Container: unsets cgroup memory limit on user login
Package: systemd
Version: 232-25+deb9u1
Severity: important

Hi,

I have an issue with systemd unsetting the memory limit for my container, whereupon programs like free and htop report having access to 8 exabytes of memory.

The setup is the following:

Host:
  Release: Debian jessie
  Kernel: 4.9.65-3+deb9u2~bpo8+1 (jessie backports)
  Container provider: libvirt 3.0.0-4~bpo8+1 (jessie backports)
  Systemd: 215-17+deb8u7 (jessie)
  cgroup hierarchy: legacy

Guest:
  Release: Debian stretch
  Systemd: 232-25+deb9u1 (stretch)

There are several containers running on the host, but this problem occurs with all (and only) the Debian stretch containers. Containers running Debian jessie or older Ubuntu 12.04 aren't affected. Each container is configured with a cgroup-enforced memory limit in its libvirt domain file. Example:

<memory unit='KiB'>4194304</memory>
<memtune>
  <hard_limit unit='KiB'>2097152</hard_limit>
</memtune>

Steps to reproduce + observations:
1) Start a container: virsh -c lxc:// start container.example.com
2) virsh -c lxc:// memtune container.example.com reports a hard_limit of 2097152
3) cat "/sys/fs/cgroup/memory/machine.slice/machine-<name>.scope/memory.limit_in_bytes" outputs 2147483648
4) nsenter -t <pid> -m -u -i -n -p free reports 2097152 kB
5) ssh container.example.com free reports 9007199254740991 kB
6) cat "/sys/fs/cgroup/memory/machine.slice/machine-<name>.scope/memory.limit_in_bytes" outputs 9223372036854771712
7) nsenter -t <pid> -m -u -i -n -p free reports 9007199254740991 kB
8) virsh -c lxc:// memtune container.example.com reports a hard_limit of unlimited

As far as I can tell, systemd unsets the cgroup memory limit when creating the user session. Why it gets set to 9223372036854771712 instead of the host's 255 GB, I don't know. In any case, I am looking forward to a better solution than resetting the limits through cron every minute.
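The cron-based reset mentioned at the end can be sketched as a small script that re-asserts the limit whenever systemd has widened it (the helper name and the 2 GiB value are ours; on a live host it would be pointed at the scope's memory.limit_in_bytes file):

```shell
#!/bin/sh
# Re-apply a cgroup v1 memory limit if it has been reset upward, e.g. to
# the 9223372036854771712 "no limit" sentinel. Intended to run from cron.
reapply_limit() {
  # $1: path to memory.limit_in_bytes; $2: desired limit in bytes
  current=$(cat "$1")
  if [ "$current" -gt "$2" ]; then
    echo "$2" > "$1"
  fi
}

# Demonstration against a scratch file instead of a live cgroup:
f=$(mktemp)
echo 9223372036854771712 > "$f"
reapply_limit "$f" 2147483648
cat "$f"   # prints 2147483648: the limit has been clamped back down
rm -f "$f"
```

This only papers over the symptom once a minute; the limit is still briefly unset after each daemon-reload or login.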
-- Package-specific info:

-- System Information:
Debian Release: 9.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-0.bpo.5-amd64 (SMP w/32 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii  adduser         3.115
ii  libacl1         2.2.52-3+b1
ii  libapparmor1    2.11.0-3
ii  libaudit1       1:2.6.7-2
ii  libblkid1       2.29.2-1
ii  libc6           2.24-11+deb9u1
ii  libcap2         1:2.25-1
ii  libcryptsetup4  2:1.7.3-4
ii  libgcrypt20     1.7.6-2+deb9u2
ii  libgpg-error0   1.26-2
ii  libidn11        1.33-1
ii  libip4tc0       1.6.0+snapshot20161117-6
ii  libkmod2        23-2
ii  liblz4-1        0.0~r131-2+b1
ii  liblzma5        5.2.2-1.2+b1
ii  libmount1       2.29.2-1
ii  libpam0g        1.1.8-3.6
ii  libseccomp2     2.3.1-2.1
ii  libselinux1     2.6-3+b3
ii  libsystemd0     232-25+deb9u1
ii  mount           2.29.2-1
ii  procps          2:3.3.12-3
ii  util-linux      2.29.2-1

Versions of packages systemd recommends:
ii  dbus            1.10.24-0+deb9u1
ii  libpam-systemd  232-25+deb9u1

Versions of packages systemd suggests:
pn  policykit-1
pn  systemd-container
pn  systemd-ui

Versions of packages systemd is related to:
pn  dracut
pn  initramfs-tools
ii  udev  232-25+deb9u1

-- no debconf information