Bug#890824: Container: unsets cgroup memory limit on user login

2022-07-05 Thread Michael Biebl
On Mon, 29 Mar 2021 07:49:24 +0200 Maximilian Philipps wrote:


Any updates here?
Ideally, if you run bullseye and still encounter the problem, install 
systemd v250 from bullseye-backports; if the problem persists, file it 
upstream at https://github.com/systemd/systemd/issues/ and report back 
with the issue number.

Regards,
Michael





Bug#890824: Container: unsets cgroup memory limit on user login

2021-03-29 Thread Maximilian Philipps

hi Michael,

I currently can't test that. Given that bullseye isn't released yet, I 
don't have a test environment here.


When bullseye is released I will try to test it again; for the time being I 
have moved all libvirt-lxc containers to LXC.


Regards,

Maximilian Philipps



Bug#890824: Container: unsets cgroup memory limit on user login

2021-03-27 Thread Michael Biebl

Hi Maximilian,

Can you please check if you can still reproduce the issue on bullseye, 
where cgroup v2 (i.e. the unified hierarchy) is the default?


Regards,
Michael






Bug#890824: Container: unsets cgroup memory limit on user login

2019-10-25 Thread Maximilian Philipps
hi

I can now reliably trigger the 8-exabyte issue. When I start a
libvirt-lxc container, libvirt sets the memory limit.

This can be seen with:

cat /sys/fs/cgroup/memory/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope/memory.limit_in_bytes
2147483648

If I now call systemctl daemon-reload on the host, the memory limit jumps to

9223372036854771712
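As an aside, 9223372036854771712 isn't an arbitrary value: it is LONG_MAX (2^63 - 1) rounded down to the 4 KiB page size, which is what the cgroup v1 memory controller reports when no limit is set at all:

```shell
# 9223372036854771712 == (2^63 - 1) rounded down to a 4096-byte page
# boundary -- the value memory.limit_in_bytes shows for "unlimited".
echo $(( 9223372036854775807 / 4096 * 4096 ))
# prints 9223372036854771712
```

So the reload isn't writing a bogus limit; it is clearing the limit entirely.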

I can prevent this by setting MemoryMax for the scope on the host:

systemctl set-property --runtime "machine-lxc\x2d27166\x2dhost.domain.tld.scope" MemoryMax=2147483648

I need to know the PID used in the machine name and can therefore really
only set it at runtime.
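For anyone scripting this: the scope name can be derived rather than read back, since systemd escapes a "-" inside a unit-name component as \x2d (the same mapping systemd-escape applies). A minimal sketch, assuming the machine name lxc-27166-host.domain.tld seen in the paths above; note that only "-" is handled here, other special characters would need the full \xXX escaping:

```shell
# Build the scope unit name for a machine name; systemd escapes "-"
# in a unit-name component as \x2d. Only "-" is handled in this
# sketch; dots are legal in unit names and stay as-is.
name='lxc-27166-host.domain.tld'
escaped=$(printf '%s' "$name" | sed 's/-/\\x2d/g')
scope="machine-${escaped}.scope"
echo "$scope"
# prints machine-lxc\x2d27166\x2dhost.domain.tld.scope
```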

However, this isn't enough to prevent the 8-exabyte issue. For some
reason, when I do a systemctl daemon-reload on the host, systemd also
changes the cgroup membership of some processes. Prior to reloading there
were 3 processes directly in the machine-lxc...scope: a
/usr/lib/libvirt/libvirt_lxc process, the /sbin/init process of the
container, and another process that I can't find in /proc/. Maybe a PID
from within the container?

After reloading, only the /sbin/init process remains in the scope, the
libvirt_lxc process gets kicked back to the libvirtd.service cgroup, and
the "ghost" task disappears.

Before reload:

11:blkio:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
10:freezer:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
9:perf_event:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
8:pids:/system.slice/libvirtd.service
7:cpu,cpuacct:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
6:rdma:/
5:devices:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
4:memory:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
2:cpuset:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
1:name=systemd:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
0::/system.slice/libvirtd.service

After reload:

11:blkio:/system.slice/libvirtd.service
10:freezer:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
9:perf_event:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
8:pids:/system.slice/libvirtd.service
7:cpu,cpuacct:/system.slice/libvirtd.service
6:rdma:/
5:devices:/system.slice/libvirtd.service
4:memory:/system.slice/libvirtd.service
3:net_cls,net_prio:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
2:cpuset:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
1:name=systemd:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
0::/system.slice/libvirtd.service
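The controller moves in the two listings above can also be extracted mechanically. A sketch that diffs two /proc/PID/cgroup snapshots (abridged here to three of the controllers listed above) and prints only the entries that changed:

```shell
# Abridged /proc/PID/cgroup snapshots from before/after the reload.
before='11:blkio:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
10:freezer:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
4:memory:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope'
after='11:blkio:/system.slice/libvirtd.service
10:freezer:/machine.slice/machine-lxc\x2d27166\x2dhost.domain.tld.scope
4:memory:/system.slice/libvirtd.service'
printf '%s\n' "$before" > before.txt
printf '%s\n' "$after"  > after.txt
# Entries present only in the "after" snapshot are the controllers
# systemd moved back under libvirtd.service.
grep -Fxv -f before.txt after.txt
# prints the blkio and memory lines, now under libvirtd.service
```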



Bug#890824: Container: unsets cgroup memory limit on user login

2019-10-25 Thread Maximilian Philipps
hi,

After digging a bit more, it appears that after the update from stretch
to buster we are using some mix of cgroup v1 and cgroup v2.

/sys/fs/cgroup/ is still a tmpfs and /sys/fs/cgroup/unified/ exists, but
has no controllers. So apparently systemd should still use the
controllers from v1 with the hierarchy from v2?

Can anyone confirm that memory resource management works at all on buster?
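One quick way to check which setup a host is actually running (a sketch; stat here is the GNU coreutils version):

```shell
# The filesystem type of /sys/fs/cgroup distinguishes the modes:
#   cgroup2fs -> unified (pure cgroup v2)
#   tmpfs     -> legacy or hybrid; hybrid additionally mounts a
#                controller-less cgroup2 tree at /sys/fs/cgroup/unified
stat -fc %T /sys/fs/cgroup
# for hybrid, this extra tree exists and its cgroup.controllers is empty:
# cat /sys/fs/cgroup/unified/cgroup.controllers
```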



Bug#890824: Container: unsets cgroup memory limit on user login

2019-10-25 Thread Maximilian Philipps
I recently updated one of the hosts and the containers running on it from
stretch to buster.

With buster's 241-7~deb10u1 the issue still exists. I have tried working
around it by setting a memory limit on the -.slice from within the
container, but this is fairly unreliable.



Bug#890824: Container: unsets cgroup memory limit on user login

2018-09-08 Thread Michael Biebl
Would you mind testing with systemd v239 from unstable/testing and
possibly raising this upstream at https://github.com/systemd/systemd?

To be honest, I'm not sure what the expected behaviour is in that regard,
or whether this is maybe just a configuration issue.

-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?





Bug#890824: Container: unsets cgroup memory limit on user login

2018-02-19 Thread Maximilian Philipps



On second thought, maybe you assumed that the cgroup namespace is
unshared? This is not the case; cgroup namespaces are fairly new and, as
far as I know, not supported by libvirt-lxc.



Bug#890824: Container: unsets cgroup memory limit on user login

2018-02-19 Thread Maximilian Philipps



On 02/19/2018 01:50 PM, Michael Biebl wrote:


I'm confused: Are you saying that systemd inside the guest (i.e. running
systemd v232) resets the memory limits on the host (running v215)?



No, the host still sees the 255 GB. The systemd in the guest resets
the limits for the container when someone logs in.
In terms of the cgroup hierarchy, /sys/fs/cgroup/memory/memory.limit_in_bytes
is always 9223372036854771712, which appears to be treated as no
restriction on the host.
However, the memory.limit_in_bytes within the machine scope does change.



Bug#890824: Container: unsets cgroup memory limit on user login

2018-02-19 Thread Michael Biebl
On 19.02.2018 13:09, Maximilian Philipps wrote:
> Package: systemd
> Version: 232-25+deb9u1
> Severity: important
> 
> Hi
> 
> I have an issue with Systemd unsetting the memory limit for my container,
> whereupon programs like free and htop report having access to 8 exabyte
> of memory.
> 
> The setup is the following:
> 
> Host:
> Release: Debian jessie
> Kernel: 4.9.65-3+deb9u2~bpo8+1 (jessie backports)
> Container provider: libvirt 3.0.0-4~bpo8+1 (jessie backports)
> Systemd: 215-17+deb8u7 (jessie)
> cgroup hierarchy: legacy
> 
> Guest:
> Release: Debian stretch
> Systemd: 232-25+deb9u1 (stretch)
> 
> There are several containers running on the host, but this problem only
> occurs with all the Debian stretch containers. Containers running Debian
> jessie or older Ubuntu 12.04 aren't affected.
> Each container is configured to cgroup enforced memory limit in it's
> libvirt domain file.
> Example:
> 4194304
> 2097152
> 
> Steps to reproduce + observations:
> 1) start a container with virsh -c lxc:// container.example.com
> 2) virsh -c lxc:// memtune container.example.com
>    reports a hard_limit of 2097152
> 3) cat
> "/sys/fs/cgroup/memory/machine.slice/machine-.scope/memory.limit_in_bytes"
> 
> outputs 2147483648
> 4) nsenter -t  -m -u -i -n -p free  reports 2097152 kB
> 5) ssh container.example.com free  reports 9007199254740991 kB
> 3) cat
> "/sys/fs/cgroup/memory/machine.slice/machine-.scope/memory.limit_in_bytes"
> 
> outputs 9223372036854771712
> 6) nsenter -t  -m -u -i -n -p free  reports 9007199254740991 kB
> 7) virsh -c lxc:// memtune container.example.com
>    reports a hard_limit of unlimited
> 
> As far as I can tell it seems to be that systemd unsets the cgroup memory
> limit when creating the user session. However why it gets set to
> 9223372036854771712 instead of the 255G of the host I don't know.

I'm confused: Are you saying that systemd inside the guest (i.e. running
systemd v232) resets the memory limits on the host (running v215)?


-- 
Why is it that all of the instruments seeking intelligent life in the
universe are pointed away from Earth?





Bug#890824: Container: unsets cgroup memory limit on user login

2018-02-19 Thread Maximilian Philipps

Package: systemd
Version: 232-25+deb9u1
Severity: important

Hi

I have an issue with systemd unsetting the memory limit for my container,
whereupon programs like free and htop report having access to 8 exabytes
of memory.

The setup is the following:

Host:
Release: Debian jessie
Kernel: 4.9.65-3+deb9u2~bpo8+1 (jessie backports)
Container provider: libvirt 3.0.0-4~bpo8+1 (jessie backports)
Systemd: 215-17+deb8u7 (jessie)
cgroup hierarchy: legacy

Guest:
Release: Debian stretch
Systemd: 232-25+deb9u1 (stretch)

There are several containers running on the host, but this problem occurs
with all the Debian stretch containers; containers running Debian jessie
or older Ubuntu 12.04 aren't affected.
Each container is configured with a cgroup-enforced memory limit in its
libvirt domain file.
Example:

<memory unit='KiB'>4194304</memory>
<memtune>
  <hard_limit unit='KiB'>2097152</hard_limit>
</memtune>

Steps to reproduce + observations:
1) start a container with virsh -c lxc:// container.example.com
2) virsh -c lxc:// memtune container.example.com
   reports a hard_limit of 2097152
3) cat
"/sys/fs/cgroup/memory/machine.slice/machine-.scope/memory.limit_in_bytes"
   outputs 2147483648
4) nsenter -t  -m -u -i -n -p free  reports 2097152 kB
5) ssh container.example.com free  reports 9007199254740991 kB
6) repeat step 3: the same file now outputs 9223372036854771712
7) nsenter -t  -m -u -i -n -p free  reports 9007199254740991 kB
8) virsh -c lxc:// memtune container.example.com
   reports a hard_limit of unlimited
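The readings from memtune and from the cgroup file agree, just in different units: hard_limit is reported in KiB while memory.limit_in_bytes is in bytes:

```shell
# memtune's hard_limit (KiB) converted to the cgroup file's bytes.
echo $(( 2097152 * 1024 ))
# prints 2147483648
```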

As far as I can tell, systemd unsets the cgroup memory limit when
creating the user session. However, why it gets set to
9223372036854771712 instead of the host's 255G I don't know.


In any case, I am looking forward to a better solution than resetting the
limits through cron every minute.
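For completeness, the cron workaround amounts to something like the sketch below. The limit is this report's example value, and a temp file stands in for the scope's memory.limit_in_bytes so the logic can be exercised anywhere; in the real job the path would be the machine scope's cgroup file:

```shell
#!/bin/sh
# Hypothetical cron-job body: write the limit back whenever the kernel
# reports a larger (i.e. reset/unlimited) value.
LIMIT=2147483648
# Stand-in for .../machine-<name>.scope/memory.limit_in_bytes.
LIMIT_FILE=$(mktemp)
echo 9223372036854771712 > "$LIMIT_FILE"   # simulate the reset
if [ "$(cat "$LIMIT_FILE")" -gt "$LIMIT" ]; then
    echo "$LIMIT" > "$LIMIT_FILE"
fi
cat "$LIMIT_FILE"
# prints 2147483648
```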

-- Package-specific info:

-- System Information:
Debian Release: 9.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-0.bpo.5-amd64 (SMP w/32 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: 
LC_ALL set to en_US.UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) 
(ignored: LC_ALL set to en_US.UTF-8)

Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages systemd depends on:
ii  adduser 3.115
ii  libacl1 2.2.52-3+b1
ii  libapparmor1    2.11.0-3
ii  libaudit1   1:2.6.7-2
ii  libblkid1   2.29.2-1
ii  libc6   2.24-11+deb9u1
ii  libcap2 1:2.25-1
ii  libcryptsetup4  2:1.7.3-4
ii  libgcrypt20 1.7.6-2+deb9u2
ii  libgpg-error0   1.26-2
ii  libidn11    1.33-1
ii  libip4tc0   1.6.0+snapshot20161117-6
ii  libkmod2    23-2
ii  liblz4-1    0.0~r131-2+b1
ii  liblzma5    5.2.2-1.2+b1
ii  libmount1   2.29.2-1
ii  libpam0g    1.1.8-3.6
ii  libseccomp2 2.3.1-2.1
ii  libselinux1 2.6-3+b3
ii  libsystemd0 232-25+deb9u1
ii  mount   2.29.2-1
ii  procps  2:3.3.12-3
ii  util-linux  2.29.2-1

Versions of packages systemd recommends:
ii  dbus    1.10.24-0+deb9u1
ii  libpam-systemd  232-25+deb9u1

Versions of packages systemd suggests:
pn  policykit-1    
pn  systemd-container  
pn  systemd-ui 

Versions of packages systemd is related to:
pn  dracut   
pn  initramfs-tools  
ii  udev 232-25+deb9u1

-- no debconf information