The following pull request was submitted through Github.
It can be accessed and reviewed at: https://github.com/lxc/lxd/pull/6997

This e-mail was sent by the LXC bot, direct replies will not reach the author
unless they happen to be subscribed to this list.

=== Description (from pull-request) ===
[RFC]: production-setup: add net.core.bpf_jit_limit and kernel.keys.maxbytes

/* kernel.keys.maxbytes */
When all containers share the same id mapping they will share a keyring. But
since each container will get its own session key, which is appended to that
shared keyring, the keyring will grow quite large. Thus the limit needs to be
bumped.

/* net.core.bpf_jit_limit */
When running on a kernel which has /proc/sys/net/core/bpf_jit_enable set to a
value other than 0, seccomp will make use of the eBPF JIT compiler, so each
container's seccomp filter will be charged against the eBPF JIT limit. Thus the
limit needs to be bumped significantly. Note that a lot of kernels have
CONFIG_BPF_JIT_ALWAYS_ON=y set as a hardening feature, which means the
bpf_jit_enable value can't be changed and is fixed at 1.
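
Whether either limit actually bites on a given host can be checked before
bumping anything. The following is a hedged sketch, not part of the patch; it
assumes a Linux host with procfs and guards every read because minimal images
may lack `sysctl` or individual /proc entries:

```shell
# Sketch only (assumption: Linux host with procfs mounted).

# seccomp filters are charged against net.core.bpf_jit_limit whenever
# bpf_jit_enable is non-zero; CONFIG_BPF_JIT_ALWAYS_ON=y pins it to 1.
jit_enable=$(cat /proc/sys/net/core/bpf_jit_enable 2>/dev/null || echo unknown)
echo "bpf_jit_enable: ${jit_enable}"

# Current JIT limit and keyring quotas; sysctl may be absent in minimal images.
sysctl net.core.bpf_jit_limit kernel.keys.maxkeys kernel.keys.maxbytes 2>/dev/null || true

# Per-user keyring usage: /proc/key-users shows keys and bytes in use against
# the quota, i.e. how close the shared keyring is to kernel.keys.maxbytes.
cat /proc/key-users 2>/dev/null || true
```

If the `bpf_jit_enable` line prints 1 and cannot be changed, the host kernel
was built with CONFIG_BPF_JIT_ALWAYS_ON=y and net.core.bpf_jit_limit applies
unconditionally.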

Cc: Tobias Schüring <tob...@raidboxes.de>
Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
From dd98b789ac950732b74aa1d645eba6a45819fa2a Mon Sep 17 00:00:00 2001
From: Christian Brauner <christian.brau...@ubuntu.com>
Date: Mon, 9 Mar 2020 12:20:54 +0100
Subject: [PATCH] [RFC]: production-setup: add net.core.bpf_jit_limit and
 kernel.keys.maxbytes
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

/* kernel.keys.maxbytes */
When all containers share the same id mapping they will share a keyring. But
since each container will get its own session key, which is appended to that
shared keyring, the keyring will grow quite large. Thus the limit needs to be
bumped.

/* net.core.bpf_jit_limit */
When running on a kernel which has /proc/sys/net/core/bpf_jit_enable set to a
value other than 0, seccomp will make use of the eBPF JIT compiler, so each
container's seccomp filter will be charged against the eBPF JIT limit. Thus the
limit needs to be bumped significantly. Note that a lot of kernels have
CONFIG_BPF_JIT_ALWAYS_ON=y set as a hardening feature, which means the
bpf_jit_enable value can't be changed and is fixed at 1.

Cc: Tobias Schüring <tob...@raidboxes.de>
Signed-off-by: Christian Brauner <christian.brau...@ubuntu.com>
---
 doc/production-setup.md | 22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/doc/production-setup.md b/doc/production-setup.md
index d785334681..43c620c43f 100644
--- a/doc/production-setup.md
+++ b/doc/production-setup.md
@@ -34,16 +34,18 @@ root    | hard  | nofile  | 1048576   | unset     | maximum number of open files
 
 ### /etc/sysctl.conf
 
-Parameter                           | Value     | Default | Description
-:-----                              | :---      | :---    | :---
-fs.inotify.max\_queued\_events      | 1048576   | 16384   | This specifies an upper limit on the number of events that can be queued to the corresponding inotify instance. [1]
-fs.inotify.max\_user\_instances     | 1048576   | 128     | This specifies an upper limit on the number of inotify instances that can be created per real user ID. [1]
-fs.inotify.max\_user\_watches       | 1048576   | 8192    | This specifies an upper limit on the number of watches that can be created per real user ID. [1]
-vm.max\_map\_count                  | 262144    | 65530   | This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
-kernel.dmesg\_restrict              | 1         | 0       | This denies container access to the messages in the kernel ring buffer. Please note that this also will deny access to non-root users on the host system.
-net.ipv4.neigh.default.gc\_thresh3  | 8192      | 1024    | This is the maximum number of entries in ARP table (IPv4). You should increase this if you create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
-net.ipv6.neigh.default.gc\_thresh3  | 8192      | 1024    | This is the maximum number of entries in ARP table (IPv6). You should increase this if you plan to create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
-kernel.keys.maxkeys                 | 2000      | 200     | This is the maximum number of keys a non-root user can use, should be higher than the number of containers
+Parameter                           | Value     | Default   | Description
+:-----                              | :---      | :---      | :---
+fs.inotify.max\_queued\_events      | 1048576   | 16384     | This specifies an upper limit on the number of events that can be queued to the corresponding inotify instance. [1]
+fs.inotify.max\_user\_instances     | 1048576   | 128       | This specifies an upper limit on the number of inotify instances that can be created per real user ID. [1]
+fs.inotify.max\_user\_watches       | 1048576   | 8192      | This specifies an upper limit on the number of watches that can be created per real user ID. [1]
+vm.max\_map\_count                  | 262144    | 65530     | This file contains the maximum number of memory map areas a process may have. Memory map areas are used as a side-effect of calling malloc, directly by mmap and mprotect, and also when loading shared libraries.
+kernel.dmesg\_restrict              | 1         | 0         | This denies container access to the messages in the kernel ring buffer. Please note that this also will deny access to non-root users on the host system.
+net.ipv4.neigh.default.gc\_thresh3  | 8192      | 1024      | This is the maximum number of entries in ARP table (IPv4). You should increase this if you create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
+net.ipv6.neigh.default.gc\_thresh3  | 8192      | 1024      | This is the maximum number of entries in ARP table (IPv6). You should increase this if you plan to create over 1024 containers. Otherwise, you will get the error `neighbour: ndisc_cache: neighbor table overflow!` when the ARP table gets full and those containers will not be able to get a network configuration. [2]
+net.core.bpf\_jit\_limit            | ????????? | 264241152 | This is a limit on the size of eBPF JIT allocations, usually set to PAGE_SIZE * 40000. When `/proc/sys/net/core/bpf_jit_enable` is set to a value other than `0`, `seccomp` will make use of the eBPF JIT compiler, so each container's `seccomp` filter will be charged against this limit.
+kernel.keys.maxkeys                 | 2000      | 200       | This is the maximum number of keys a non-root user can use; it should be higher than the number of containers.
+kernel.keys.maxbytes                | ????????? | 20000     | This is the maximum size of the keyring non-root users can use; it should be higher than the number of containers.
 
 Then, reboot the server.
 
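For reference, the recommended values from the table translate into the
following /etc/sysctl.conf fragment. The two values this RFC leaves undecided
("?????????") are kept as comments with their kernel defaults rather than
guessed:

```
# /etc/sysctl.conf — values recommended in the table above
fs.inotify.max_queued_events = 1048576
fs.inotify.max_user_instances = 1048576
fs.inotify.max_user_watches = 1048576
vm.max_map_count = 262144
kernel.dmesg_restrict = 1
net.ipv4.neigh.default.gc_thresh3 = 8192
net.ipv6.neigh.default.gc_thresh3 = 8192
kernel.keys.maxkeys = 2000
# Still undecided in this RFC; kernel defaults shown for reference:
# net.core.bpf_jit_limit  (default 264241152)
# kernel.keys.maxbytes    (default 20000)
```
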
_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel
