The following pull request was submitted through GitHub. It can be accessed and reviewed at: https://github.com/lxc/lxcfs/pull/308

This e-mail was sent by the LXC bot; direct replies will not reach the author unless they happen to be subscribed to this list.

=== Description (from pull-request) ===
While troubleshooting network jitter on one of our servers, we traced the cause to the kernel traversing cgroups, with most of the time being spent in the memcg_stat_show function. By capturing the kernel stack, our kernel engineer found that lxcfs was triggering the problem. An analysis of the lxcfs code pointed the same way, and the problem disappears after lxcfs is reloaded.

kernel stack:
```
COMMAND: lxcfs PID: 3223863 LATENCY: 55ms
trace_irqoff_record+0x12b/0x1b0 [trace_irqoff]
trace_irqoff_hrtimer_handler+0x97/0x99 [trace_irqoff]
__hrtimer_run_queues+0xdc/0x220
hrtimer_interrupt+0xa6/0x1f0
smp_apic_timer_interrupt+0x62/0x120
apic_timer_interrupt+0x7d/0x90
memcg_stat_show+0x27a/0x460
seq_read+0x11f/0x3f0
__vfs_read+0x33/0x160
vfs_read+0x91/0x130
SyS_read+0x52/0xc0
do_syscall_64+0x68/0x100
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
```

This PR fixes a server stability degradation caused by the traversal of dying cgroups. This is a serious problem: when initpid == 1, reading meminfo traverses all memory cgroups, which with some probability degrades server stability, for example as network jitter.

An example of dying cgroups follows. Userspace sees 1810 memory cgroups, but the kernel has 110588:
```
root@XX-YY-ZZ-AA:~# lscgroup | grep memory | wc -l
1810
root@XX-YY-ZZ-AA:~# grep memory /proc/cgroups
memory 7 110588 1
```

For more on dying cgroups, see https://lwn.net/Articles/787614/?spm=a2c4e.10696291.0.0.105919a4uX5P3F

Signed-off-by: Hongbo Yin <yinhon...@bytedance.com>
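
The kernel-side count shown in the description can also be read programmatically. The following is a minimal sketch, not part of the pull request: the function name `kernel_cgroup_count` and the `CGROUP_COUNT_DEMO` macro are made up for illustration, and the parser assumes the usual four-column layout of /proc/cgroups (subsys_name, hierarchy, num_cgroups, enabled).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of lxcfs): return the number of cgroups the
 * kernel reports for `controller`, parsed from a stream in /proc/cgroups
 * format:
 *   #subsys_name  hierarchy  num_cgroups  enabled
 * Returns -1 if the controller is not listed.
 */
long kernel_cgroup_count(FILE *f, const char *controller)
{
	char line[256];
	char name[64];
	int hierarchy, enabled;
	long num_cgroups;

	while (fgets(line, sizeof(line), f)) {
		if (line[0] == '#')
			continue;	/* header line */
		if (sscanf(line, "%63s %d %ld %d", name, &hierarchy,
			   &num_cgroups, &enabled) != 4)
			continue;	/* malformed line, skip it */
		if (strcmp(name, controller) == 0)
			return num_cgroups;
	}
	return -1;
}

#ifdef CGROUP_COUNT_DEMO
/* Optional demo: build with -DCGROUP_COUNT_DEMO to get a small tool. */
int main(void)
{
	FILE *f = fopen("/proc/cgroups", "r");

	if (!f) {
		perror("/proc/cgroups");
		return 1;
	}
	printf("memory cgroups known to the kernel: %ld\n",
	       kernel_cgroup_count(f, "memory"));
	fclose(f);
	return 0;
}
#endif
```

Comparing this value with the number of memory cgroups visible from userspace gives a rough estimate of how many are dying. With the figures above, 110588 - 1810 = 108778 memory cgroups exist in the kernel but are not visible from userspace.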

From 5e0117a1c8eee3b0a844d7065de44daaaa81c4c0 Mon Sep 17 00:00:00 2001
From: Hongbo Yin <yinhon...@bytedance.com>
Date: Wed, 9 Oct 2019 11:11:12 +0800
Subject: [PATCH] fixed a problem with server stability degradation caused by cgroup dying traversal.

Signed-off-by: Hongbo Yin <yinhon...@bytedance.com>
---
 bindings.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/bindings.c b/bindings.c
index 1811955..9a5a3df 100644
--- a/bindings.c
+++ b/bindings.c
@@ -3452,6 +3452,18 @@ static int proc_meminfo_read(char *buf, size_t size, off_t offset,
 	pid_t initpid = lookup_initpid_in_store(fc->pid);
 	if (initpid <= 0)
 		initpid = fc->pid;
+
+	/*
+	 * fixed a problem with server stability degradation caused by cgroup dying traversal. This is a very bad problem.
+	 * when initpid == 1, getting meminfo will traverses all the memory cgroup.
+	 * a certain probability causes server stability to drop, such as network jitter.
+	 *
+	 * more questions about cgroup dying.
+	 * Can refer to https://lwn.net/Articles/787614/?spm=a2c4e.10696291.0.0.105919a4uX5P3F
+	 */
+	if (initpid == 1)
+		return read_file("/proc/meminfo", buf, size, d);
+
 	cg = get_pid_cgroup(initpid, "memory");
 	if (!cg)
 		return read_file("/proc/meminfo", buf, size, d);
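
The order of checks after this patch can be modelled in isolation. The sketch below is only an illustration under stated assumptions, not lxcfs code: the enum, the function `choose_meminfo_source` and its parameters are hypothetical, and the cgroup lookup result is passed in as an argument, whereas the patched function returns before `get_pid_cgroup()` is ever called when initpid is 1.

```c
#include <stddef.h>
#include <sys/types.h>

enum meminfo_source {
	MEMINFO_HOST,	/* pass the host's /proc/meminfo through unchanged */
	MEMINFO_CGROUP,	/* derive the values from the caller's memory cgroup */
};

/*
 * Hypothetical model of the decision made in proc_meminfo_read() after the
 * patch (names and parameters are illustrative, not the lxcfs API):
 *
 *   stored_initpid: result of the init PID lookup, <= 0 if it failed
 *   caller_pid:     PID of the process issuing the read
 *   cgroup:         memory cgroup resolved for the init PID, NULL if none
 */
enum meminfo_source choose_meminfo_source(pid_t stored_initpid,
					  pid_t caller_pid,
					  const char *cgroup)
{
	pid_t initpid = stored_initpid;

	/* Fall back to the caller's own PID when the lookup failed. */
	if (initpid <= 0)
		initpid = caller_pid;

	/* New check from the patch: PID 1 gets the host view directly. */
	if (initpid == 1)
		return MEMINFO_HOST;

	/* Existing behaviour: no memory cgroup means the host view. */
	if (!cgroup)
		return MEMINFO_HOST;

	return MEMINFO_CGROUP;
}
```

Placing the new check ahead of the cgroup lookup is what lets reads with initpid == 1 skip the memory cgroup path entirely.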

_______________________________________________
lxc-devel mailing list
lxc-devel@lists.linuxcontainers.org
http://lists.linuxcontainers.org/listinfo/lxc-devel