[Devel] [PATCH RHEL8 COMMIT] kernel/sched/fair.c: Add missing update_rq_clock() calls
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 15645110b30affdf50f83b458724c8b503224aa4
Author: Andrey Ryabinin
Date:   Mon Sep 28 18:46:06 2020 +0300

    kernel/sched/fair.c: Add missing update_rq_clock() calls

    We've got a hard lockup which seems to be caused by the mgag200 console
    printk code calling schedule_work() from the scheduler with rq->lock held:

      #5 [b79e034239a8] native_queued_spin_lock_slowpath at 8b50c6c6
      #6 [b79e034239a8] _raw_spin_lock at 8bc96e5c
      #7 [b79e034239b0] try_to_wake_up at 8b4e26ff
      #8 [b79e03423a10] __queue_work at 8b4ce3f3
      #9 [b79e03423a58] queue_work_on at 8b4ce714

    The printk was called because assert_clock_updated() triggered
        SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);

    This means that we are missing a necessary update_rq_clock() call.
    Add one to cpulimit_balance_cpu_stop() to fix the warning.

    Also add one in load_balance() before the move_task_groups() call.
    It seems to be another place missing this call.

    https://jira.sw.ru/browse/PSBM-108013
    Signed-off-by: Andrey Ryabinin
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d3556b15e70..e6dc21d5fa03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
 
 		schedstat_inc(sd->clb_count);
 
+		update_rq_clock(rq);
 		if (do_cpulimit_balance(&env))
 			schedstat_inc(sd->clb_pushed);
 		else
@@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		env.loop = 0;
 		local_irq_save(rf.flags);
 		double_rq_lock(env.dst_rq, busiest);
+		update_rq_clock(env.dst_rq);
 		cur_ld_moved = ld_moved = move_task_groups(&env);
 		double_rq_unlock(env.dst_rq, busiest);
 		local_irq_restore(rf.flags);
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
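For readers unfamiliar with the check that fired, here is a minimal userspace sketch of the clock_update_flags bookkeeping. The three RQCF_* values are the ones from kernel/sched/sched.h; the toy_* names and the struct are hypothetical and exist only for illustration. The sketch assumes the flag is dropped when the lock is taken, which is modelled on what rq_pin_lock() does in debug builds, not a copy of the vz8 code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values as in kernel/sched/sched.h; everything else is a toy model. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for the two struct rq fields that matter here. */
struct toy_rq {
    unsigned int clock_update_flags;
    unsigned long long clock;
};

/* Taking the lock forgets any earlier update: RQCF_UPDATED is cleared. */
static void toy_rq_lock(struct toy_rq *rq)
{
    assert(rq != NULL);
    rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
}

/* Refresh the clock and record that this lock section has done so. */
static void toy_update_rq_clock(struct toy_rq *rq, unsigned long long now)
{
    assert(rq != NULL);
    if (rq->clock_update_flags & RQCF_ACT_SKIP)
        return;                 /* an active skip request suppresses the update */
    rq->clock_update_flags |= RQCF_UPDATED;
    rq->clock = now;
}

/* True when SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP) would fire. */
static bool toy_clock_is_stale(const struct toy_rq *rq)
{
    assert(rq != NULL);
    return rq->clock_update_flags < RQCF_ACT_SKIP;
}
```

In this model, calling toy_rq_lock() and then reading the clock without toy_update_rq_clock() leaves toy_clock_is_stale() true, which corresponds to the path through deactivate_task() that the patch fixes by adding the update.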
[Devel] [PATCH RHEL8 COMMIT] ms/mm: mempolicy: require at least one nodeid for MPOL_PREFERRED
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 56a5186d29d7373d71fd1519d90b785585840a2f
Author: Randy Dunlap
Date:   Wed Apr 1 21:10:58 2020 -0700

    ms/mm: mempolicy: require at least one nodeid for MPOL_PREFERRED

    Using an empty (malformed) nodelist that is not caught during mount option
    parsing leads to a stack-out-of-bounds access.

    The option string that was used was: "mpol=prefer:,". However,
    MPOL_PREFERRED requires a single node number, which is not being provided
    here.

    Add a check that 'nodes' is not empty after parsing for MPOL_PREFERRED's
    nodeid.

    Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display")
    Reported-by: Entropy Moe <3ntr0py1...@gmail.com>
    Reported-by: syzbot+b055b1a6b2b958707...@syzkaller.appspotmail.com
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Tested-by: syzbot+b055b1a6b2b958707...@syzkaller.appspotmail.com
    Cc: Lee Schermerhorn
    Link: http://lkml.kernel.org/r/89526377-7eb6-b662-e1d8-4430928ab...@infradead.org
    Signed-off-by: Linus Torvalds

    https://jira.sw.ru/browse/PSBM-120642
    CVE-2020-11565: out-of-bounds write in mpol_parse_str function in mm/mempolicy.c

    (cherry picked from commit aa9f7d5172fac9bf1f09e678c35e287a40a7b7dd)
    Signed-off-by: Konstantin Khorenko
---
 mm/mempolicy.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index dab3b16534f0..b24738101414 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2797,7 +2797,9 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 	switch (mode) {
 	case MPOL_PREFERRED:
 		/*
-		 * Insist on a nodelist of one node only
+		 * Insist on a nodelist of one node only, although later
+		 * we use first_node(nodes) to grab a single node, so here
+		 * nodelist (or nodes) cannot be empty.
 		 */
 		if (nodelist) {
 			char *rest = nodelist;
@@ -2805,6 +2807,8 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 				rest++;
 			if (*rest)
 				goto out;
+			if (nodes_empty(nodes))
+				goto out;
 		}
 		break;
 	case MPOL_INTERLEAVE:
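The effect of the added nodes_empty() check can be sketched in userspace as follows. This is a toy model, not the kernel parser: toy_nodelist_parse() and toy_parse_preferred() are hypothetical names, node sets are a plain 64-bit mask, ranges are not supported, and the toy parser is assumed to accept an empty string and return an empty set, which is the situation the commit message describes.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define TOY_MAX_NODES 64

/*
 * Toy stand-in for nodelist_parse(): accepts "", "3" or "0,2" (no ranges).
 * Returns 0 on success and fills *nodes with one bit per node.
 */
static int toy_nodelist_parse(const char *s, unsigned long long *nodes)
{
    assert(s != NULL && nodes != NULL);
    *nodes = 0;
    while (*s) {
        unsigned int n = 0;

        if (*s == ',') {        /* separators (and empty items) are skipped */
            s++;
            continue;
        }
        if (!isdigit((unsigned char)*s))
            return -1;
        while (isdigit((unsigned char)*s)) {
            n = n * 10 + (unsigned int)(*s - '0');
            if (n >= TOY_MAX_NODES)
                return -1;
            s++;
        }
        *nodes |= 1ULL << n;
    }
    return 0;
}

/* Returns the preferred node, or -1 if the nodelist is not acceptable. */
static int toy_parse_preferred(const char *nodelist)
{
    unsigned long long nodes;
    const char *rest = nodelist;
    int n;

    if (toy_nodelist_parse(nodelist, &nodes))
        return -1;
    while (isdigit((unsigned char)*rest))
        rest++;
    if (*rest)
        return -1;              /* anything beyond one node number */
    if (nodes == 0)
        return -1;              /* the added check: empty node set */
    for (n = 0; n < TOY_MAX_NODES; n++)
        if (nodes & (1ULL << n))
            return n;           /* plays the role of first_node(nodes) */
    return -1;
}
```

Without the `nodes == 0` test, an empty string would pass both earlier checks in this model and the lookup of the first node would run on an empty set, which is the out-of-bounds case the patch closes.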
[Devel] [PATCH v2 vz8] kernel/sched/fair.c: Add missing update_rq_clock() calls
We've got a hard lockup which seems to be caused by the mgag200 console
printk code calling schedule_work() from the scheduler with rq->lock held:

  #5 [b79e034239a8] native_queued_spin_lock_slowpath at 8b50c6c6
  #6 [b79e034239a8] _raw_spin_lock at 8bc96e5c
  #7 [b79e034239b0] try_to_wake_up at 8b4e26ff
  #8 [b79e03423a10] __queue_work at 8b4ce3f3
  #9 [b79e03423a58] queue_work_on at 8b4ce714
 #10 [b79e03423a68] mga_imageblit at c026d666 [mgag200]
 #11 [b79e03423a80] soft_cursor at 8b8a9d84
 #12 [b79e03423ad8] bit_cursor at 8b8a99b2
 #13 [b79e03423ba0] hide_cursor at 8b93bc7a
 #14 [b79e03423bb0] vt_console_print at 8b93e07d
 #15 [b79e03423c18] console_unlock at 8b518f0e
 #16 [b79e03423c68] vprintk_emit_log at 8b51acf7
 #17 [b79e03423cc0] vprintk_default at 8b51adcd
 #18 [b79e03423cd0] printk at 8b51b3d6
 #19 [b79e03423d30] __warn_printk at 8b4b13a0
 #20 [b79e03423d98] assert_clock_updated at 8b4dd293
 #21 [b79e03423da0] deactivate_task at 8b4e12d1
 #22 [b79e03423dc8] move_task_group at 8b4eaa5b
 #23 [b79e03423e00] cpulimit_balance_cpu_stop at 8b4f02f3
 #24 [b79e03423eb0] cpu_stopper_thread at 8b576b67
 #25 [b79e03423ee8] smpboot_thread_fn at 8b4d9125
 #26 [b79e03423f10] kthread at 8b4d4fc2
 #27 [b79e03423f50] ret_from_fork at 8be00255

The printk was called because assert_clock_updated() triggered
    SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);

This means that we are missing a necessary update_rq_clock() call.
Add one to cpulimit_balance_cpu_stop() to fix the warning.

Also add one in load_balance() before the move_task_groups() call.
It seems to be another place missing this call.

https://jira.sw.ru/browse/PSBM-108013
Signed-off-by: Andrey Ryabinin
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d3556b15e70..e6dc21d5fa03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
 
 		schedstat_inc(sd->clb_count);
 
+		update_rq_clock(rq);
 		if (do_cpulimit_balance(&env))
 			schedstat_inc(sd->clb_pushed);
 		else
@@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		env.loop = 0;
 		local_irq_save(rf.flags);
 		double_rq_lock(env.dst_rq, busiest);
+		update_rq_clock(env.dst_rq);
 		cur_ld_moved = ld_moved = move_task_groups(&env);
 		double_rq_unlock(env.dst_rq, busiest);
 		local_irq_restore(rf.flags);
-- 
2.26.2
[Devel] [PATCH vz8] kernel/sched/fair.c: Add missing update_rq_clock() calls
We've got a hard lockup which seems to be caused by the mgag200 console
printk code calling schedule_work() from the scheduler with rq->lock held:

  #5 [b79e034239a8] native_queued_spin_lock_slowpath at 8b50c6c6
  #6 [b79e034239a8] _raw_spin_lock at 8bc96e5c
  #7 [b79e034239b0] try_to_wake_up at 8b4e26ff
  #8 [b79e03423a10] __queue_work at 8b4ce3f3
  #9 [b79e03423a58] queue_work_on at 8b4ce714

The printk was called because assert_clock_updated() triggered
    SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);

This means that we are missing a necessary update_rq_clock() call.
Add one to cpulimit_balance_cpu_stop() to fix the warning.

Also add one in load_balance() before the move_task_groups() call.
It seems to be another place missing this call.

https://jira.sw.ru/browse/PSBM-108013
Signed-off-by: Andrey Ryabinin
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d3556b15e70..e6dc21d5fa03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
 
 		schedstat_inc(sd->clb_count);
 
+		update_rq_clock(rq);
 		if (do_cpulimit_balance(&env))
 			schedstat_inc(sd->clb_pushed);
 		else
@@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
 		env.loop = 0;
 		local_irq_save(rf.flags);
 		double_rq_lock(env.dst_rq, busiest);
+		update_rq_clock(env.dst_rq);
 		cur_ld_moved = ld_moved = move_task_groups(&env);
 		double_rq_unlock(env.dst_rq, busiest);
 		local_irq_restore(rf.flags);
-- 
2.26.2
[Devel] [PATCH RHEL8 COMMIT] ms/mm: memcg: charge memcg percpu memory to the parent cgroup
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 9dc4432ee111848013875a98d6e594c336655255
Author: Roman Gushchin
Date:   Tue Aug 11 18:30:25 2020 -0700

    ms/mm: memcg: charge memcg percpu memory to the parent cgroup

    Memory cgroups are using large chunks of percpu memory to store vmstat
    data.  Yet this memory is not accounted at all, so in the case when there
    are many (dying) cgroups, it's not exactly clear where all the memory is.

    Because the size of memory cgroup internal structures can dramatically
    exceed the size of object or page which is pinning it in the memory, it's
    not a good idea to simply ignore it.  It actually breaks the isolation
    between cgroups.

    Let's account the consumed percpu memory to the parent cgroup.

    [g...@fb.com: add WARN_ON_ONCE()s, per Johannes]
      Link: http://lkml.kernel.org/r/20200811170611.gb1507...@carbon.dhcp.thefacebook.com

    Signed-off-by: Roman Gushchin
    Signed-off-by: Andrew Morton
    Reviewed-by: Shakeel Butt
    Acked-by: Dennis Zhou
    Acked-by: Johannes Weiner
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Tejun Heo
    Cc: Tobin C. Harding
    Cc: Vlastimil Babka
    Cc: Waiman Long
    Cc: Bixuan Cui
    Cc: Michal Koutný
    Cc: Stephen Rothwell
    Link: http://lkml.kernel.org/r/20200623184515.4132564-5-g...@fb.com
    Signed-off-by: Linus Torvalds

    (cherry picked from commit 3e38e0aaca9eafb12b1c4b731d1c10975cbe7974)
    + fix commit: 9f457179244a ("mm: memcontrol: fix warning when allocating the root cgroup")
    Signed-off-by: Konstantin Khorenko
    Found and suggested by Vasily Averin

    Backport notices:
      * pn->lruvec_stat_local hunk dropped
      * memcg->vmstats_percpu is called memcg->stat_cpu in vz8
      * memcg->vmstats_local hunk dropped
---
 mm/memcontrol.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 68242a72be4d..ff751ca90562 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4973,7 +4973,8 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
-	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
+	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
+					       GFP_KERNEL_ACCOUNT);
 	if (!pn->lruvec_stat_cpu) {
 		kfree(pn);
 		return 1;
@@ -5034,7 +5035,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (memcg->id.id < 0)
 		goto fail;
 
-	memcg->stat_cpu = alloc_percpu(struct mem_cgroup_stat_cpu);
+	memcg->stat_cpu = alloc_percpu_gfp(struct mem_cgroup_stat_cpu,
+					   GFP_KERNEL_ACCOUNT);
 	if (!memcg->stat_cpu)
 		goto fail;
 
@@ -5075,7 +5077,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct mem_cgroup *memcg;
 	long error = -ENOMEM;
 
+	memalloc_use_memcg(parent);
 	memcg = mem_cgroup_alloc();
+	memalloc_unuse_memcg();
 	if (!memcg)
 		return ERR_PTR(error);
 
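The last hunk brackets mem_cgroup_alloc() with memalloc_use_memcg(parent) and memalloc_unuse_memcg(), so that the accounted percpu allocations made inside are charged to the parent. A rough userspace sketch of that scoping idea follows. All toy_* names are hypothetical, and the model is deliberately simplified: an accounted allocation here charges only an explicitly selected group, whereas the kernel falls back to the current task's cgroup when none is selected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup: parent link and a byte counter. */
struct toy_memcg {
    struct toy_memcg *parent;
    size_t charged;             /* bytes accounted to this group */
};

/* Group that accounted allocations are charged to while it is set. */
static struct toy_memcg *toy_active_memcg;

static void toy_use_memcg(struct toy_memcg *memcg)
{
    toy_active_memcg = memcg;
}

static void toy_unuse_memcg(void)
{
    toy_active_memcg = NULL;
}

/* Models an allocation made with an accounting flag set. */
static void toy_alloc_accounted(size_t bytes)
{
    if (toy_active_memcg)
        toy_active_memcg->charged += bytes;
}

/* Create a child group; its per-CPU statistics are charged to the parent. */
static void toy_css_alloc(struct toy_memcg *child, struct toy_memcg *parent,
                          size_t percpu_bytes)
{
    assert(child != NULL);
    child->parent = parent;
    child->charged = 0;

    toy_use_memcg(parent);              /* open the charging scope */
    toy_alloc_accounted(percpu_bytes);  /* the child's internal structures */
    toy_unuse_memcg();                  /* close it again */
}
```

With a root group and one child created through toy_css_alloc(&child, &root, 4096), the 4096 bytes show up in root.charged and child.charged stays at zero, which is the accounting direction the commit message argues for.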
[Devel] [PATCH RHEL8 COMMIT] ms/memcg: account security cred as well to kmemcg
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 60ab735e5b098212df0c2c944bac4fad0254b375
Author: Shakeel Butt
Date:   Sat Jan 4 12:59:43 2020 -0800

    ms/memcg: account security cred as well to kmemcg

    The cred_jar kmem_cache is already memcg accounted in the current kernel
    but cred->security is not.  Account cred->security to kmemcg.

    Recently we saw high root slab usage on our production and on further
    inspection, we found a buggy application leaking processes.  Though that
    buggy application was contained within its memcg, we observed much more
    system memory overhead, a couple of GiBs, during that period.  This
    overhead can adversely impact the isolation on the system.

    One source of high overhead we found was cred->security objects, which
    have a lifetime of at least the life of the process which allocated
    them.

    Link: http://lkml.kernel.org/r/20191205223721.40034-1-shake...@google.com
    Signed-off-by: Shakeel Butt
    Acked-by: Chris Down
    Reviewed-by: Roman Gushchin
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    (cherry picked from commit 84029fd04c201a4c7e0b07ba262664900f47c6f5)
    Signed-off-by: Konstantin Khorenko
    Found and suggested by Vasily Averin
---
 kernel/cred.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 45d77284aed0..463a52f66c18 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -219,7 +219,7 @@ struct cred *cred_alloc_blank(void)
 	new->magic = CRED_MAGIC;
 #endif
 
-	if (security_cred_alloc_blank(new, GFP_KERNEL) < 0)
+	if (security_cred_alloc_blank(new, GFP_KERNEL_ACCOUNT) < 0)
 		goto error;
 
 	return new;
@@ -277,7 +277,7 @@ struct cred *prepare_creds(void)
 	new->security = NULL;
 #endif
 
-	if (security_prepare_creds(new, old, GFP_KERNEL) < 0)
+	if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
 		goto error;
 	validate_creds(new);
 	return new;
@@ -684,7 +684,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 #ifdef CONFIG_SECURITY
 	new->security = NULL;
 #endif
-	if (security_prepare_creds(new, old, GFP_KERNEL) < 0)
+	if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
 		goto error;
 
 	put_cred(old);
[Devel] [PATCH RHEL7 COMMIT] ovl: enable kmem accounting for overlayfs inodes
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.30
-->
commit d03763c0eaf282de8c4081f291658945791fa44e
Author: Vasily Averin
Date:   Mon Sep 28 09:00:54 2020 +0300

    ovl: enable kmem accounting for overlayfs inodes

    https://jira.sw.ru/browse/PSBM-108292
    Signed-off-by: Vasily Averin
---
 fs/overlayfs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index fcb3f7a..d17276d 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1607,7 +1607,7 @@ static int __init ovl_init(void)
 	ovl_inode_cachep = kmem_cache_create("ovl_inode",
 					     sizeof(struct ovl_inode), 0,
 					     (SLAB_RECLAIM_ACCOUNT|
-					      SLAB_MEM_SPREAD),
+					      SLAB_MEM_SPREAD|SLAB_ACCOUNT),
 					     ovl_inode_init_once);
 	if (ovl_inode_cachep == NULL)
 		return -ENOMEM;