[Devel] [PATCH RHEL8 COMMIT] kernel/sched/fair.c: Add missing update_rq_clock() calls

2020-09-28 Thread Konstantin Khorenko
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear 
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 15645110b30affdf50f83b458724c8b503224aa4
Author: Andrey Ryabinin 
Date:   Mon Sep 28 18:46:06 2020 +0300

kernel/sched/fair.c: Add missing update_rq_clock() calls

We've got a hard lockup which seems to be caused by the mgag200
console printk code calling schedule_work() from the scheduler
with rq->lock held:

 #5 [b79e034239a8] native_queued_spin_lock_slowpath at 8b50c6c6
 #6 [b79e034239a8] _raw_spin_lock at 8bc96e5c
 #7 [b79e034239b0] try_to_wake_up at 8b4e26ff
 #8 [b79e03423a10] __queue_work at 8b4ce3f3
 #9 [b79e03423a58] queue_work_on at 8b4ce714

The printk was called because assert_clock_updated() triggered
SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);

This means that we are missing a necessary update_rq_clock() call.
Add one to cpulimit_balance_cpu_stop() to fix the warning.
Also add one in load_balance() before the move_task_groups() call.
It seems to be another place missing this call.

https://jira.sw.ru/browse/PSBM-108013
Signed-off-by: Andrey Ryabinin 
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d3556b15e70..e6dc21d5fa03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
 
schedstat_inc(sd->clb_count);
 
+   update_rq_clock(rq);
if (do_cpulimit_balance(&env))
schedstat_inc(sd->clb_pushed);
else
@@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
env.loop = 0;
local_irq_save(rf.flags);
double_rq_lock(env.dst_rq, busiest);
+   update_rq_clock(env.dst_rq);
cur_ld_moved = ld_moved = move_task_groups(&env);
double_rq_unlock(env.dst_rq, busiest);
local_irq_restore(rf.flags);
___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel
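
For readers unfamiliar with the check that fired in the patch above, here is a
minimal userspace sketch of how the clock_update_flags assertion behaves. The
RQCF_* values are the ones I believe kernel/sched/sched.h uses; every model_*
name is hypothetical and exists only for illustration, and the lock helper is
loosely modeled on rq_pin_lock() rather than copied from it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values assumed to match kernel/sched/sched.h. */
#define RQCF_REQ_SKIP 0x01
#define RQCF_ACT_SKIP 0x02
#define RQCF_UPDATED  0x04

/* Hypothetical stand-in for struct rq; only the fields used here. */
struct model_rq {
	unsigned int clock_update_flags;
	uint64_t clock;
	unsigned int warnings;	/* counts SCHED_WARN_ON-style hits */
};

/* Taking the lock drops a stale RQCF_UPDATED from the previous holder. */
static void model_rq_lock(struct model_rq *rq)
{
	rq->clock_update_flags &= (RQCF_REQ_SKIP | RQCF_ACT_SKIP);
}

/* Stand-in for update_rq_clock(): refresh the clock and mark it updated. */
static void model_update_rq_clock(struct model_rq *rq, uint64_t now)
{
	rq->clock = now;
	rq->clock_update_flags |= RQCF_UPDATED;
}

/* Stand-in for assert_clock_updated(): count a warning if the clock is stale. */
static bool model_assert_clock_updated(struct model_rq *rq)
{
	if (rq->clock_update_flags < RQCF_ACT_SKIP) {
		rq->warnings++;
		return false;
	}
	return true;
}

/* Stand-in for a clock reader such as the one reached via deactivate_task(). */
static uint64_t model_rq_clock(struct model_rq *rq)
{
	assert(rq != NULL);
	model_assert_clock_updated(rq);
	return rq->clock;
}
```

In this model, calling model_rq_lock() and then model_rq_clock() without an
intervening model_update_rq_clock() bumps the warning counter, which mirrors
the situation the patch fixes by adding the update call before the task move.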


[Devel] [PATCH RHEL8 COMMIT] ms/mm: mempolicy: require at least one nodeid for MPOL_PREFERRED

2020-09-28 Thread Konstantin Khorenko
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear 
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 56a5186d29d7373d71fd1519d90b785585840a2f
Author: Randy Dunlap 
Date:   Wed Apr 1 21:10:58 2020 -0700

ms/mm: mempolicy: require at least one nodeid for MPOL_PREFERRED

Using an empty (malformed) nodelist that is not caught during mount option
parsing leads to a stack-out-of-bounds access.

The option string that was used was: "mpol=prefer:,".  However,
MPOL_PREFERRED requires a single node number, which is not being provided
here.

Add a check that 'nodes' is not empty after parsing for MPOL_PREFERRED's
nodeid.

Fixes: 095f1fc4ebf3 ("mempolicy: rework shmem mpol parsing and display")
Reported-by: Entropy Moe <3ntr0py1...@gmail.com>
Reported-by: syzbot+b055b1a6b2b958707...@syzkaller.appspotmail.com
Signed-off-by: Randy Dunlap 
Signed-off-by: Andrew Morton 
Tested-by: syzbot+b055b1a6b2b958707...@syzkaller.appspotmail.com
Cc: Lee Schermerhorn 
Link: http://lkml.kernel.org/r/89526377-7eb6-b662-e1d8-4430928ab...@infradead.org
Signed-off-by: Linus Torvalds 

https://jira.sw.ru/browse/PSBM-120642
CVE-2020-11565: out-of-bounds write in mpol_parse_str function in mm/mempolicy.c

(cherry picked from commit aa9f7d5172fac9bf1f09e678c35e287a40a7b7dd)
Signed-off-by: Konstantin Khorenko 
---
 mm/mempolicy.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index dab3b16534f0..b24738101414 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2797,7 +2797,9 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 	switch (mode) {
 	case MPOL_PREFERRED:
 		/*
-		 * Insist on a nodelist of one node only
+		 * Insist on a nodelist of one node only, although later
+		 * we use first_node(nodes) to grab a single node, so here
+		 * nodelist (or nodes) cannot be empty.
 		 */
 		if (nodelist) {
 			char *rest = nodelist;
@@ -2805,6 +2807,8 @@ int mpol_parse_str(char *str, struct mempolicy **mpol)
 				rest++;
 			if (*rest)
 				goto out;
+			if (nodes_empty(nodes))
+				goto out;
 		}
 		break;
 	case MPOL_INTERLEAVE:
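
The effect of the added nodes_empty() check can be illustrated with a small
userspace sketch. It assumes the nodelist that reaches the parser is an empty
string, and it uses a 64-bit mask in place of nodemask_t. All model_* names and
MODEL_MAX_NUMNODES are hypothetical; the only kernel behaviour relied on is
that first_node() on an empty mask yields an out-of-range index.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_NUMNODES 64u

/* Parse "a,b,c" into a bitmask; an empty string yields an empty mask. */
static bool model_nodelist_parse(const char *s, uint64_t *nodes)
{
	*nodes = 0;
	while (*s) {
		char *end;
		unsigned long n;

		if (!isdigit((unsigned char)*s))
			return false;
		n = strtoul(s, &end, 10);
		if (n >= MODEL_MAX_NUMNODES)
			return false;
		*nodes |= UINT64_C(1) << n;
		s = end;
		if (*s == ',')
			s++;
		else if (*s)
			return false;
	}
	return true;
}

/* Like first_node(): lowest set bit, or MODEL_MAX_NUMNODES if none is set. */
static unsigned int model_first_node(uint64_t nodes)
{
	for (unsigned int i = 0; i < MODEL_MAX_NUMNODES; i++)
		if (nodes & (UINT64_C(1) << i))
			return i;
	return MODEL_MAX_NUMNODES;
}

/* Mirrors the MPOL_PREFERRED branch: one plain node number, never empty. */
static int model_parse_preferred(const char *nodelist, unsigned int *node)
{
	const char *rest = nodelist;
	uint64_t nodes;

	assert(nodelist != NULL && node != NULL);
	if (!model_nodelist_parse(nodelist, &nodes))
		return -1;
	while (isdigit((unsigned char)*rest))
		rest++;
	if (*rest)
		return -1;	/* more than a single node number */
	if (nodes == 0)
		return -1;	/* the check this patch adds */
	*node = model_first_node(nodes);
	return 0;
}
```

Without the `nodes == 0` test, an empty string would pass both earlier checks
and model_first_node() would hand back MODEL_MAX_NUMNODES as the preferred
node, which is the kind of out-of-range value the commit message describes.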


[Devel] [PATCH v2 vz8] kernel/sched/fair.c: Add missing update_rq_clock() calls

2020-09-28 Thread Andrey Ryabinin
We've got a hard lockup which seems to be caused by the mgag200
console printk code calling schedule_work() from the scheduler
with rq->lock held:
  #5 [b79e034239a8] native_queued_spin_lock_slowpath at 8b50c6c6
  #6 [b79e034239a8] _raw_spin_lock at 8bc96e5c
  #7 [b79e034239b0] try_to_wake_up at 8b4e26ff
  #8 [b79e03423a10] __queue_work at 8b4ce3f3
  #9 [b79e03423a58] queue_work_on at 8b4ce714
 #10 [b79e03423a68] mga_imageblit at c026d666 [mgag200]
 #11 [b79e03423a80] soft_cursor at 8b8a9d84
 #12 [b79e03423ad8] bit_cursor at 8b8a99b2
 #13 [b79e03423ba0] hide_cursor at 8b93bc7a
 #14 [b79e03423bb0] vt_console_print at 8b93e07d
 #15 [b79e03423c18] console_unlock at 8b518f0e
 #16 [b79e03423c68] vprintk_emit_log at 8b51acf7
 #17 [b79e03423cc0] vprintk_default at 8b51adcd
 #18 [b79e03423cd0] printk at 8b51b3d6
 #19 [b79e03423d30] __warn_printk at 8b4b13a0
 #20 [b79e03423d98] assert_clock_updated at 8b4dd293
 #21 [b79e03423da0] deactivate_task at 8b4e12d1
 #22 [b79e03423dc8] move_task_group at 8b4eaa5b
 #23 [b79e03423e00] cpulimit_balance_cpu_stop at 8b4f02f3
 #24 [b79e03423eb0] cpu_stopper_thread at 8b576b67
 #25 [b79e03423ee8] smpboot_thread_fn at 8b4d9125
 #26 [b79e03423f10] kthread at 8b4d4fc2
 #27 [b79e03423f50] ret_from_fork at 8be00255

The printk was called because assert_clock_updated() triggered
SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);

This means that we are missing a necessary update_rq_clock() call.
Add one to cpulimit_balance_cpu_stop() to fix the warning.
Also add one in load_balance() before the move_task_groups() call.
It seems to be another place missing this call.

https://jira.sw.ru/browse/PSBM-108013
Signed-off-by: Andrey Ryabinin 
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d3556b15e70..e6dc21d5fa03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
 
schedstat_inc(sd->clb_count);
 
+   update_rq_clock(rq);
if (do_cpulimit_balance(&env))
schedstat_inc(sd->clb_pushed);
else
@@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
env.loop = 0;
local_irq_save(rf.flags);
double_rq_lock(env.dst_rq, busiest);
+   update_rq_clock(env.dst_rq);
cur_ld_moved = ld_moved = move_task_groups(&env);
double_rq_unlock(env.dst_rq, busiest);
local_irq_restore(rf.flags);
-- 
2.26.2
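
The backtrace above suggests a self-deadlock: a warning printed while a
runqueue lock is held ends up in a wakeup path that wants a runqueue lock
again. The sketch below models that shape in userspace with a C11 atomic_flag.
It assumes the lock taken in try_to_wake_up() is the same one already held,
which the trace does not prove. All model_* names are hypothetical, and a
failed trylock stands in for what would be an endless spin in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive spinlock. */
struct model_spinlock {
	atomic_flag held;
};

static bool model_trylock(struct model_spinlock *l)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&l->held);
}

static void model_unlock(struct model_spinlock *l)
{
	atomic_flag_clear(&l->held);
}

/*
 * Stand-in for the wakeup reached via queue_work_on(): it needs the
 * runqueue lock. Returns false where a real spinlock would spin forever.
 */
static bool model_wake_worker(struct model_spinlock *rq_lock)
{
	if (!model_trylock(rq_lock))
		return false;
	model_unlock(rq_lock);
	return true;
}

/* Stand-in for a warning that fires, and prints, with the lock held. */
static bool model_warn_with_lock_held(struct model_spinlock *rq_lock)
{
	bool ok;

	assert(rq_lock != NULL);
	if (!model_trylock(rq_lock))
		return false;
	ok = model_wake_worker(rq_lock);	/* console output queues work */
	model_unlock(rq_lock);
	return ok;
}
```

With the lock free, model_wake_worker() succeeds; called from inside
model_warn_with_lock_held() it cannot, which is why silencing the warning by
updating the clock also removes the lockup in this reading of the trace.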



[Devel] [PATCH vz8] kernel/sched/fair.c: Add missing update_rq_clock() calls

2020-09-28 Thread Andrey Ryabinin
We've got a hard lockup which seems to be caused by the mgag200
console printk code calling schedule_work() from the scheduler
with rq->lock held:

 #5 [b79e034239a8] native_queued_spin_lock_slowpath at 8b50c6c6
 #6 [b79e034239a8] _raw_spin_lock at 8bc96e5c
 #7 [b79e034239b0] try_to_wake_up at 8b4e26ff
 #8 [b79e03423a10] __queue_work at 8b4ce3f3
 #9 [b79e03423a58] queue_work_on at 8b4ce714

The printk was called because assert_clock_updated() triggered
SCHED_WARN_ON(rq->clock_update_flags < RQCF_ACT_SKIP);

This means that we are missing a necessary update_rq_clock() call.
Add one to cpulimit_balance_cpu_stop() to fix the warning.
Also add one in load_balance() before the move_task_groups() call.
It seems to be another place missing this call.

https://jira.sw.ru/browse/PSBM-108013
Signed-off-by: Andrey Ryabinin 
---
 kernel/sched/fair.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 5d3556b15e70..e6dc21d5fa03 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7816,6 +7816,7 @@ static int cpulimit_balance_cpu_stop(void *data)
 
schedstat_inc(sd->clb_count);
 
+   update_rq_clock(rq);
if (do_cpulimit_balance(&env))
schedstat_inc(sd->clb_pushed);
else
@@ -9176,6 +9177,7 @@ static int load_balance(int this_cpu, struct rq *this_rq,
env.loop = 0;
local_irq_save(rf.flags);
double_rq_lock(env.dst_rq, busiest);
+   update_rq_clock(env.dst_rq);
cur_ld_moved = ld_moved = move_task_groups(&env);
double_rq_unlock(env.dst_rq, busiest);
local_irq_restore(rf.flags);
-- 
2.26.2



[Devel] [PATCH RHEL8 COMMIT] ms/mm: memcg: charge memcg percpu memory to the parent cgroup

2020-09-28 Thread Konstantin Khorenko
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear 
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 9dc4432ee111848013875a98d6e594c336655255
Author: Roman Gushchin 
Date:   Tue Aug 11 18:30:25 2020 -0700

ms/mm: memcg: charge memcg percpu memory to the parent cgroup

Memory cgroups are using large chunks of percpu memory to store vmstat
data.  Yet this memory is not accounted at all, so in the case when there
are many (dying) cgroups, it's not exactly clear where all the memory is.

Because the size of memory cgroup internal structures can dramatically
exceed the size of object or page which is pinning it in the memory, it's
not a good idea to simply ignore it.  It actually breaks the isolation
between cgroups.

Let's account the consumed percpu memory to the parent cgroup.

[g...@fb.com: add WARN_ON_ONCE()s, per Johannes]
  Link: http://lkml.kernel.org/r/20200811170611.gb1507...@carbon.dhcp.thefacebook.com

Signed-off-by: Roman Gushchin 
Signed-off-by: Andrew Morton 
Reviewed-by: Shakeel Butt 
Acked-by: Dennis Zhou 
Acked-by: Johannes Weiner 
Cc: Christoph Lameter 
Cc: David Rientjes 
Cc: Joonsoo Kim 
Cc: Mel Gorman 
Cc: Michal Hocko 
Cc: Pekka Enberg 
Cc: Tejun Heo 
Cc: Tobin C. Harding 
Cc: Vlastimil Babka 
Cc: Waiman Long 
Cc: Bixuan Cui 
Cc: Michal Koutný 
Cc: Stephen Rothwell 
Link: http://lkml.kernel.org/r/20200623184515.4132564-5-g...@fb.com
Signed-off-by: Linus Torvalds 

(cherry picked from commit 3e38e0aaca9eafb12b1c4b731d1c10975cbe7974)
+ fix commit: 9f457179244a ("mm: memcontrol: fix warning when allocating
the root cgroup")

Signed-off-by: Konstantin Khorenko 
Found and suggested by Vasily Averin 

Backport notices:
* pn->lruvec_stat_local hunk dropped
* memcg->vmstats_percpu is called memcg->stat_cpu in vz8
* memcg->vmstats_local hunk dropped
---
 mm/memcontrol.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 68242a72be4d..ff751ca90562 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4973,7 +4973,8 @@ static int alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 	if (!pn)
 		return 1;
 
-	pn->lruvec_stat_cpu = alloc_percpu(struct lruvec_stat);
+	pn->lruvec_stat_cpu = alloc_percpu_gfp(struct lruvec_stat,
+					       GFP_KERNEL_ACCOUNT);
 	if (!pn->lruvec_stat_cpu) {
 		kfree(pn);
 		return 1;
@@ -5034,7 +5035,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (memcg->id.id < 0)
 		goto fail;
 
-	memcg->stat_cpu = alloc_percpu(struct mem_cgroup_stat_cpu);
+	memcg->stat_cpu = alloc_percpu_gfp(struct mem_cgroup_stat_cpu,
+					   GFP_KERNEL_ACCOUNT);
 	if (!memcg->stat_cpu)
 		goto fail;
 
@@ -5075,7 +5077,9 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct mem_cgroup *memcg;
 	long error = -ENOMEM;
 
+	memalloc_use_memcg(parent);
 	memcg = mem_cgroup_alloc();
+	memalloc_unuse_memcg();
 	if (!memcg)
 		return ERR_PTR(error);
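
The idea of charging a new cgroup's percpu statistics to its parent can be
sketched as a toy userspace model. Everything here is hypothetical: the
model_* names, the MODEL_GFP_ACCOUNT bit value, and the simplification that an
accounted allocation is charged only when an active memcg has been set (the
kernel has a fallback target that this sketch leaves out).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bit; real GFP bit values depend on the kernel version. */
#define MODEL_GFP_ACCOUNT 0x1u

struct model_memcg {
	struct model_memcg *parent;
	size_t charged;		/* bytes charged to this cgroup */
};

/* Stand-in for the per-task "active memcg" that overrides the charge target. */
static struct model_memcg *model_active_memcg;

static void model_use_memcg(struct model_memcg *memcg)
{
	model_active_memcg = memcg;
}

static void model_unuse_memcg(void)
{
	model_active_memcg = NULL;
}

/* Charge size * ncpus only if accounting was requested and a target is set. */
static void model_alloc_percpu(size_t size, unsigned int ncpus,
			       unsigned int gfp)
{
	if ((gfp & MODEL_GFP_ACCOUNT) && model_active_memcg)
		model_active_memcg->charged += size * ncpus;
}

/* Creating a child: its percpu stats are allocated on the parent's account. */
static void model_css_alloc(struct model_memcg *child,
			    struct model_memcg *parent,
			    size_t stat_size, unsigned int ncpus)
{
	assert(child != NULL);
	child->parent = parent;
	child->charged = 0;

	model_use_memcg(parent);
	model_alloc_percpu(stat_size, ncpus, MODEL_GFP_ACCOUNT);
	model_unuse_memcg();
}
```

In this model a child created under some parent leaves the child's own counter
at zero and raises the parent's by stat_size * ncpus, so many dying children
remain visible in the parent's usage instead of disappearing from accounting.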
 


[Devel] [PATCH RHEL8 COMMIT] ms/memcg: account security cred as well to kmemcg

2020-09-28 Thread Konstantin Khorenko
The commit is pushed to "branch-rh8-4.18.0-193.6.3.vz8.4.x-ovz" and will appear 
at https://src.openvz.org/scm/ovz/vzkernel.git
after rh8-4.18.0-193.6.3.vz8.4.9
-->
commit 60ab735e5b098212df0c2c944bac4fad0254b375
Author: Shakeel Butt 
Date:   Sat Jan 4 12:59:43 2020 -0800

ms/memcg: account security cred as well to kmemcg

The cred_jar kmem_cache is already memcg accounted in the current kernel
but cred->security is not.  Account cred->security to kmemcg.

Recently we saw high root slab usage in our production environment and, on
further inspection, found a buggy application leaking processes.  Although
that buggy application was contained within its memcg, we observed much
more system memory overhead, a couple of GiBs, during that period.  This
overhead can adversely impact the isolation on the system.

One source of high overhead we found was cred->security objects, which
have a lifetime of at least the life of the process which allocated
them.

Link: http://lkml.kernel.org/r/20191205223721.40034-1-shake...@google.com
Signed-off-by: Shakeel Butt 
Acked-by: Chris Down 
Reviewed-by: Roman Gushchin 
Acked-by: Michal Hocko 
Cc: Johannes Weiner 
Cc: 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds 

(cherry picked from commit 84029fd04c201a4c7e0b07ba262664900f47c6f5)
Signed-off-by: Konstantin Khorenko 
Found and suggested by Vasily Averin 
---
 kernel/cred.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/cred.c b/kernel/cred.c
index 45d77284aed0..463a52f66c18 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -219,7 +219,7 @@ struct cred *cred_alloc_blank(void)
 	new->magic = CRED_MAGIC;
 #endif
 
-	if (security_cred_alloc_blank(new, GFP_KERNEL) < 0)
+	if (security_cred_alloc_blank(new, GFP_KERNEL_ACCOUNT) < 0)
 		goto error;
 
 	return new;
@@ -277,7 +277,7 @@ struct cred *prepare_creds(void)
 	new->security = NULL;
 #endif
 
-	if (security_prepare_creds(new, old, GFP_KERNEL) < 0)
+	if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
 		goto error;
 	validate_creds(new);
 	return new;
@@ -684,7 +684,7 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 #ifdef CONFIG_SECURITY
 	new->security = NULL;
 #endif
-	if (security_prepare_creds(new, old, GFP_KERNEL) < 0)
+	if (security_prepare_creds(new, old, GFP_KERNEL_ACCOUNT) < 0)
 		goto error;
 
 	put_cred(old);


[Devel] [PATCH RHEL7 COMMIT] ovl: enable kmem accounting for overlayfs inodes

2020-09-28 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.30
-->
commit d03763c0eaf282de8c4081f291658945791fa44e
Author: Vasily Averin 
Date:   Mon Sep 28 09:00:54 2020 +0300

ovl: enable kmem accounting for overlayfs inodes

https://jira.sw.ru/browse/PSBM-108292
Signed-off-by: Vasily Averin 
---
 fs/overlayfs/super.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index fcb3f7a..d17276d 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -1607,7 +1607,7 @@ static int __init ovl_init(void)
 	ovl_inode_cachep = kmem_cache_create("ovl_inode",
 					     sizeof(struct ovl_inode), 0,
 					     (SLAB_RECLAIM_ACCOUNT|
-					      SLAB_MEM_SPREAD),
+					      SLAB_MEM_SPREAD|SLAB_ACCOUNT),
 					     ovl_inode_init_once);
 	if (ovl_inode_cachep == NULL)
 		return -ENOMEM;