[Devel] [PATCH RH7] ms/tracing: remove WARN_ON in start_thread()

2020-11-20 Thread Vasily Averin
This patch reverts upstream commit 978defee11a5 ("tracing: Do a WARN_ON()
 if start_thread() in hwlat is called when thread exists")

The .start tracer hook can legally be called several times if the
corresponding tracer is stopped.

screen window 1
[root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable
[root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace
[root@localhost ~]# less -F /sys/kernel/tracing/trace

screen window 2
[root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
0
[root@localhost ~]# echo hwlat >  /sys/kernel/debug/tracing/current_tracer
[root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on
[root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
0
[root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on

This triggers a warning in dmesg:
WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 
hwlat_tracer_start+0xc9/0xd0

Fixes: 978defee11a5 ("tracing: Do a WARN_ON() if start_thread() in hwlat is 
called when thread exists")
The patch was sent upstream but has not been accepted yet.
https://jira.sw.ru/browse/PSBM-122204
Signed-off-by: Vasily Averin 
---
 kernel/trace/trace_hwlat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 45a9e3b..12b8bd7 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -368,7 +368,7 @@ static int start_kthread(struct trace_array *tr)
 {
struct task_struct *kthread;
 
-   if (WARN_ON(hwlat_kthread))
+   if (hwlat_kthread)
return 0;
 
kthread = kthread_create(kthread_fn, NULL, "hwlatd");
-- 
1.8.3.1

___
Devel mailing list
Devel@openvz.org
https://lists.openvz.org/mailman/listinfo/devel


[Devel] [PATCH RH7 1/2] ms/extable: Enable RCU if it is not watching in kernel_text_address()

2020-11-20 Thread Vasily Averin
Author: Steven Rostedt (VMware) 
backported upstream commit e8cac8b1d10589be45671a5ade0926a639b543b7
extable: Enable RCU if it is not watching in kernel_text_address()

If kernel_text_address() is called when RCU is not watching, it can cause an
RCU bug because is_module_text_address(), the is_kprobe_*insn_slot()
and is_bpf_text_address() functions require the use of RCU.

Only enable RCU if it is not currently watching before calling
is_module_text_address(). rcu_nmi_enter() is used to enable RCU because
kernel_text_address() can be called from pretty much anywhere, even from
within an NMI. It is reached via save_stack_trace(), which can be called
by any WARN() or tracing function, and that can happen while RCU is not
watching (for example, going to or coming from idle, or during CPU take
down or bring up).

Cc: sta...@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Acked-by: Paul E. McKenney 
Signed-off-by: Steven Rostedt (VMware) 

backport changes:
 context fixes; the RH7/vz7 kernel does not have the
  is_ftrace_trampoline() and is_kprobe_*() checks

https://jira.sw.ru/browse/PSBM-122315
Signed-off-by: Vasily Averin 
---
 kernel/extable.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/kernel/extable.c b/kernel/extable.c
index 1cb213f..21136fc 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -116,13 +116,38 @@ int __kernel_text_address(unsigned long addr)
 
 int kernel_text_address(unsigned long addr)
 {
+   bool no_rcu;
+   int ret = 1;
+
if (core_kernel_text(addr))
return 1;
+
+   /*
+* If a stack dump happens while RCU is not watching, then
+* RCU needs to be notified that it requires to start
+* watching again. This can happen either by tracing that
+* triggers a stack trace, or a WARN() that happens during
+* coming back from idle, or cpu on or offlining.
+*
+* is_module_text_address() as well as the kprobe slots
+* and is_bpf_text_address() require RCU to be watching.
+*/
+   no_rcu = !rcu_is_watching();
+
+   /* Treat this like an NMI as it can happen anywhere */
+   if (no_rcu)
+   rcu_nmi_enter();
+
if (is_module_text_address(addr))
-   return 1;
+   goto out;
if (is_bpf_text_address(addr))
-   return 1;
-   return 0;
+   goto out;
+   ret = 0;
+out:
+   if (no_rcu)
+   rcu_nmi_exit();
+
+   return ret;
 }
 
 /*
-- 
1.8.3.1



[Devel] [PATCH RH7] ms/tracing: Fix race in trace_open and buffer resize call

2020-11-20 Thread Vasily Averin
Author: Gaurav Kohli 
backported upstream commit bbeb97464eefc65f506084fd9f18f21653e01137

The race below can occur if trace_open and a resize of the
cpu buffer run in parallel on different CPUs:

CPU X                                  CPU Y
ring_buffer_resize
atomic_read(&buffer->resize_disabled)
                                       tracing_open
                                       tracing_reset_online_cpus
                                       ring_buffer_reset_cpu
                                       rb_reset_cpu
rb_update_pages
remove/insert pages
                                       resetting pointer

This race can cause a data abort or sometimes an infinite loop in
rb_remove_pages and rb_insert_pages while checking the pages
for sanity.

Take buffer lock to fix this.

Signed-off-by: Gaurav Kohli 
Signed-off-by: Steven Rostedt (VMware) 

backport changes: only ring_buffer_reset_cpu() is fixed;
ring_buffer_reset_online_cpus() is not present in this kernel
https://jira.sw.ru/browse/PSBM-122343
Signed-off-by: Vasily Averin 
---
 kernel/trace/ring_buffer.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8b3df28..7b1afd1 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4170,6 +4170,9 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, 
int cpu)
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return;
 
+   /* prevent another thread from changing buffer sizes */
+   mutex_lock(&buffer->mutex);
+
atomic_inc(&buffer->resize_disabled);
atomic_inc(&cpu_buffer->record_disabled);
 
@@ -4192,6 +4195,8 @@ void ring_buffer_reset_cpu(struct ring_buffer *buffer, 
int cpu)
 
atomic_dec(&cpu_buffer->record_disabled);
atomic_dec(&buffer->resize_disabled);
+
+   mutex_unlock(&buffer->mutex);
 }
 EXPORT_SYMBOL_GPL(ring_buffer_reset_cpu);
 
-- 
1.8.3.1



[Devel] [PATCH RH7 2/2] ms/extable: Consolidate *kernel_text_address() functions

2020-11-20 Thread Vasily Averin
Author: Steven Rostedt (VMware) 
backported upstream commit 9aadde91b3c035413c806619beb3e3ef6e697953

The functionality between kernel_text_address() and _kernel_text_address()
is the same except that _kernel_text_address() does a little more (that
function needs a rename, but that can be done another time). Instead of
having duplicate code in both, simply have _kernel_text_address() call
kernel_text_address() instead.

This is marked for stable because there's an RCU bug that can happen if
one of these functions gets called while RCU is not watching. That fix
depends on this fix to keep from having to write the fix twice.

Cc: sta...@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Acked-by: Paul E. McKenney 
Signed-off-by: Steven Rostedt (VMware) 

backport changes: context changes; the RH7 and vz7 kernels do not have
  the is_ftrace_trampoline() and is_kprobe_*() checks

https://jira.sw.ru/browse/PSBM-122315
Signed-off-by: Vasily Averin 
---
 kernel/extable.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/kernel/extable.c b/kernel/extable.c
index 4f1a5d2..1cb213f 100644
--- a/kernel/extable.c
+++ b/kernel/extable.c
@@ -99,11 +99,7 @@ int core_kernel_data(unsigned long addr)
 
 int __kernel_text_address(unsigned long addr)
 {
-   if (core_kernel_text(addr))
-   return 1;
-   if (is_module_text_address(addr))
-   return 1;
-   if (is_bpf_text_address(addr))
+   if (kernel_text_address(addr))
return 1;
/*
 * There might be init symbols in saved stacktraces.
-- 
1.8.3.1



[Devel] [PATCH RH8] ms/tracing: remove WARN_ON in start_thread()

2020-11-20 Thread Vasily Averin
This patch reverts upstream commit 978defee11a5 ("tracing: Do a WARN_ON()
 if start_thread() in hwlat is called when thread exists")

The .start tracer hook can legally be called several times if the
corresponding tracer is stopped.

screen window 1
[root@localhost ~]# echo 1 > /sys/kernel/tracing/events/kmem/kfree/enable
[root@localhost ~]# echo 1 > /sys/kernel/tracing/options/pause-on-trace
[root@localhost ~]# less -F /sys/kernel/tracing/trace

screen window 2
[root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
0
[root@localhost ~]# echo hwlat >  /sys/kernel/debug/tracing/current_tracer
[root@localhost ~]# echo 1 > /sys/kernel/debug/tracing/tracing_on
[root@localhost ~]# cat /sys/kernel/debug/tracing/tracing_on
0
[root@localhost ~]# echo 2 > /sys/kernel/debug/tracing/tracing_on

This triggers a warning in dmesg:
WARNING: CPU: 3 PID: 1403 at kernel/trace/trace_hwlat.c:371 
hwlat_tracer_start+0xc9/0xd0

Fixes: 978defee11a5 ("tracing: Do a WARN_ON() if start_thread() in hwlat is 
called when thread exists")
The patch was sent upstream but has not been accepted yet.
https://jira.sw.ru/browse/PSBM-120940
Signed-off-by: Vasily Averin 
---
 kernel/trace/trace_hwlat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index 45a9e3b..12b8bd7 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -368,7 +368,7 @@ static int start_kthread(struct trace_array *tr)
 {
struct task_struct *kthread;
 
-   if (WARN_ON(hwlat_kthread))
+   if (hwlat_kthread)
return 0;
 
kthread = kthread_create(kthread_fn, NULL, "hwlatd");
-- 
1.8.3.1



[Devel] [PATCH RH7 1/2] ipv6: silence high-order allocation warning in rawv6_sendmsg()

2020-11-20 Thread Vasily Averin
The neoLTP sctp_big_chunk testcase triggers a high-order-allocation warning:
 WARNING: CPU: 2 PID: 913629 at mm/page_alloc.c:3533 
__alloc_pages_nodemask+0x1b1/0x600
 order 5 >= 3, gfp 0x2044d0
 Kernel panic - not syncing: panic_on_warn set ...

 CPU: 2 PID: 913629 Comm: sctp_big_chunk ve: 0 Kdump: loaded Tainted: G OE 
 3.10.0-1127.18.2.vz7.163.39 #1 163.39
 Hardware name: Virtuozzo OpenStack Compute, BIOS 1.11.0-2.vz7.1 04/01/2014
 Call Trace:
 [] dump_stack+0x19/0x1b
 [] panic+0xe8/0x21f
 [] __warn+0xfa/0x100
 [] warn_slowpath_fmt+0x5f/0x80
 [] __alloc_pages_nodemask+0x1b1/0x600
 [] kmalloc_large_node+0x5f/0x80
 [] __kmalloc_node_track_caller+0x292/0x300
 [] __kmalloc_reserve.isra.32+0x44/0xa0
 [] __alloc_skb+0x8d/0x2d0
 [] alloc_skb_with_frags+0x57/0x1e0
 [] sock_alloc_send_skb+0x1b6/0x250
 [] rawv6_sendmsg+0x5cb/0xcd0
 [] inet_sendmsg+0x69/0xb0
 [] sock_sendmsg+0xb0/0xf0
 [] SYSC_sendto+0x121/0x1c0
 [] SyS_sendto+0xe/0x10
 [] system_call_fastpath+0x25/0x2a

https://jira.sw.ru/browse/PSBM-122200
Signed-off-by: Vasily Averin 
---
 net/ipv6/raw.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index f91636e..f5f72bb 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -623,9 +623,10 @@ static int rawv6_send_hdrinc(struct sock *sk, void *from, 
int length,
if (flags&MSG_PROBE)
goto out;
 
-   skb = sock_alloc_send_skb(sk,
+   skb = sock_alloc_send_skb_flags(sk,
  length + hlen + tlen + 15,
- flags & MSG_DONTWAIT, &err);
+ flags & MSG_DONTWAIT, &err,
+ __GFP_ORDER_NOWARN);
if (skb == NULL)
goto error;
skb_reserve(skb, hlen);
-- 
1.8.3.1



[Devel] [PATCH RH7 2/2] net: silence high-order-allocation warning in sctp_pack_cookie()

2020-11-20 Thread Vasily Averin
The neoLTP sctp_big_chunk testcase reproduces a high-order-allocation warning:
 ---[ cut here ]---
 WARNING: CPU: 3 PID: 74823 at mm/page_alloc.c:3533 
__alloc_pages_nodemask+0x1b1/0x600
 order 5 >= 3, gfp 0xc020
 Modules linked in: sctp
 CPU: 3 PID: 74823 Comm: sctp_big_chunk ve: 0 Kdump: loaded Tainted: G W 
 3.10.0-1127.18.2.vz7.163
 Hardware name: Virtuozzo OpenStack Compute, BIOS 1.11.0-2.vz7.1 04/01/2014
 Call Trace:
  [] dump_stack+0x19/0x1b
 [] __warn+0xd8/0x100
 [] warn_slowpath_fmt+0x5f/0x80
 [] __alloc_pages_nodemask+0x1b1/0x600
 [] alloc_pages_current+0x98/0x110
 [] kmalloc_order+0x18/0x40
 [] kmalloc_order_trace+0x26/0xa0
 [] __kmalloc+0x281/0x2a0
 [] sctp_make_init_ack+0x104/0x6b0 [sctp]
 [] sctp_sf_do_5_1B_init+0x286/0x330 [sctp]
 [] sctp_do_sm+0xad/0x350 [sctp]
 [] sctp_endpoint_bh_rcv+0x122/0x240 [sctp]
 [] sctp_inq_push+0x51/0x70 [sctp]
 [] sctp_rcv+0xa8b/0xbd0 [sctp]
 [] sctp6_rcv+0xe/0x20 [sctp]
 [] ip6_input_finish+0xd7/0x450
 [] ip6_input+0x3a/0xb0
 [] ip6_rcv_finish+0x3e/0xc0
 [] ipv6_rcv+0x327/0x540
 [] __netif_receive_skb_core+0x729/0xa10
 [] __netif_receive_skb+0x18/0x60
 [] process_backlog+0xae/0x180
 [] net_rx_action+0x27f/0x3a0
 [] __do_softirq+0x125/0x2bb
 [] call_softirq+0x1c/0x30
  [] do_softirq+0x65/0xa0
 [] __local_bh_enable_ip+0x9b/0xb0
 [] local_bh_enable+0x17/0x20
 [] ip6_finish_output2+0x1a7/0x580
 [] ip6_finish_output+0x8c/0xf0
 [] ip6_output+0x57/0x110
 [] rawv6_sendmsg+0x6d7/0xcd0
 [] inet_sendmsg+0x69/0xb0
 [] sock_sendmsg+0xb0/0xf0
 [] SYSC_sendto+0x121/0x1c0
 [] SyS_sendto+0xe/0x10
 [] system_call_fastpath+0x25/0x2a
 --[ end trace 49ad7969a95b86af ]--

https://jira.sw.ru/browse/PSBM-122200
Signed-off-by: Vasily Averin 
---
 net/sctp/sm_make_chunk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 32d685a..ebd103f 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -1670,7 +1670,7 @@ static sctp_cookie_param_t *sctp_pack_cookie(const struct 
sctp_endpoint *ep,
/* Clear this memory since we are sending this data structure
 * out on the network.
 */
-   retval = kzalloc(*cookie_len, GFP_ATOMIC);
+   retval = kzalloc(*cookie_len, GFP_ATOMIC|__GFP_ORDER_NOWARN);
if (!retval)
goto nodata;
 
-- 
1.8.3.1



[Devel] [PATCH rh7] mm/memcg: cleanup vmpressure from mem_cgroup_css_free()

2020-11-20 Thread Andrey Ryabinin
Cleaning up vmpressure in mem_cgroup_css_offline() doesn't look
safe: mem_cgroup_css_offline() might race with reclaim, which can
queue vmpressure work again after the flush.

Put vmpressure_cleanup() in mem_cgroup_css_free(), where we have
exclusive access to the memcg. It was originally there (see
https://jira.sw.ru/browse/PSBM-93884) but was moved during a rebase.

https://jira.sw.ru/browse/PSBM-122655
Signed-off-by: Andrey Ryabinin 
---
 mm/memcontrol.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e36ad592b3c7..803273a4d9cb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6822,8 +6822,6 @@ static void mem_cgroup_css_offline(struct cgroup *cont)
mem_cgroup_free_all(memcg);
mem_cgroup_reparent_charges(memcg);
 
-   vmpressure_cleanup(&memcg->vmpressure);
-
/*
 * A cgroup can be destroyed while somebody is waiting for its
 * oom context, in which case the context will never be unlocked
@@ -6878,7 +6876,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
mem_cgroup_reparent_charges(memcg);
 
cancel_work_sync(&memcg->high_work);
-
+   vmpressure_cleanup(&memcg->vmpressure);
memcg_destroy_kmem(memcg);
memcg_free_shrinker_maps(memcg);
__mem_cgroup_free(memcg);
-- 
2.26.2



[Devel] [PATCH rh7] mm/memcg: cleanup vmpressure from mem_cgroup_css_free()

2020-11-20 Thread Andrey Ryabinin
Cleaning up vmpressure in mem_cgroup_css_offline() doesn't look
safe: mem_cgroup_css_offline() might race with reclaim, which can
queue vmpressure work again after the flush.

Put vmpressure_cleanup() in mem_cgroup_css_free(), where we have
exclusive access to the memcg. It was originally there (see
https://jira.sw.ru/browse/PSBM-93884) but was moved during a rebase.

https://jira.sw.ru/browse/PSBM-122653
Signed-off-by: Andrey Ryabinin 
---
 mm/memcontrol.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e36ad592b3c7..803273a4d9cb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6822,8 +6822,6 @@ static void mem_cgroup_css_offline(struct cgroup *cont)
mem_cgroup_free_all(memcg);
mem_cgroup_reparent_charges(memcg);
 
-   vmpressure_cleanup(&memcg->vmpressure);
-
/*
 * A cgroup can be destroyed while somebody is waiting for its
 * oom context, in which case the context will never be unlocked
@@ -6878,7 +6876,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
mem_cgroup_reparent_charges(memcg);
 
cancel_work_sync(&memcg->high_work);
-
+   vmpressure_cleanup(&memcg->vmpressure);
memcg_destroy_kmem(memcg);
memcg_free_shrinker_maps(memcg);
__mem_cgroup_free(memcg);
-- 
2.26.2



[Devel] [PATCH RHEL7 COMMIT] mm/memcg: cleanup vmpressure from mem_cgroup_css_free()

2020-11-20 Thread Vasily Averin
The commit is pushed to "branch-rh7-3.10.0-1127.18.2.vz7.163.x-ovz" and will 
appear at https://src.openvz.org/scm/ovz/vzkernel.git
after rh7-3.10.0-1127.18.2.vz7.163.45
-->
commit 842d2de61c9f6966ed94f5a6fec25bff6d6f2220
Author: Andrey Ryabinin 
Date:   Fri Nov 20 21:32:57 2020 +0300

mm/memcg: cleanup vmpressure from mem_cgroup_css_free()

Cleaning up vmpressure in mem_cgroup_css_offline() doesn't look
safe: mem_cgroup_css_offline() might race with reclaim, which can
queue vmpressure work again after the flush.

Put vmpressure_cleanup() in mem_cgroup_css_free(), where we have
exclusive access to the memcg. It was originally there (see
https://jira.sw.ru/browse/PSBM-93884) but was moved during a rebase.

https://jira.sw.ru/browse/PSBM-122653
Signed-off-by: Andrey Ryabinin 
---
 mm/memcontrol.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index e0e113b..e15935f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6831,8 +6831,6 @@ static void mem_cgroup_css_offline(struct cgroup *cont)
mem_cgroup_free_all(memcg);
mem_cgroup_reparent_charges(memcg);
 
-   vmpressure_cleanup(&memcg->vmpressure);
-
/*
 * A cgroup can be destroyed while somebody is waiting for its
 * oom context, in which case the context will never be unlocked
@@ -6887,7 +6885,7 @@ static void mem_cgroup_css_free(struct cgroup *cont)
mem_cgroup_reparent_charges(memcg);
 
cancel_work_sync(&memcg->high_work);
-
+   vmpressure_cleanup(&memcg->vmpressure);
memcg_destroy_kmem(memcg);
memcg_free_shrinker_maps(memcg);
__mem_cgroup_free(memcg);