Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation

2021-01-27 Thread Steven Rostedt
On Wed, 27 Jan 2021 18:08:34 +1100
Alexey Kardashevskiy  wrote:

> 
> I am running syzkaller and the kernel keeps crashing in 
> __traceiter_##_name. This patch makes these crashes happen lot less 

I have another solution to the above issue. But I'm now concerned with what
you write below.

> often (and so did the v1) but the kernel still crashes (examples below 
> but the common thing is that they crash in tracepoints). Disasm points 
> to __DO_TRACE_CALL(name) and this fixes it:
> 
> 
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -313,6 +313,7 @@ static inline struct tracepoint 
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>  \
>  it_func_ptr =   \
>  
> rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
> +   if (it_func_ptr)\

Looking at v2 of the patch, I found a bug that could make this happen.

I'm looking at doing something else that doesn't affect the fast path nor
does it bloat the kernel more than necessary.

I'll see if I can get that patch out today.

Thanks for the report.

-- Steve


Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation

2021-01-26 Thread Alexey Kardashevskiy




On 18/11/2020 23:46, Steven Rostedt wrote:

On Tue, 17 Nov 2020 20:54:24 -0800
Alexei Starovoitov  wrote:


  extern int
@@ -310,7 +312,12 @@ static inline struct tracepoint 
*tracepoint_ptr_deref(tracepoint_ptr_t *p)
 do {\
 it_func = (it_func_ptr)->func;  \
 __data = (it_func_ptr)->data;   \
-   ((void(*)(void *, proto))(it_func))(__data, args); \
+   /*  \
+* Removed functions that couldn't be allocated \
+* are replaced with TRACEPOINT_STUB.   \
+*/ \
+   if (likely(it_func != TRACEPOINT_STUB)) \
+   ((void(*)(void *, proto))(it_func))(__data, 
args); \


I think you're overreacting to the problem.


I will disagree with that.


Adding run-time check to extremely unlikely problem seems wasteful.


Show me a real benchmark that you can notice a problem here. I bet that
check is even within the noise of calling an indirect function. Especially
on a machine with retpolines.


99.9% of the time allocate_probes() will do kmalloc from slab of small
objects.
If that slab is out of memory it means it cannot allocate a single page.
In such case so many things will be failing to alloc that system
is unlikely operational. oom should have triggered long ago.
Imo Matt's approach to add __GFP_NOFAIL to allocate_probes()


Looking at the GFP_NOFAIL comment:

  * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
  * cannot handle allocation failures. The allocation could block
  * indefinitely but will never return with failure. Testing for
  * failure is pointless.
  * New users should be evaluated carefully (and the flag should be
  * used only when there is no reasonable failure policy) but it is
  * definitely preferable to use the flag rather than opencode endless
  * loop around allocator.

I realized I made a mistake in my patch for using it, as my patch is a
failure policy. It looks like something we want to avoid in general.

Thanks, I'll send a v3 that removes it.


when it's called from func_remove() is much better.
The error was reported by syzbot that was using
memory fault injections. ENOMEM in allocate_probes() was
never seen in real life and highly unlikely will ever be seen.


And the biggest thing you are missing here, is that if you are running on a
machine that has static calls, this code is never hit unless you have more
than one callback on a single tracepoint. That's because when there's only
one callback, it gets called directly, and this loop is not involved.



I am running syzkaller and the kernel keeps crashing in 
__traceiter_##_name. This patch makes these crashes happen lot less 
often (and so did the v1) but the kernel still crashes (examples below 
but the common thing is that they crash in tracepoints). Disasm points 
to __DO_TRACE_CALL(name) and this fixes it:



--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -313,6 +313,7 @@ static inline struct tracepoint 
*tracepoint_ptr_deref(tracepoint_ptr_t *p)

\
it_func_ptr =   \

rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
+   if (it_func_ptr)\
do {\
it_func = (it_func_ptr)->func;  \


I am running on a powerpc box which does not have CONFIG_HAVE_STATIC_CALL.

I wonder if it is still the same problem which mentioned v3 might fix or 
it is something different? Thanks,




[  285.072538] Kernel attempted to read user page (0) - exploit attempt? 
(uid: 0)

[  285.073657] BUG: Kernel NULL pointer dereference on read at 0x
[  285.075129] Faulting instruction address: 0xc02edf48
cpu 0xd: Vector: 300 (Data Access) at [c000115db530]
pc: c02edf48: lock_acquire+0x2e8/0x5d0
lr: c06ee450: step_into+0x940/0xc20
sp: c000115db7d0
   msr: 80009033
   dar: 0
 dsisr: 4000
  current = 0xc000115af280
  paca= 0xc0005ff9fe00   irqmask: 0x03   irq_happened: 0x01
pid   = 182, comm = systemd-journal
Linux version 5.11.0-rc5-le_syzkaller_a+fstn1 (aik@fstn1-p1) (gcc 
(Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu) 
2.30) #65 SMP Wed Jan 27 16:46:46 AEDT 2021

enter ? for help
[c000115db8c0] c06ee450 step_into+0x940/0xc20
[c000115db950] c06efddc walk_component+0xbc/0x340
[c000115db9d0] c06f0418 link_path_walk.part.29+0x3b8/0x5b0

Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation

2020-11-18 Thread Steven Rostedt
On Tue, 17 Nov 2020 20:54:24 -0800
Alexei Starovoitov  wrote:

> >  extern int
> > @@ -310,7 +312,12 @@ static inline struct tracepoint 
> > *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> > do {\
> > it_func = (it_func_ptr)->func;  \
> > __data = (it_func_ptr)->data;   \
> > -   ((void(*)(void *, proto))(it_func))(__data, args); \
> > +   /*  \
> > +* Removed functions that couldn't be allocated \
> > +* are replaced with TRACEPOINT_STUB.   \
> > +*/ \
> > +   if (likely(it_func != TRACEPOINT_STUB)) \
> > +   ((void(*)(void *, proto))(it_func))(__data, 
> > args); \  
> 
> I think you're overreacting to the problem.

I will disagree with that.

> Adding run-time check to extremely unlikely problem seems wasteful.

Show me a real benchmark that you can notice a problem here. I bet that
check is even within the noise of calling an indirect function. Especially
on a machine with retpolines.

> 99.9% of the time allocate_probes() will do kmalloc from slab of small
> objects.
> If that slab is out of memory it means it cannot allocate a single page.
> In such case so many things will be failing to alloc that system
> is unlikely operational. oom should have triggered long ago.
> Imo Matt's approach to add __GFP_NOFAIL to allocate_probes()

Looking at the GFP_NOFAIL comment:

 * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller
 * cannot handle allocation failures. The allocation could block
 * indefinitely but will never return with failure. Testing for
 * failure is pointless.
 * New users should be evaluated carefully (and the flag should be
 * used only when there is no reasonable failure policy) but it is
 * definitely preferable to use the flag rather than opencode endless
 * loop around allocator.

I realized I made a mistake in my patch for using it, as my patch is a
failure policy. It looks like something we want to avoid in general.

Thanks, I'll send a v3 that removes it.

> when it's called from func_remove() is much better.
> The error was reported by syzbot that was using
> memory fault injections. ENOMEM in allocate_probes() was
> never seen in real life and highly unlikely will ever be seen.

And the biggest thing you are missing here, is that if you are running on a
machine that has static calls, this code is never hit unless you have more
than one callback on a single tracepoint. That's because when there's only
one callback, it gets called directly, and this loop is not involved.

-- Steve


Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation

2020-11-17 Thread Alexei Starovoitov
On Tue, Nov 17, 2020 at 6:18 PM Steven Rostedt  wrote:
>
> From: "Steven Rostedt (VMware)" 
>
> The list of tracepoint callbacks is managed by an array that is protected
> by RCU. To update this array, a new array is allocated, the updates are
> copied over to the new array, and then the list of functions for the
> tracepoint is switched over to the new array. After a completion of an RCU
> grace period, the old array is freed.
>
> This process happens for both adding a callback as well as removing one.
> But on removing a callback, if the new array fails to be allocated, the
> callback is not removed, and may be used after it is freed by the clients
> of the tracepoint.
>
> There's really no reason to fail if the allocation for a new array fails
> when removing a function. Instead, the function can simply be replaced by a
> stub that will be ignored in the callback loop, and it will be cleaned up
> on the next modification of the array.
>
> Link: https://lore.kernel.org/r/20201115055256.65625-1-mmull...@mmlx.us
> Link: https://lkml.kernel.org/r/20201116175107.02db3...@gandalf.local.home
>
> Cc: Mathieu Desnoyers 
> Cc: Ingo Molnar 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> Cc: Dmitry Vyukov 
> Cc: Martin KaFai Lau 
> Cc: Song Liu 
> Cc: Yonghong Song 
> Cc: Andrii Nakryiko 
> Cc: John Fastabend 
> Cc: KP Singh 
> Cc: netdev 
> Cc: bpf 
> Cc: Kees Cook 
> Cc: sta...@vger.kernel.org
> Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
> Reported-by: syzbot+83aa762ef23b6f0d1...@syzkaller.appspotmail.com
> Reported-by: syzbot+d29e58bb557324e55...@syzkaller.appspotmail.com
> Reported-by: Matt Mullins 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
> Changes since v1:
>Use 1L value for stub function, and ignore calling it.
>
>  include/linux/tracepoint.h |  9 -
>  kernel/tracepoint.c| 80 +-
>  2 files changed, 69 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 0f21617f1a66..2e06e05b9d2a 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -33,6 +33,8 @@ struct trace_eval_map {
>
>  #define TRACEPOINT_DEFAULT_PRIO10
>
> +#define TRACEPOINT_STUB((void *)0x1L)
> +
>  extern struct srcu_struct tracepoint_srcu;
>
>  extern int
> @@ -310,7 +312,12 @@ static inline struct tracepoint 
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
> do {\
> it_func = (it_func_ptr)->func;  \
> __data = (it_func_ptr)->data;   \
> -   ((void(*)(void *, proto))(it_func))(__data, args); \
> +   /*  \
> +* Removed functions that couldn't be allocated \
> +* are replaced with TRACEPOINT_STUB.   \
> +*/ \
> +   if (likely(it_func != TRACEPOINT_STUB)) \
> +   ((void(*)(void *, proto))(it_func))(__data, 
> args); \

I think you're overreacting to the problem.
Adding run-time check to extremely unlikely problem seems wasteful.
99.9% of the time allocate_probes() will do kmalloc from slab of small
objects.
If that slab is out of memory it means it cannot allocate a single page.
In such case so many things will be failing to alloc that system
is unlikely operational. oom should have triggered long ago.
Imo Matt's approach to add __GFP_NOFAIL to allocate_probes()
when it's called from func_remove() is much better.
The error was reported by syzbot that was using
memory fault injections. ENOMEM in allocate_probes() was
never seen in real life and highly unlikely will ever be seen.


[PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation

2020-11-17 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

The list of tracepoint callbacks is managed by an array that is protected
by RCU. To update this array, a new array is allocated, the updates are
copied over to the new array, and then the list of functions for the
tracepoint is switched over to the new array. After a completion of an RCU
grace period, the old array is freed.

This process happens for both adding a callback as well as removing one.
But on removing a callback, if the new array fails to be allocated, the
callback is not removed, and may be used after it is freed by the clients
of the tracepoint.

There's really no reason to fail if the allocation for a new array fails
when removing a function. Instead, the function can simply be replaced by a
stub that will be ignored in the callback loop, and it will be cleaned up
on the next modification of the array.

Link: https://lore.kernel.org/r/20201115055256.65625-1-mmull...@mmlx.us
Link: https://lkml.kernel.org/r/20201116175107.02db3...@gandalf.local.home

Cc: Mathieu Desnoyers 
Cc: Ingo Molnar 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Dmitry Vyukov 
Cc: Martin KaFai Lau 
Cc: Song Liu 
Cc: Yonghong Song 
Cc: Andrii Nakryiko 
Cc: John Fastabend 
Cc: KP Singh 
Cc: netdev 
Cc: bpf 
Cc: Kees Cook 
Cc: sta...@vger.kernel.org
Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints")
Reported-by: syzbot+83aa762ef23b6f0d1...@syzkaller.appspotmail.com
Reported-by: syzbot+d29e58bb557324e55...@syzkaller.appspotmail.com
Reported-by: Matt Mullins 
Signed-off-by: Steven Rostedt (VMware) 
---
Changes since v1:
   Use 1L value for stub function, and ignore calling it.

 include/linux/tracepoint.h |  9 -
 kernel/tracepoint.c| 80 +-
 2 files changed, 69 insertions(+), 20 deletions(-)

diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 0f21617f1a66..2e06e05b9d2a 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -33,6 +33,8 @@ struct trace_eval_map {
 
 #define TRACEPOINT_DEFAULT_PRIO10
 
+#define TRACEPOINT_STUB((void *)0x1L)
+
 extern struct srcu_struct tracepoint_srcu;
 
 extern int
@@ -310,7 +312,12 @@ static inline struct tracepoint 
*tracepoint_ptr_deref(tracepoint_ptr_t *p)
do {\
it_func = (it_func_ptr)->func;  \
__data = (it_func_ptr)->data;   \
-   ((void(*)(void *, proto))(it_func))(__data, args); \
+   /*  \
+* Removed functions that couldn't be allocated \
+* are replaced with TRACEPOINT_STUB.   \
+*/ \
+   if (likely(it_func != TRACEPOINT_STUB)) \
+   ((void(*)(void *, proto))(it_func))(__data, 
args); \
} while ((++it_func_ptr)->func);\
return 0;   \
}   \
diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 3f659f855074..20ce78b3f578 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -53,10 +53,10 @@ struct tp_probes {
struct tracepoint_func probes[];
 };
 
-static inline void *allocate_probes(int count)
+static inline void *allocate_probes(int count, gfp_t extra_flags)
 {
struct tp_probes *p  = kmalloc(struct_size(p, probes, count),
-  GFP_KERNEL);
+  GFP_KERNEL | extra_flags);
return p == NULL ? NULL : p->probes;
 }
 
@@ -131,6 +131,7 @@ func_add(struct tracepoint_func **funcs, struct 
tracepoint_func *tp_func,
 {
struct tracepoint_func *old, *new;
int nr_probes = 0;
+   int stub_funcs = 0;
int pos = -1;
 
if (WARN_ON(!tp_func->func))
@@ -147,14 +148,34 @@ func_add(struct tracepoint_func **funcs, struct 
tracepoint_func *tp_func,
if (old[nr_probes].func == tp_func->func &&
old[nr_probes].data == tp_func->data)
return ERR_PTR(-EEXIST);
+   if (old[nr_probes].func == TRACEPOINT_STUB)
+   stub_funcs++;
}
}
-   /* + 2 : one for new probe, one for NULL func */
-   new = allocate_probes(nr_probes + 2);
+   /* + 2 : one for new probe, one for NULL func - stub functions */
+   new = allocate_probes(nr_probes + 2 - stub_funcs, 0);
if (new == NULL)
return ERR_PTR(-ENOMEM);
if (old) {
-   if (pos < 0) {
+   if (stub_funcs) {
+   /* Need to copy one at a time to remove stubs */
+