Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation
On Wed, 27 Jan 2021 18:08:34 +1100 Alexey Kardashevskiy wrote: > > I am running syzkaller and the kernel keeps crashing in > __traceiter_##_name. This patch makes these crashes happen lot less I have another solution to the above issue. But I'm now concerned with what you write below. > often (and so did the v1) but the kernel still crashes (examples below > but the common thing is that they crash in tracepoints). Disasm points > to __DO_TRACE_CALL(name) and this fixes it: > > > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -313,6 +313,7 @@ static inline struct tracepoint > *tracepoint_ptr_deref(tracepoint_ptr_t *p) > \ > it_func_ptr = \ > > rcu_dereference_raw((&__tracepoint_##_name)->funcs); \ > + if (it_func_ptr)\ Looking at v2 of the patch, I found a bug that could make this happen. I'm looking at doing something else that doesn't affect the fast path nor does it bloat the kernel more than necessary. I'll see if I can get that patch out today. Thanks for the report. -- Steve
Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation
On 18/11/2020 23:46, Steven Rostedt wrote: On Tue, 17 Nov 2020 20:54:24 -0800 Alexei Starovoitov wrote: extern int @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) do {\ it_func = (it_func_ptr)->func; \ __data = (it_func_ptr)->data; \ - ((void(*)(void *, proto))(it_func))(__data, args); \ + /* \ +* Removed functions that couldn't be allocated \ +* are replaced with TRACEPOINT_STUB. \ +*/ \ + if (likely(it_func != TRACEPOINT_STUB)) \ + ((void(*)(void *, proto))(it_func))(__data, args); \ I think you're overreacting to the problem. I will disagree with that. Adding run-time check to extremely unlikely problem seems wasteful. Show me a real benchmark that you can notice a problem here. I bet that check is even within the noise of calling an indirect function. Especially on a machine with retpolines. 99.9% of the time allocate_probes() will do kmalloc from slab of small objects. If that slab is out of memory it means it cannot allocate a single page. In such case so many things will be failing to alloc that system is unlikely operational. oom should have triggered long ago. Imo Matt's approach to add __GFP_NOFAIL to allocate_probes() Looking at the GFP_NOFAIL comment: * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller * cannot handle allocation failures. The allocation could block * indefinitely but will never return with failure. Testing for * failure is pointless. * New users should be evaluated carefully (and the flag should be * used only when there is no reasonable failure policy) but it is * definitely preferable to use the flag rather than opencode endless * loop around allocator. I realized I made a mistake in my patch for using it, as my patch is a failure policy. It looks like something we want to avoid in general. Thanks, I'll send a v3 that removes it. when it's called from func_remove() is much better. The error was reported by syzbot that was using memory fault injections. ENOMEM in allocate_probes() was never seen in real life and highly unlikely will ever be seen. And the biggest thing you are missing here, is that if you are running on a machine that has static calls, this code is never hit unless you have more than one callback on a single tracepoint. That's because when there's only one callback, it gets called directly, and this loop is not involved. I am running syzkaller and the kernel keeps crashing in __traceiter_##_name. This patch makes these crashes happen lot less often (and so did the v1) but the kernel still crashes (examples below but the common thing is that they crash in tracepoints). Disasm points to __DO_TRACE_CALL(name) and this fixes it: --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -313,6 +313,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) \ it_func_ptr = \ rcu_dereference_raw((&__tracepoint_##_name)->funcs); \ + if (it_func_ptr)\ do {\ it_func = (it_func_ptr)->func; \ I am running on a powerpc box which does not have CONFIG_HAVE_STATIC_CALL. I wonder if it is still the same problem which mentioned v3 might fix or it is something different? Thanks, [ 285.072538] Kernel attempted to read user page (0) - exploit attempt? (uid: 0) [ 285.073657] BUG: Kernel NULL pointer dereference on read at 0x [ 285.075129] Faulting instruction address: 0xc02edf48 cpu 0xd: Vector: 300 (Data Access) at [c000115db530] pc: c02edf48: lock_acquire+0x2e8/0x5d0 lr: c06ee450: step_into+0x940/0xc20 sp: c000115db7d0 msr: 80009033 dar: 0 dsisr: 4000 current = 0xc000115af280 paca= 0xc0005ff9fe00 irqmask: 0x03 irq_happened: 0x01 pid = 182, comm = systemd-journal Linux version 5.11.0-rc5-le_syzkaller_a+fstn1 (aik@fstn1-p1) (gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0, GNU ld (GNU Binutils for Ubuntu) 2.30) #65 SMP Wed Jan 27 16:46:46 AEDT 2021 enter ? for help [c000115db8c0] c06ee450 step_into+0x940/0xc20 [c000115db950] c06efddc walk_component+0xbc/0x340 [c000115db9d0] c06f0418 link_path_walk.part.29+0x3b8/0x5b0
Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation
On Tue, 17 Nov 2020 20:54:24 -0800 Alexei Starovoitov wrote: > > extern int > > @@ -310,7 +312,12 @@ static inline struct tracepoint > > *tracepoint_ptr_deref(tracepoint_ptr_t *p) > > do {\ > > it_func = (it_func_ptr)->func; \ > > __data = (it_func_ptr)->data; \ > > - ((void(*)(void *, proto))(it_func))(__data, args); \ > > + /* \ > > +* Removed functions that couldn't be allocated \ > > +* are replaced with TRACEPOINT_STUB. \ > > +*/ \ > > + if (likely(it_func != TRACEPOINT_STUB)) \ > > + ((void(*)(void *, proto))(it_func))(__data, > > args); \ > > I think you're overreacting to the problem. I will disagree with that. > Adding run-time check to extremely unlikely problem seems wasteful. Show me a real benchmark that you can notice a problem here. I bet that check is even within the noise of calling an indirect function. Especially on a machine with retpolines. > 99.9% of the time allocate_probes() will do kmalloc from slab of small > objects. > If that slab is out of memory it means it cannot allocate a single page. > In such case so many things will be failing to alloc that system > is unlikely operational. oom should have triggered long ago. > Imo Matt's approach to add __GFP_NOFAIL to allocate_probes() Looking at the GFP_NOFAIL comment: * %__GFP_NOFAIL: The VM implementation _must_ retry infinitely: the caller * cannot handle allocation failures. The allocation could block * indefinitely but will never return with failure. Testing for * failure is pointless. * New users should be evaluated carefully (and the flag should be * used only when there is no reasonable failure policy) but it is * definitely preferable to use the flag rather than opencode endless * loop around allocator. I realized I made a mistake in my patch for using it, as my patch is a failure policy. It looks like something we want to avoid in general. Thanks, I'll send a v3 that removes it. > when it's called from func_remove() is much better. > The error was reported by syzbot that was using > memory fault injections. ENOMEM in allocate_probes() was > never seen in real life and highly unlikely will ever be seen. And the biggest thing you are missing here, is that if you are running on a machine that has static calls, this code is never hit unless you have more than one callback on a single tracepoint. That's because when there's only one callback, it gets called directly, and this loop is not involved. -- Steve
Re: [PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation
On Tue, Nov 17, 2020 at 6:18 PM Steven Rostedt wrote: > > From: "Steven Rostedt (VMware)" > > The list of tracepoint callbacks is managed by an array that is protected > by RCU. To update this array, a new array is allocated, the updates are > copied over to the new array, and then the list of functions for the > tracepoint is switched over to the new array. After a completion of an RCU > grace period, the old array is freed. > > This process happens for both adding a callback as well as removing one. > But on removing a callback, if the new array fails to be allocated, the > callback is not removed, and may be used after it is freed by the clients > of the tracepoint. > > There's really no reason to fail if the allocation for a new array fails > when removing a function. Instead, the function can simply be replaced by a > stub that will be ignored in the callback loop, and it will be cleaned up > on the next modification of the array. > > Link: https://lore.kernel.org/r/20201115055256.65625-1-mmull...@mmlx.us > Link: https://lkml.kernel.org/r/20201116175107.02db3...@gandalf.local.home > > Cc: Mathieu Desnoyers > Cc: Ingo Molnar > Cc: Alexei Starovoitov > Cc: Daniel Borkmann > Cc: Dmitry Vyukov > Cc: Martin KaFai Lau > Cc: Song Liu > Cc: Yonghong Song > Cc: Andrii Nakryiko > Cc: John Fastabend > Cc: KP Singh > Cc: netdev > Cc: bpf > Cc: Kees Cook > Cc: sta...@vger.kernel.org > Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints") > Reported-by: syzbot+83aa762ef23b6f0d1...@syzkaller.appspotmail.com > Reported-by: syzbot+d29e58bb557324e55...@syzkaller.appspotmail.com > Reported-by: Matt Mullins > Signed-off-by: Steven Rostedt (VMware) > --- > Changes since v1: >Use 1L value for stub function, and ignore calling it. > > include/linux/tracepoint.h | 9 - > kernel/tracepoint.c| 80 +- > 2 files changed, 69 insertions(+), 20 deletions(-) > > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index 0f21617f1a66..2e06e05b9d2a 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -33,6 +33,8 @@ struct trace_eval_map { > > #define TRACEPOINT_DEFAULT_PRIO10 > > +#define TRACEPOINT_STUB((void *)0x1L) > + > extern struct srcu_struct tracepoint_srcu; > > extern int > @@ -310,7 +312,12 @@ static inline struct tracepoint > *tracepoint_ptr_deref(tracepoint_ptr_t *p) > do {\ > it_func = (it_func_ptr)->func; \ > __data = (it_func_ptr)->data; \ > - ((void(*)(void *, proto))(it_func))(__data, args); \ > + /* \ > +* Removed functions that couldn't be allocated \ > +* are replaced with TRACEPOINT_STUB. \ > +*/ \ > + if (likely(it_func != TRACEPOINT_STUB)) \ > + ((void(*)(void *, proto))(it_func))(__data, > args); \ I think you're overreacting to the problem. Adding run-time check to extremely unlikely problem seems wasteful. 99.9% of the time allocate_probes() will do kmalloc from slab of small objects. If that slab is out of memory it means it cannot allocate a single page. In such case so many things will be failing to alloc that system is unlikely operational. oom should have triggered long ago. Imo Matt's approach to add __GFP_NOFAIL to allocate_probes() when it's called from func_remove() is much better. The error was reported by syzbot that was using memory fault injections. ENOMEM in allocate_probes() was never seen in real life and highly unlikely will ever be seen.
[PATCH v2] tracepoint: Do not fail unregistering a probe due to memory allocation
From: "Steven Rostedt (VMware)" The list of tracepoint callbacks is managed by an array that is protected by RCU. To update this array, a new array is allocated, the updates are copied over to the new array, and then the list of functions for the tracepoint is switched over to the new array. After a completion of an RCU grace period, the old array is freed. This process happens for both adding a callback as well as removing one. But on removing a callback, if the new array fails to be allocated, the callback is not removed, and may be used after it is freed by the clients of the tracepoint. There's really no reason to fail if the allocation for a new array fails when removing a function. Instead, the function can simply be replaced by a stub that will be ignored in the callback loop, and it will be cleaned up on the next modification of the array. Link: https://lore.kernel.org/r/20201115055256.65625-1-mmull...@mmlx.us Link: https://lkml.kernel.org/r/20201116175107.02db3...@gandalf.local.home Cc: Mathieu Desnoyers Cc: Ingo Molnar Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Dmitry Vyukov Cc: Martin KaFai Lau Cc: Song Liu Cc: Yonghong Song Cc: Andrii Nakryiko Cc: John Fastabend Cc: KP Singh Cc: netdev Cc: bpf Cc: Kees Cook Cc: sta...@vger.kernel.org Fixes: 97e1c18e8d17b ("tracing: Kernel Tracepoints") Reported-by: syzbot+83aa762ef23b6f0d1...@syzkaller.appspotmail.com Reported-by: syzbot+d29e58bb557324e55...@syzkaller.appspotmail.com Reported-by: Matt Mullins Signed-off-by: Steven Rostedt (VMware) --- Changes since v1: Use 1L value for stub function, and ignore calling it. include/linux/tracepoint.h | 9 - kernel/tracepoint.c| 80 +- 2 files changed, 69 insertions(+), 20 deletions(-) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index 0f21617f1a66..2e06e05b9d2a 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -33,6 +33,8 @@ struct trace_eval_map { #define TRACEPOINT_DEFAULT_PRIO10 +#define TRACEPOINT_STUB((void *)0x1L) + extern struct srcu_struct tracepoint_srcu; extern int @@ -310,7 +312,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p) do {\ it_func = (it_func_ptr)->func; \ __data = (it_func_ptr)->data; \ - ((void(*)(void *, proto))(it_func))(__data, args); \ + /* \ +* Removed functions that couldn't be allocated \ +* are replaced with TRACEPOINT_STUB. \ +*/ \ + if (likely(it_func != TRACEPOINT_STUB)) \ + ((void(*)(void *, proto))(it_func))(__data, args); \ } while ((++it_func_ptr)->func);\ return 0; \ } \ diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 3f659f855074..20ce78b3f578 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -53,10 +53,10 @@ struct tp_probes { struct tracepoint_func probes[]; }; -static inline void *allocate_probes(int count) +static inline void *allocate_probes(int count, gfp_t extra_flags) { struct tp_probes *p = kmalloc(struct_size(p, probes, count), - GFP_KERNEL); + GFP_KERNEL | extra_flags); return p == NULL ? NULL : p->probes; } @@ -131,6 +131,7 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func, { struct tracepoint_func *old, *new; int nr_probes = 0; + int stub_funcs = 0; int pos = -1; if (WARN_ON(!tp_func->func)) @@ -147,14 +148,34 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func, if (old[nr_probes].func == tp_func->func && old[nr_probes].data == tp_func->data) return ERR_PTR(-EEXIST); + if (old[nr_probes].func == TRACEPOINT_STUB) + stub_funcs++; } } - /* + 2 : one for new probe, one for NULL func */ - new = allocate_probes(nr_probes + 2); + /* + 2 : one for new probe, one for NULL func - stub functions */ + new = allocate_probes(nr_probes + 2 - stub_funcs, 0); if (new == NULL) return ERR_PTR(-ENOMEM); if (old) { - if (pos < 0) { + if (stub_funcs) { + /* Need to copy one at a time to remove stubs */ +