Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Tue, Nov 17, 2020 at 08:09:22PM -0500, Steven Rostedt wrote: > On Tue, 17 Nov 2020 16:42:44 -0800 > Matt Mullins wrote: > > > > > Indeed with a stub function, I don't see any need for > > > READ_ONCE/WRITE_ONCE. > > > > I'm not sure if this is a practical issue, but without WRITE_ONCE, can't > > the write be torn? A racing __traceiter_ could potentially see a > > half-modified function pointer, which wouldn't work out too well. > > This has been discussed before, and Linus said: > > "We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not > because of some theoretical "compiler is free to do garbage" > arguments. If such garbage happens, we need to fix the compiler" > > https://lore.kernel.org/lkml/CAHk-=wi_KeD1M-_-_SU_H92vJ-yNkDnAGhAS=rr1ynngwkw...@mail.gmail.com/ I have to ask... Did the ARM compilers get fixed? As of a few months ago, they would tear stores of some constant values. > > This was actually my gut instinct before I wrote the __GFP_NOFAIL > > instead -- currently that whole array's memory ordering is provided by > > RCU and I didn't dive deep enough to evaluate getting too clever with > > atomic modifications to it. > > The pointers are always going to be the architecture word size (by > definition), and any compiler that tears a write of a long is broken. But yes, if the write is of a non-constant pointer, the compiler does have less leverage. Thanx, Paul
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Tue, 17 Nov 2020 16:42:44 -0800 Matt Mullins wrote: > > Indeed with a stub function, I don't see any need for READ_ONCE/WRITE_ONCE. > > > > I'm not sure if this is a practical issue, but without WRITE_ONCE, can't > the write be torn? A racing __traceiter_ could potentially see a > half-modified function pointer, which wouldn't work out too well. This has been discussed before, and Linus said: "We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not because of some theoretical "compiler is free to do garbage" arguments. If such garbage happens, we need to fix the compiler" https://lore.kernel.org/lkml/CAHk-=wi_KeD1M-_-_SU_H92vJ-yNkDnAGhAS=rr1ynngwkw...@mail.gmail.com/ > > This was actually my gut instinct before I wrote the __GFP_NOFAIL > instead -- currently that whole array's memory ordering is provided by > RCU and I didn't dive deep enough to evaluate getting too clever with > atomic modifications to it. The pointers are always going to be the architecture word size (by definition), and any compiler that tears a write of a long is broken. -- Steve
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Tue, Nov 17, 2020 at 06:05:51PM -0500, Mathieu Desnoyers wrote: > - On Nov 16, 2020, at 5:10 PM, rostedt rost...@goodmis.org wrote: > > > On Mon, 16 Nov 2020 16:34:41 -0500 (EST) > > Mathieu Desnoyers wrote: > > [...] > > >> I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched > >> with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the > >> func pointer can be updated and loaded concurrently. > > > > I thought about this a little, and then only thing we really should worry > > about is synchronizing with those that unregister. Because when we make > > this update, there are now two states. the __DO_TRACE either reads the > > original func or the stub. And either should be OK to call. > > > > Only the func gets updated and not the data. So what exactly are we worried > > about here? > > Indeed with a stub function, I don't see any need for READ_ONCE/WRITE_ONCE. I'm not sure if this is a practical issue, but without WRITE_ONCE, can't the write be torn? A racing __traceiter_ could potentially see a half-modified function pointer, which wouldn't work out too well. This was actually my gut instinct before I wrote the __GFP_NOFAIL instead -- currently that whole array's memory ordering is provided by RCU and I didn't dive deep enough to evaluate getting too clever with atomic modifications to it. > > However, if we want to compare the function pointer to some other value and > conditionally do (or skip) the call, I think you'll need the > READ_ONCE/WRITE_ONCE > to make sure the pointer is not re-fetched between comparison and call. > > Thanks, > > Mathieu > > -- > Mathieu Desnoyers > EfficiOS Inc. > http://www.efficios.com
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
- On Nov 16, 2020, at 5:10 PM, rostedt rost...@goodmis.org wrote: > On Mon, 16 Nov 2020 16:34:41 -0500 (EST) > Mathieu Desnoyers wrote: [...] >> I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched >> with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the >> func pointer can be updated and loaded concurrently. > > I thought about this a little, and then only thing we really should worry > about is synchronizing with those that unregister. Because when we make > this update, there are now two states. the __DO_TRACE either reads the > original func or the stub. And either should be OK to call. > > Only the func gets updated and not the data. So what exactly are we worried > about here? Indeed with a stub function, I don't see any need for READ_ONCE/WRITE_ONCE. However, if we want to compare the function pointer to some other value and conditionally do (or skip) the call, I think you'll need the READ_ONCE/WRITE_ONCE to make sure the pointer is not re-fetched between comparison and call. Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Mon, 16 Nov 2020 16:34:41 -0500 (EST) Mathieu Desnoyers wrote: > - On Nov 16, 2020, at 4:02 PM, rostedt rost...@goodmis.org wrote: > > > On Mon, 16 Nov 2020 15:44:37 -0500 > > Steven Rostedt wrote: > > > >> If you use a stub function, it shouldn't affect anything. And the worse > >> that would happen is that you have a slight overhead of calling the stub > >> until you can properly remove the callback. > > > > Something like this: > > > > (haven't compiled it yet, I'm about to though). Still need more accounting to work on. Almost finished though. ;-) > > > > -- Steve > > > > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c > > index 3f659f855074..8eab40f9d388 100644 > > --- a/kernel/tracepoint.c > > +++ b/kernel/tracepoint.c > > @@ -53,10 +53,16 @@ struct tp_probes { > > struct tracepoint_func probes[]; > > }; > > > > -static inline void *allocate_probes(int count) > > +/* Called in removal of a func but failed to allocate a new tp_funcs */ > > +static void tp_stub_func(void) > > I'm still not sure whether it's OK to call a (void) function with arguments. Actually, I've done it. The thing is, what can actually happen? A void function that simply returns should not do anything. If anything, the only waste is that the caller would save more registers than necessary. I can't think of anything that can actually happen, but perhaps there is. I wouldn't want to make a stub function for every trace point (it wouldn't be hard to do). But perhaps we should ask the compiler people to make sure. > > > +{ > > + return; > > +} > > + > > +static inline void *allocate_probes(int count, gfp_t extra_flags) > > { > > struct tp_probes *p = kmalloc(struct_size(p, probes, count), > > - GFP_KERNEL); > > + GFP_KERNEL | extra_flags); > > return p == NULL ? NULL : p->probes; > > } > > > > @@ -150,7 +156,7 @@ func_add(struct tracepoint_func **funcs, struct > > tracepoint_func *tp_func, > > } > > } > > /* + 2 : one for new probe, one for NULL func */ > > - new = allocate_probes(nr_probes + 2); > > + new = allocate_probes(nr_probes + 2, 0); > > if (new == NULL) > > return ERR_PTR(-ENOMEM); > > if (old) { > > @@ -188,8 +194,9 @@ static void *func_remove(struct tracepoint_func **funcs, > > /* (N -> M), (N > 1, M >= 0) probes */ > > if (tp_func->func) { > > for (nr_probes = 0; old[nr_probes].func; nr_probes++) { > > - if (old[nr_probes].func == tp_func->func && > > -old[nr_probes].data == tp_func->data) > > + if ((old[nr_probes].func == tp_func->func && > > +old[nr_probes].data == tp_func->data) || > > + old[nr_probes].func == tp_stub_func) > > nr_del++; > > } > > } > > @@ -207,15 +214,20 @@ static void *func_remove(struct tracepoint_func > > **funcs, > > int j = 0; > > /* N -> M, (N > 1, M > 0) */ > > /* + 1 for NULL */ > > - new = allocate_probes(nr_probes - nr_del + 1); > > - if (new == NULL) > > - return ERR_PTR(-ENOMEM); > > - for (i = 0; old[i].func; i++) > > - if (old[i].func != tp_func->func > > - || old[i].data != tp_func->data) > > - new[j++] = old[i]; > > - new[nr_probes - nr_del].func = NULL; > > - *funcs = new; > > + new = allocate_probes(nr_probes - nr_del + 1, __GFP_NOFAIL); > > + if (new) { > > + for (i = 0; old[i].func; i++) > > + if (old[i].func != tp_func->func > > + || old[i].data != tp_func->data) > > as you point out in your reply, skip tp_stub_func here too. > > > + new[j++] = old[i]; > > + new[nr_probes - nr_del].func = NULL; > > + } else { > > + for (i = 0; old[i].func; i++) > > + if (old[i].func == tp_func->func && > > + old[i].data == tp_func->data) > > + old[i].func = tp_stub_func; > > I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched > with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the > func pointer can be updated and loaded concurrently. I thought about this a little, and then only thing we really should worry about is synchronizing with those that unregister. Because when we make this update, there are now two states. the __DO_TRACE either reads the original func or the stub. And either should be OK to call. Only the func gets updated and not the data. So what exactly are we worried about here? > > > + } > > + *funcs = old; > > The line above seems wrong for
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
- On Nov 16, 2020, at 4:02 PM, rostedt rost...@goodmis.org wrote: > On Mon, 16 Nov 2020 15:44:37 -0500 > Steven Rostedt wrote: > >> If you use a stub function, it shouldn't affect anything. And the worse >> that would happen is that you have a slight overhead of calling the stub >> until you can properly remove the callback. > > Something like this: > > (haven't compiled it yet, I'm about to though). > > -- Steve > > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c > index 3f659f855074..8eab40f9d388 100644 > --- a/kernel/tracepoint.c > +++ b/kernel/tracepoint.c > @@ -53,10 +53,16 @@ struct tp_probes { > struct tracepoint_func probes[]; > }; > > -static inline void *allocate_probes(int count) > +/* Called in removal of a func but failed to allocate a new tp_funcs */ > +static void tp_stub_func(void) I'm still not sure whether it's OK to call a (void) function with arguments. > +{ > + return; > +} > + > +static inline void *allocate_probes(int count, gfp_t extra_flags) > { > struct tp_probes *p = kmalloc(struct_size(p, probes, count), > -GFP_KERNEL); > +GFP_KERNEL | extra_flags); > return p == NULL ? NULL : p->probes; > } > > @@ -150,7 +156,7 @@ func_add(struct tracepoint_func **funcs, struct > tracepoint_func *tp_func, > } > } > /* + 2 : one for new probe, one for NULL func */ > - new = allocate_probes(nr_probes + 2); > + new = allocate_probes(nr_probes + 2, 0); > if (new == NULL) > return ERR_PTR(-ENOMEM); > if (old) { > @@ -188,8 +194,9 @@ static void *func_remove(struct tracepoint_func **funcs, > /* (N -> M), (N > 1, M >= 0) probes */ > if (tp_func->func) { > for (nr_probes = 0; old[nr_probes].func; nr_probes++) { > - if (old[nr_probes].func == tp_func->func && > - old[nr_probes].data == tp_func->data) > + if ((old[nr_probes].func == tp_func->func && > + old[nr_probes].data == tp_func->data) || > + old[nr_probes].func == tp_stub_func) > nr_del++; > } > } > @@ -207,15 +214,20 @@ static void *func_remove(struct tracepoint_func **funcs, > int j = 0; > /* N -> M, (N > 1, M > 0) */ > /* + 1 for NULL */ > - new = allocate_probes(nr_probes - nr_del + 1); > - if (new == NULL) > - return ERR_PTR(-ENOMEM); > - for (i = 0; old[i].func; i++) > - if (old[i].func != tp_func->func > - || old[i].data != tp_func->data) > - new[j++] = old[i]; > - new[nr_probes - nr_del].func = NULL; > - *funcs = new; > + new = allocate_probes(nr_probes - nr_del + 1, __GFP_NOFAIL); > + if (new) { > + for (i = 0; old[i].func; i++) > + if (old[i].func != tp_func->func > + || old[i].data != tp_func->data) as you point out in your reply, skip tp_stub_func here too. > + new[j++] = old[i]; > + new[nr_probes - nr_del].func = NULL; > + } else { > + for (i = 0; old[i].func; i++) > + if (old[i].func == tp_func->func && > + old[i].data == tp_func->data) > + old[i].func = tp_stub_func; I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the func pointer can be updated and loaded concurrently. > + } > + *funcs = old; The line above seems wrong for the successful allocate_probe case. You will likely want *funcs = new on successful allocation, and *funcs = old for the failure case. Thanks, Mathieu > } > debug_print_probes(*funcs); > return old; > @@ -300,6 +312,10 @@ static int tracepoint_remove_func(struct tracepoint *tp, > return PTR_ERR(old); > } > > + if (tp_funcs == old) > + /* Failed allocating new tp_funcs, replaced func with stub */ > + return 0; > + > if (!tp_funcs) { > /* Removed last function */ > if (tp->unregfunc && static_key_enabled(&tp->key)) -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
- On Nov 16, 2020, at 3:44 PM, rostedt rost...@goodmis.org wrote: > On Mon, 16 Nov 2020 15:37:27 -0500 (EST) > Mathieu Desnoyers wrote: >> > >> > Mathieu, >> > >> > Can't we do something that would still allow to unregister a probe even if >> > a new probe array fails to allocate? We could kick off a irq work to try to >> > clean up the probe at a later time, but still, the unregister itself should >> > not fail due to memory failure. >> >> Currently, the fast path iteration looks like: >> >> struct tracepoint_func *it_func_ptr; >> void *it_func; >> >> it_func_ptr = \ >> rcu_dereference_raw((&__tracepoint_##_name)->funcs); >> \ >> do {\ >> it_func = (it_func_ptr)->func; \ >> __data = (it_func_ptr)->data; \ >> ((void(*)(void *, proto))(it_func))(__data, args); \ >> } while ((++it_func_ptr)->func); >> >> So we RCU dereference the array, and iterate on the array until we find a >> NULL >> func. So you could not use NULL to skip items, but you could perhaps reserve >> a (void *)0x1UL tombstone for this. > > Actually, you could just set it to a stub callback that does nothing. then > you don't even need to touch the above macro. Not sure why I didn't > recommend this to begin with, because that's exactly what the function > tracer does with ftrace_stub. I like the stub idea. What prototype should the stub function have ? Is it acceptable to pass arguments to a function expecting (void) ? If not, then we may need to create stub functions for each tracepoint. > > >> >> It should ideally be an unlikely branch, and it would be good to benchmark >> the >> change when multiple tracing probes are attached to figure out whether the >> overhead is significant when tracing is enabled. > > If you use a stub function, it shouldn't affect anything. And the worse > that would happen is that you have a slight overhead of calling the stub > until you can properly remove the callback. > >> >> I wonder whether we really mind that much about using slightly more memory >> than required after a failed reallocation due to ENOMEM. Perhaps the irq work >> is not even needed. Chances are that the irq work would fail again and again >> if >> it's in low memory conditions. So maybe it's better to just keep the >> tombstone >> in place until the next successful callback array reallocation. >> > > True. If we just replace the function with a stub on memory failure (always > using __GFP_NOFAIL, and if it fails to reallocate a new array, just replace > the callback with the stub and be done with it. It may require some more > accounting to make sure the tracepoint.c code can handle these stubs, and > remove them on new additions to the code. Yes. > Heck, if a stub exists, you could just swap it with a new item. Not without proper synchronization, otherwise you could end up with mismatch between function callback and private data. The only transition valid without waiting for an rcu grace period is to change the function pointer to a stub function. Anything else (e.g. replacing the stub by some other callback function) would require to wait for a grace period beforehand. > But on any new changes to the list, the stubs should be purged. Yes, Thanks, Mathieu > > > -- Steve -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Mon, 16 Nov 2020 16:02:18 -0500 Steven Rostedt wrote: > + if (new) { > + for (i = 0; old[i].func; i++) > + if (old[i].func != tp_func->func > + || old[i].data != tp_func->data) > + new[j++] = old[i]; Oops, need to check for old[i].func != tp_stub_func here too. -- Steve > + new[nr_probes - nr_del].func = NULL; > + } else {
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Mon, 16 Nov 2020 15:44:37 -0500 Steven Rostedt wrote: > If you use a stub function, it shouldn't affect anything. And the worse > that would happen is that you have a slight overhead of calling the stub > until you can properly remove the callback. Something like this: (haven't compiled it yet, I'm about to though). -- Steve diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 3f659f855074..8eab40f9d388 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -53,10 +53,16 @@ struct tp_probes { struct tracepoint_func probes[]; }; -static inline void *allocate_probes(int count) +/* Called in removal of a func but failed to allocate a new tp_funcs */ +static void tp_stub_func(void) +{ + return; +} + +static inline void *allocate_probes(int count, gfp_t extra_flags) { struct tp_probes *p = kmalloc(struct_size(p, probes, count), - GFP_KERNEL); + GFP_KERNEL | extra_flags); return p == NULL ? NULL : p->probes; } @@ -150,7 +156,7 @@ func_add(struct tracepoint_func **funcs, struct tracepoint_func *tp_func, } } /* + 2 : one for new probe, one for NULL func */ - new = allocate_probes(nr_probes + 2); + new = allocate_probes(nr_probes + 2, 0); if (new == NULL) return ERR_PTR(-ENOMEM); if (old) { @@ -188,8 +194,9 @@ static void *func_remove(struct tracepoint_func **funcs, /* (N -> M), (N > 1, M >= 0) probes */ if (tp_func->func) { for (nr_probes = 0; old[nr_probes].func; nr_probes++) { - if (old[nr_probes].func == tp_func->func && -old[nr_probes].data == tp_func->data) + if ((old[nr_probes].func == tp_func->func && +old[nr_probes].data == tp_func->data) || + old[nr_probes].func == tp_stub_func) nr_del++; } } @@ -207,15 +214,20 @@ static void *func_remove(struct tracepoint_func **funcs, int j = 0; /* N -> M, (N > 1, M > 0) */ /* + 1 for NULL */ - new = allocate_probes(nr_probes - nr_del + 1); - if (new == NULL) - return ERR_PTR(-ENOMEM); - for (i = 0; old[i].func; i++) - if (old[i].func != tp_func->func - || old[i].data != tp_func->data) - new[j++] = old[i]; - new[nr_probes - nr_del].func = NULL; - *funcs = new; + new = allocate_probes(nr_probes - nr_del + 1, __GFP_NOFAIL); + if (new) { + for (i = 0; old[i].func; i++) + if (old[i].func != tp_func->func + || old[i].data != tp_func->data) + new[j++] = old[i]; + new[nr_probes - nr_del].func = NULL; + } else { + for (i = 0; old[i].func; i++) + if (old[i].func == tp_func->func && + old[i].data == tp_func->data) + old[i].func = tp_stub_func; + } + *funcs = old; } debug_print_probes(*funcs); return old; @@ -300,6 +312,10 @@ static int tracepoint_remove_func(struct tracepoint *tp, return PTR_ERR(old); } + if (tp_funcs == old) + /* Failed allocating new tp_funcs, replaced func with stub */ + return 0; + if (!tp_funcs) { /* Removed last function */ if (tp->unregfunc && static_key_enabled(&tp->key))
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Mon, 16 Nov 2020 15:37:27 -0500 (EST) Mathieu Desnoyers wrote: > > > > Mathieu, > > > > Can't we do something that would still allow to unregister a probe even if > > a new probe array fails to allocate? We could kick off a irq work to try to > > clean up the probe at a later time, but still, the unregister itself should > > not fail due to memory failure. > > Currently, the fast path iteration looks like: > > struct tracepoint_func *it_func_ptr; > void *it_func; > > it_func_ptr = \ > rcu_dereference_raw((&__tracepoint_##_name)->funcs); \ > do {\ > it_func = (it_func_ptr)->func; \ > __data = (it_func_ptr)->data; \ > ((void(*)(void *, proto))(it_func))(__data, args); \ > } while ((++it_func_ptr)->func); > > So we RCU dereference the array, and iterate on the array until we find a NULL > func. So you could not use NULL to skip items, but you could perhaps reserve > a (void *)0x1UL tombstone for this. Actually, you could just set it to a stub callback that does nothing. then you don't even need to touch the above macro. Not sure why I didn't recommend this to begin with, because that's exactly what the function tracer does with ftrace_stub. > > It should ideally be an unlikely branch, and it would be good to benchmark the > change when multiple tracing probes are attached to figure out whether the > overhead is significant when tracing is enabled. If you use a stub function, it shouldn't affect anything. And the worse that would happen is that you have a slight overhead of calling the stub until you can properly remove the callback. > > I wonder whether we really mind that much about using slightly more memory > than required after a failed reallocation due to ENOMEM. Perhaps the irq work > is not even needed. Chances are that the irq work would fail again and again > if > it's in low memory conditions. So maybe it's better to just keep the tombstone > in place until the next successful callback array reallocation. > True. If we just replace the function with a stub on memory failure (always using __GFP_NOFAIL, and if it fails to reallocate a new array, just replace the callback with the stub and be done with it. It may require some more accounting to make sure the tracepoint.c code can handle these stubs, and remove them on new additions to the code. Heck, if a stub exists, you could just swap it with a new item. But on any new changes to the list, the stubs should be purged. -- Steve
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
- On Nov 16, 2020, at 12:19 PM, rostedt rost...@goodmis.org wrote: > On Sat, 14 Nov 2020 21:52:55 -0800 > Matt Mullins wrote: > >> bpf_link_free is always called in process context, including from a >> workqueue and from __fput. Neither of these have the ability to >> propagate an -ENOMEM to the caller. >> > > Hmm, I think the real fix is to not have unregistering a tracepoint probe > fail because of allocation. We are removing a probe, perhaps we could just > inject NULL pointer that gets checked via the DO_TRACE loop? > > I bet failing an unregister because of an allocation failure causes > problems elsewhere than just BPF. > > Mathieu, > > Can't we do something that would still allow to unregister a probe even if > a new probe array fails to allocate? We could kick off a irq work to try to > clean up the probe at a later time, but still, the unregister itself should > not fail due to memory failure. Currently, the fast path iteration looks like: struct tracepoint_func *it_func_ptr; void *it_func; it_func_ptr = \ rcu_dereference_raw((&__tracepoint_##_name)->funcs); \ do {\ it_func = (it_func_ptr)->func; \ __data = (it_func_ptr)->data; \ ((void(*)(void *, proto))(it_func))(__data, args); \ } while ((++it_func_ptr)->func); So we RCU dereference the array, and iterate on the array until we find a NULL func. So you could not use NULL to skip items, but you could perhaps reserve a (void *)0x1UL tombstone for this. It should ideally be an unlikely branch, and it would be good to benchmark the change when multiple tracing probes are attached to figure out whether the overhead is significant when tracing is enabled. I wonder whether we really mind that much about using slightly more memory than required after a failed reallocation due to ENOMEM. Perhaps the irq work is not even needed. Chances are that the irq work would fail again and again if it's in low memory conditions. So maybe it's better to just keep the tombstone in place until the next successful callback array reallocation. Thoughts ? Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com
Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp
On Sat, 14 Nov 2020 21:52:55 -0800 Matt Mullins wrote: > bpf_link_free is always called in process context, including from a > workqueue and from __fput. Neither of these have the ability to > propagate an -ENOMEM to the caller. > Hmm, I think the real fix is to not have unregistering a tracepoint probe fail because of allocation. We are removing a probe, perhaps we could just inject NULL pointer that gets checked via the DO_TRACE loop? I bet failing an unregister because of an allocation failure causes problems elsewhere than just BPF. Mathieu, Can't we do something that would still allow to unregister a probe even if a new probe array fails to allocate? We could kick off a irq work to try to clean up the probe at a later time, but still, the unregister itself should not fail due to memory failure. -- Steve > Reported-by: syzbot+83aa762ef23b6f0d1...@syzkaller.appspotmail.com > Reported-by: syzbot+d29e58bb557324e55...@syzkaller.appspotmail.com > Signed-off-by: Matt Mullins > --- > I previously referenced a "pretty ugly" patch. This is not that one, > because I don't think there's any way I can make the caller of > ->release() actually handle errors like ENOMEM. > > It also looks like most of the other ways tracepoint_probe_unregister is > called also don't check the error code (e.g. just a quick grep found > blk_unregister_tracepoints). Should this just be upgraded to GFP_NOFAIL > across the board instead of passing around a gfp_t? > > include/linux/trace_events.h | 6 -- > include/linux/tracepoint.h | 7 +-- > kernel/bpf/syscall.c | 2 +- > kernel/trace/bpf_trace.c | 6 -- > kernel/trace/trace_events.c | 6 -- > kernel/tracepoint.c | 20 ++-- > 6 files changed, 28 insertions(+), 19 deletions(-) > > diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h > index 5c6943354049..166ad7646a98 100644 > --- a/include/linux/trace_events.h > +++ b/include/linux/trace_events.h > @@ -625,7 +625,8 @@ int perf_event_attach_bpf_prog(struct perf_event *event, > struct bpf_prog *prog); > void perf_event_detach_bpf_prog(struct perf_event *event); > int perf_event_query_prog_array(struct perf_event *event, void __user *info); > int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog); > -int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog > *prog); > +int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog > *prog, > + gfp_t flags); > struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name); > void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp); > int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id, > @@ -654,7 +655,8 @@ static inline int bpf_probe_register(struct > bpf_raw_event_map *btp, struct bpf_p > { > return -EOPNOTSUPP; > } > -static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct > bpf_prog *p) > +static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, > +struct bpf_prog *p, gfp_t flags) > { > return -EOPNOTSUPP; > } > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index 598fec9f9dbf..7b02f92f3b8f 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -12,6 +12,7 @@ > * Heavily inspired from the Linux Kernel Markers. > */ > > +#include > #include > #include > #include > @@ -40,7 +41,8 @@ extern int > tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void > *data, > int prio); > extern int > -tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data); > +tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data, > + gfp_t flags); > extern void > for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv), > void *priv); > @@ -260,7 +262,8 @@ static inline struct tracepoint > *tracepoint_ptr_deref(tracepoint_ptr_t *p) > unregister_trace_##name(void (*probe)(data_proto), void *data) \ > { \ > return tracepoint_probe_unregister(&__tracepoint_##name,\ > - (void *)probe, data); \ > + (void *)probe, data,\ > + GFP_KERNEL);\ > } \ > static inline void \ > check_trace_callback_type_##name(void (*cb)(data_proto))\ > diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c > index b999e7ff2583..f6876681c4ab 100644 > --- a/kernel/bpf/syscall.c > +++ b/kernel/bpf/syscall.c > @@ -2601,7 +2601,7 @@ static void bpf_raw_tp_link_release(struct bpf_link > *link)