subject:"Re\: \[PATCH\] bpf\: don't fail kmalloc while releasing raw

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-17 Thread Paul E. McKenney

On Tue, Nov 17, 2020 at 08:09:22PM -0500, Steven Rostedt wrote:
> On Tue, 17 Nov 2020 16:42:44 -0800
> Matt Mullins  wrote:
> 
> 
> > > Indeed with a stub function, I don't see any need for 
> > > READ_ONCE/WRITE_ONCE.  
> > 
> > I'm not sure if this is a practical issue, but without WRITE_ONCE, can't
> > the write be torn?  A racing __traceiter_ could potentially see a
> > half-modified function pointer, which wouldn't work out too well.
> 
> This has been discussed before, and Linus said:
> 
> "We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
> because of some theoretical "compiler is free to do garbage"
> arguments. If such garbage happens, we need to fix the compiler"
> 
> https://lore.kernel.org/lkml/CAHk-=wi_KeD1M-_-_SU_H92vJ-yNkDnAGhAS=rr1ynngwkw...@mail.gmail.com/

I have to ask...  Did the ARM compilers get fixed?  As of a few
months ago, they would tear stores of some constant values.

> > This was actually my gut instinct before I wrote the __GFP_NOFAIL
> > instead -- currently that whole array's memory ordering is provided by
> > RCU and I didn't dive deep enough to evaluate getting too clever with
> > atomic modifications to it.
> 
> The pointers are always going to be the architecture word size (by
> definition), and any compiler that tears a write of a long is broken.

But yes, if the write is of a non-constant pointer, the compiler does
have less leverage.

Thanx, Paul

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-17 Thread Steven Rostedt

On Tue, 17 Nov 2020 16:42:44 -0800
Matt Mullins  wrote:

> > Indeed with a stub function, I don't see any need for READ_ONCE/WRITE_ONCE. 
> >  
> 
> I'm not sure if this is a practical issue, but without WRITE_ONCE, can't
> the write be torn?  A racing __traceiter_ could potentially see a
> half-modified function pointer, which wouldn't work out too well.

This has been discussed before, and Linus said:

"We add READ_ONCE and WRITE_ONCE annotations when they make sense. Not
because of some theoretical "compiler is free to do garbage"
arguments. If such garbage happens, we need to fix the compiler"

https://lore.kernel.org/lkml/CAHk-=wi_KeD1M-_-_SU_H92vJ-yNkDnAGhAS=rr1ynngwkw...@mail.gmail.com/

> 
> This was actually my gut instinct before I wrote the __GFP_NOFAIL
> instead -- currently that whole array's memory ordering is provided by
> RCU and I didn't dive deep enough to evaluate getting too clever with
> atomic modifications to it.

The pointers are always going to be the architecture word size (by
definition), and any compiler that tears a write of a long is broken.

-- Steve

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-17 Thread Matt Mullins

On Tue, Nov 17, 2020 at 06:05:51PM -0500, Mathieu Desnoyers wrote:
> - On Nov 16, 2020, at 5:10 PM, rostedt rost...@goodmis.org wrote:
> 
> > On Mon, 16 Nov 2020 16:34:41 -0500 (EST)
> > Mathieu Desnoyers  wrote:
> 
> [...]
> 
> >> I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched
> >> with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the
> >> func pointer can be updated and loaded concurrently.
> > 
> > I thought about this a little, and then only thing we really should worry
> > about is synchronizing with those that unregister. Because when we make
> > this update, there are now two states. the __DO_TRACE either reads the
> > original func or the stub. And either should be OK to call.
> > 
> > Only the func gets updated and not the data. So what exactly are we worried
> > about here?
> 
> Indeed with a stub function, I don't see any need for READ_ONCE/WRITE_ONCE.

I'm not sure if this is a practical issue, but without WRITE_ONCE, can't
the write be torn?  A racing __traceiter_ could potentially see a
half-modified function pointer, which wouldn't work out too well.

This was actually my gut instinct before I wrote the __GFP_NOFAIL
instead -- currently that whole array's memory ordering is provided by
RCU and I didn't dive deep enough to evaluate getting too clever with
atomic modifications to it.

> 
> However, if we want to compare the function pointer to some other value and
> conditionally do (or skip) the call, I think you'll need the 
> READ_ONCE/WRITE_ONCE
> to make sure the pointer is not re-fetched between comparison and call.
> 
> Thanks,
> 
> Mathieu
> 
> -- 
> Mathieu Desnoyers
> EfficiOS Inc.
> http://www.efficios.com

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-17 Thread Mathieu Desnoyers

- On Nov 16, 2020, at 5:10 PM, rostedt rost...@goodmis.org wrote:

> On Mon, 16 Nov 2020 16:34:41 -0500 (EST)
> Mathieu Desnoyers  wrote:

[...]

>> I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched
>> with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the
>> func pointer can be updated and loaded concurrently.
> 
> I thought about this a little, and then only thing we really should worry
> about is synchronizing with those that unregister. Because when we make
> this update, there are now two states. the __DO_TRACE either reads the
> original func or the stub. And either should be OK to call.
> 
> Only the func gets updated and not the data. So what exactly are we worried
> about here?

Indeed with a stub function, I don't see any need for READ_ONCE/WRITE_ONCE.

However, if we want to compare the function pointer to some other value and
conditionally do (or skip) the call, I think you'll need the 
READ_ONCE/WRITE_ONCE
to make sure the pointer is not re-fetched between comparison and call.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Steven Rostedt

On Mon, 16 Nov 2020 16:34:41 -0500 (EST)
Mathieu Desnoyers  wrote:

> - On Nov 16, 2020, at 4:02 PM, rostedt rost...@goodmis.org wrote:
> 
> > On Mon, 16 Nov 2020 15:44:37 -0500
> > Steven Rostedt  wrote:
> >   
> >> If you use a stub function, it shouldn't affect anything. And the worse
> >> that would happen is that you have a slight overhead of calling the stub
> >> until you can properly remove the callback.  
> > 
> > Something like this:
> > 
> > (haven't compiled it yet, I'm about to though).

Still need more accounting to work on. Almost finished though. ;-)

> > 
> > -- Steve
> > 
> > diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> > index 3f659f855074..8eab40f9d388 100644
> > --- a/kernel/tracepoint.c
> > +++ b/kernel/tracepoint.c
> > @@ -53,10 +53,16 @@ struct tp_probes {
> > struct tracepoint_func probes[];
> > };
> > 
> > -static inline void *allocate_probes(int count)
> > +/* Called in removal of a func but failed to allocate a new tp_funcs */
> > +static void tp_stub_func(void)  
> 
> I'm still not sure whether it's OK to call a (void) function with arguments.

Actually, I've done it. The thing is, what can actually happen? A void
function that simply returns should not do anything. If anything, the only
waste is that the caller would save more registers than necessary.

I can't think of anything that can actually happen, but perhaps there is. I
wouldn't want to make a stub function for every trace point (it wouldn't be
hard to do).

But perhaps we should ask the compiler people to make sure.

> 
> > +{
> > +   return;
> > +}
> > +
> > +static inline void *allocate_probes(int count, gfp_t extra_flags)
> > {
> > struct tp_probes *p  = kmalloc(struct_size(p, probes, count),
> > -  GFP_KERNEL);
> > +  GFP_KERNEL | extra_flags);
> > return p == NULL ? NULL : p->probes;
> > }
> > 
> > @@ -150,7 +156,7 @@ func_add(struct tracepoint_func **funcs, struct
> > tracepoint_func *tp_func,
> > }
> > }
> > /* + 2 : one for new probe, one for NULL func */
> > -   new = allocate_probes(nr_probes + 2);
> > +   new = allocate_probes(nr_probes + 2, 0);
> > if (new == NULL)
> > return ERR_PTR(-ENOMEM);
> > if (old) {
> > @@ -188,8 +194,9 @@ static void *func_remove(struct tracepoint_func **funcs,
> > /* (N -> M), (N > 1, M >= 0) probes */
> > if (tp_func->func) {
> > for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> > -   if (old[nr_probes].func == tp_func->func &&
> > -old[nr_probes].data == tp_func->data)
> > +   if ((old[nr_probes].func == tp_func->func &&
> > +old[nr_probes].data == tp_func->data) ||
> > +   old[nr_probes].func == tp_stub_func)
> > nr_del++;
> > }
> > }
> > @@ -207,15 +214,20 @@ static void *func_remove(struct tracepoint_func 
> > **funcs,
> > int j = 0;
> > /* N -> M, (N > 1, M > 0) */
> > /* + 1 for NULL */
> > -   new = allocate_probes(nr_probes - nr_del + 1);
> > -   if (new == NULL)
> > -   return ERR_PTR(-ENOMEM);
> > -   for (i = 0; old[i].func; i++)
> > -   if (old[i].func != tp_func->func
> > -   || old[i].data != tp_func->data)
> > -   new[j++] = old[i];
> > -   new[nr_probes - nr_del].func = NULL;
> > -   *funcs = new;
> > +   new = allocate_probes(nr_probes - nr_del + 1, __GFP_NOFAIL);
> > +   if (new) {
> > +   for (i = 0; old[i].func; i++)
> > +   if (old[i].func != tp_func->func
> > +   || old[i].data != tp_func->data)  
> 
> as you point out in your reply, skip tp_stub_func here too.
> 
> > +   new[j++] = old[i];
> > +   new[nr_probes - nr_del].func = NULL;
> > +   } else {
> > +   for (i = 0; old[i].func; i++)
> > +   if (old[i].func == tp_func->func &&
> > +   old[i].data == tp_func->data)
> > +   old[i].func = tp_stub_func;  
> 
> I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched
> with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the
> func pointer can be updated and loaded concurrently.

I thought about this a little, and then only thing we really should worry
about is synchronizing with those that unregister. Because when we make
this update, there are now two states. the __DO_TRACE either reads the
original func or the stub. And either should be OK to call.

Only the func gets updated and not the data. So what exactly are we worried
about here?

> 
> > +   }
> > +   *funcs = old;  
> 
> The line above seems wrong for

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Mathieu Desnoyers

- On Nov 16, 2020, at 4:02 PM, rostedt rost...@goodmis.org wrote:

> On Mon, 16 Nov 2020 15:44:37 -0500
> Steven Rostedt  wrote:
> 
>> If you use a stub function, it shouldn't affect anything. And the worse
>> that would happen is that you have a slight overhead of calling the stub
>> until you can properly remove the callback.
> 
> Something like this:
> 
> (haven't compiled it yet, I'm about to though).
> 
> -- Steve
> 
> diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
> index 3f659f855074..8eab40f9d388 100644
> --- a/kernel/tracepoint.c
> +++ b/kernel/tracepoint.c
> @@ -53,10 +53,16 @@ struct tp_probes {
>   struct tracepoint_func probes[];
> };
> 
> -static inline void *allocate_probes(int count)
> +/* Called in removal of a func but failed to allocate a new tp_funcs */
> +static void tp_stub_func(void)

I'm still not sure whether it's OK to call a (void) function with arguments.

> +{
> + return;
> +}
> +
> +static inline void *allocate_probes(int count, gfp_t extra_flags)
> {
>   struct tp_probes *p  = kmalloc(struct_size(p, probes, count),
> -GFP_KERNEL);
> +GFP_KERNEL | extra_flags);
>   return p == NULL ? NULL : p->probes;
> }
> 
> @@ -150,7 +156,7 @@ func_add(struct tracepoint_func **funcs, struct
> tracepoint_func *tp_func,
>   }
>   }
>   /* + 2 : one for new probe, one for NULL func */
> - new = allocate_probes(nr_probes + 2);
> + new = allocate_probes(nr_probes + 2, 0);
>   if (new == NULL)
>   return ERR_PTR(-ENOMEM);
>   if (old) {
> @@ -188,8 +194,9 @@ static void *func_remove(struct tracepoint_func **funcs,
>   /* (N -> M), (N > 1, M >= 0) probes */
>   if (tp_func->func) {
>   for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
> - if (old[nr_probes].func == tp_func->func &&
> -  old[nr_probes].data == tp_func->data)
> + if ((old[nr_probes].func == tp_func->func &&
> +  old[nr_probes].data == tp_func->data) ||
> + old[nr_probes].func == tp_stub_func)
>   nr_del++;
>   }
>   }
> @@ -207,15 +214,20 @@ static void *func_remove(struct tracepoint_func **funcs,
>   int j = 0;
>   /* N -> M, (N > 1, M > 0) */
>   /* + 1 for NULL */
> - new = allocate_probes(nr_probes - nr_del + 1);
> - if (new == NULL)
> - return ERR_PTR(-ENOMEM);
> - for (i = 0; old[i].func; i++)
> - if (old[i].func != tp_func->func
> - || old[i].data != tp_func->data)
> - new[j++] = old[i];
> - new[nr_probes - nr_del].func = NULL;
> - *funcs = new;
> + new = allocate_probes(nr_probes - nr_del + 1, __GFP_NOFAIL);
> + if (new) {
> + for (i = 0; old[i].func; i++)
> + if (old[i].func != tp_func->func
> + || old[i].data != tp_func->data)

as you point out in your reply, skip tp_stub_func here too.

> + new[j++] = old[i];
> + new[nr_probes - nr_del].func = NULL;
> + } else {
> + for (i = 0; old[i].func; i++)
> + if (old[i].func == tp_func->func &&
> + old[i].data == tp_func->data)
> + old[i].func = tp_stub_func;

I think you'll want a WRITE_ONCE(old[i].func, tp_stub_func) here, matched
with a READ_ONCE() in __DO_TRACE. This introduces a new situation where the
func pointer can be updated and loaded concurrently.

> + }
> + *funcs = old;

The line above seems wrong for the successful allocate_probe case. You will 
likely
want *funcs = new on successful allocation, and *funcs = old for the failure 
case.

Thanks,

Mathieu

>   }
>   debug_print_probes(*funcs);
>   return old;
> @@ -300,6 +312,10 @@ static int tracepoint_remove_func(struct tracepoint *tp,
>   return PTR_ERR(old);
>   }
> 
> + if (tp_funcs == old)
> + /* Failed allocating new tp_funcs, replaced func with stub */
> + return 0;
> +
>   if (!tp_funcs) {
>   /* Removed last function */
>   if (tp->unregfunc && static_key_enabled(&tp->key))

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Mathieu Desnoyers

- On Nov 16, 2020, at 3:44 PM, rostedt rost...@goodmis.org wrote:

> On Mon, 16 Nov 2020 15:37:27 -0500 (EST)
> Mathieu Desnoyers  wrote:
>> > 
>> > Mathieu,
>> > 
>> > Can't we do something that would still allow to unregister a probe even if
>> > a new probe array fails to allocate? We could kick off a irq work to try to
>> > clean up the probe at a later time, but still, the unregister itself should
>> > not fail due to memory failure.
>> 
>> Currently, the fast path iteration looks like:
>> 
>> struct tracepoint_func *it_func_ptr;
>> void *it_func;
>> 
>> it_func_ptr =   \
>> rcu_dereference_raw((&__tracepoint_##_name)->funcs); 
>> \
>> do {\
>> it_func = (it_func_ptr)->func;  \
>> __data = (it_func_ptr)->data;   \
>> ((void(*)(void *, proto))(it_func))(__data, args); \
>> } while ((++it_func_ptr)->func);
>> 
>> So we RCU dereference the array, and iterate on the array until we find a 
>> NULL
>> func. So you could not use NULL to skip items, but you could perhaps reserve
>> a (void *)0x1UL tombstone for this.
> 
> Actually, you could just set it to a stub callback that does nothing. then
> you don't even need to touch the above macro. Not sure why I didn't
> recommend this to begin with, because that's exactly what the function
> tracer does with ftrace_stub.

I like the stub idea.

What prototype should the stub function have ? Is it acceptable to pass
arguments to a function expecting (void) ? If not, then we may need to
create stub functions for each tracepoint.

> 
> 
>> 
>> It should ideally be an unlikely branch, and it would be good to benchmark 
>> the
>> change when multiple tracing probes are attached to figure out whether the
>> overhead is significant when tracing is enabled.
> 
> If you use a stub function, it shouldn't affect anything. And the worse
> that would happen is that you have a slight overhead of calling the stub
> until you can properly remove the callback.
> 
>> 
>> I wonder whether we really mind that much about using slightly more memory
>> than required after a failed reallocation due to ENOMEM. Perhaps the irq work
>> is not even needed. Chances are that the irq work would fail again and again 
>> if
>> it's in low memory conditions. So maybe it's better to just keep the 
>> tombstone
>> in place until the next successful callback array reallocation.
>> 
> 
> True. If we just replace the function with a stub on memory failure (always
> using __GFP_NOFAIL, and if it fails to reallocate a new array, just replace
> the callback with the stub and be done with it. It may require some more
> accounting to make sure the tracepoint.c code can handle these stubs, and
> remove them on new additions to the code.

Yes.

> Heck, if a stub exists, you could just swap it with a new item.

Not without proper synchronization, otherwise you could end up with
mismatch between function callback and private data. The only transition
valid without waiting for an rcu grace period is to change the function
pointer to a stub function. Anything else (e.g. replacing the stub by
some other callback function) would require to wait for a grace period
beforehand.

> But on any new changes to the list, the stubs should be purged.

Yes,

Thanks,

Mathieu

> 
> 
> -- Steve

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Steven Rostedt

On Mon, 16 Nov 2020 16:02:18 -0500
Steven Rostedt  wrote:

> + if (new) {
> + for (i = 0; old[i].func; i++)
> + if (old[i].func != tp_func->func
> + || old[i].data != tp_func->data)
> + new[j++] = old[i];

Oops, need to check for old[i].func != tp_stub_func here too.

-- Steve

> + new[nr_probes - nr_del].func = NULL;
> + } else {

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Steven Rostedt

On Mon, 16 Nov 2020 15:44:37 -0500
Steven Rostedt  wrote:

> If you use a stub function, it shouldn't affect anything. And the worse
> that would happen is that you have a slight overhead of calling the stub
> until you can properly remove the callback.

Something like this:

(haven't compiled it yet, I'm about to though).

-- Steve

diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c
index 3f659f855074..8eab40f9d388 100644
--- a/kernel/tracepoint.c
+++ b/kernel/tracepoint.c
@@ -53,10 +53,16 @@ struct tp_probes {
struct tracepoint_func probes[];
 };
 
-static inline void *allocate_probes(int count)
+/* Called in removal of a func but failed to allocate a new tp_funcs */
+static void tp_stub_func(void)
+{
+   return;
+}
+
+static inline void *allocate_probes(int count, gfp_t extra_flags)
 {
struct tp_probes *p  = kmalloc(struct_size(p, probes, count),
-  GFP_KERNEL);
+  GFP_KERNEL | extra_flags);
return p == NULL ? NULL : p->probes;
 }
 
@@ -150,7 +156,7 @@ func_add(struct tracepoint_func **funcs, struct 
tracepoint_func *tp_func,
}
}
/* + 2 : one for new probe, one for NULL func */
-   new = allocate_probes(nr_probes + 2);
+   new = allocate_probes(nr_probes + 2, 0);
if (new == NULL)
return ERR_PTR(-ENOMEM);
if (old) {
@@ -188,8 +194,9 @@ static void *func_remove(struct tracepoint_func **funcs,
/* (N -> M), (N > 1, M >= 0) probes */
if (tp_func->func) {
for (nr_probes = 0; old[nr_probes].func; nr_probes++) {
-   if (old[nr_probes].func == tp_func->func &&
-old[nr_probes].data == tp_func->data)
+   if ((old[nr_probes].func == tp_func->func &&
+old[nr_probes].data == tp_func->data) ||
+   old[nr_probes].func == tp_stub_func)
nr_del++;
}
}
@@ -207,15 +214,20 @@ static void *func_remove(struct tracepoint_func **funcs,
int j = 0;
/* N -> M, (N > 1, M > 0) */
/* + 1 for NULL */
-   new = allocate_probes(nr_probes - nr_del + 1);
-   if (new == NULL)
-   return ERR_PTR(-ENOMEM);
-   for (i = 0; old[i].func; i++)
-   if (old[i].func != tp_func->func
-   || old[i].data != tp_func->data)
-   new[j++] = old[i];
-   new[nr_probes - nr_del].func = NULL;
-   *funcs = new;
+   new = allocate_probes(nr_probes - nr_del + 1, __GFP_NOFAIL);
+   if (new) {
+   for (i = 0; old[i].func; i++)
+   if (old[i].func != tp_func->func
+   || old[i].data != tp_func->data)
+   new[j++] = old[i];
+   new[nr_probes - nr_del].func = NULL;
+   } else {
+   for (i = 0; old[i].func; i++)
+   if (old[i].func == tp_func->func &&
+   old[i].data == tp_func->data)
+   old[i].func = tp_stub_func;
+   }
+   *funcs = old;
}
debug_print_probes(*funcs);
return old;
@@ -300,6 +312,10 @@ static int tracepoint_remove_func(struct tracepoint *tp,
return PTR_ERR(old);
}
 
+   if (tp_funcs == old)
+   /* Failed allocating new tp_funcs, replaced func with stub */
+   return 0;
+
if (!tp_funcs) {
/* Removed last function */
if (tp->unregfunc && static_key_enabled(&tp->key))

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Steven Rostedt

On Mon, 16 Nov 2020 15:37:27 -0500 (EST)
Mathieu Desnoyers  wrote:
> > 
> > Mathieu,
> > 
> > Can't we do something that would still allow to unregister a probe even if
> > a new probe array fails to allocate? We could kick off a irq work to try to
> > clean up the probe at a later time, but still, the unregister itself should
> > not fail due to memory failure.  
> 
> Currently, the fast path iteration looks like:
> 
> struct tracepoint_func *it_func_ptr;
> void *it_func;
> 
> it_func_ptr =   \
> rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
> do {\
> it_func = (it_func_ptr)->func;  \
> __data = (it_func_ptr)->data;   \
> ((void(*)(void *, proto))(it_func))(__data, args); \
> } while ((++it_func_ptr)->func); 
> 
> So we RCU dereference the array, and iterate on the array until we find a NULL
> func. So you could not use NULL to skip items, but you could perhaps reserve
> a (void *)0x1UL tombstone for this.

Actually, you could just set it to a stub callback that does nothing. then
you don't even need to touch the above macro. Not sure why I didn't
recommend this to begin with, because that's exactly what the function
tracer does with ftrace_stub.

> 
> It should ideally be an unlikely branch, and it would be good to benchmark the
> change when multiple tracing probes are attached to figure out whether the
> overhead is significant when tracing is enabled.

If you use a stub function, it shouldn't affect anything. And the worse
that would happen is that you have a slight overhead of calling the stub
until you can properly remove the callback.

> 
> I wonder whether we really mind that much about using slightly more memory
> than required after a failed reallocation due to ENOMEM. Perhaps the irq work
> is not even needed. Chances are that the irq work would fail again and again 
> if
> it's in low memory conditions. So maybe it's better to just keep the tombstone
> in place until the next successful callback array reallocation.
> 

True. If we just replace the function with a stub on memory failure (always
using __GFP_NOFAIL, and if it fails to reallocate a new array, just replace
the callback with the stub and be done with it. It may require some more
accounting to make sure the tracepoint.c code can handle these stubs, and
remove them on new additions to the code. Heck, if a stub exists, you could
just swap it with a new item. But on any new changes to the list, the stubs
should be purged.

-- Steve

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Mathieu Desnoyers

- On Nov 16, 2020, at 12:19 PM, rostedt rost...@goodmis.org wrote:

> On Sat, 14 Nov 2020 21:52:55 -0800
> Matt Mullins  wrote:
> 
>> bpf_link_free is always called in process context, including from a
>> workqueue and from __fput.  Neither of these have the ability to
>> propagate an -ENOMEM to the caller.
>> 
> 
> Hmm, I think the real fix is to not have unregistering a tracepoint probe
> fail because of allocation. We are removing a probe, perhaps we could just
> inject NULL pointer that gets checked via the DO_TRACE loop?
> 
> I bet failing an unregister because of an allocation failure causes
> problems elsewhere than just BPF.
> 
> Mathieu,
> 
> Can't we do something that would still allow to unregister a probe even if
> a new probe array fails to allocate? We could kick off a irq work to try to
> clean up the probe at a later time, but still, the unregister itself should
> not fail due to memory failure.

Currently, the fast path iteration looks like:

struct tracepoint_func *it_func_ptr;
void *it_func;

it_func_ptr =   \
rcu_dereference_raw((&__tracepoint_##_name)->funcs); \
do {\
it_func = (it_func_ptr)->func;  \
__data = (it_func_ptr)->data;   \
((void(*)(void *, proto))(it_func))(__data, args); \
} while ((++it_func_ptr)->func); 

So we RCU dereference the array, and iterate on the array until we find a NULL
func. So you could not use NULL to skip items, but you could perhaps reserve
a (void *)0x1UL tombstone for this.

It should ideally be an unlikely branch, and it would be good to benchmark the
change when multiple tracing probes are attached to figure out whether the
overhead is significant when tracing is enabled.

I wonder whether we really mind that much about using slightly more memory
than required after a failed reallocation due to ENOMEM. Perhaps the irq work
is not even needed. Chances are that the irq work would fail again and again if
it's in low memory conditions. So maybe it's better to just keep the tombstone
in place until the next successful callback array reallocation.

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

2020-11-16 Thread Steven Rostedt

On Sat, 14 Nov 2020 21:52:55 -0800
Matt Mullins  wrote:

> bpf_link_free is always called in process context, including from a
> workqueue and from __fput.  Neither of these have the ability to
> propagate an -ENOMEM to the caller.
> 

Hmm, I think the real fix is to not have unregistering a tracepoint probe
fail because of allocation. We are removing a probe, perhaps we could just
inject NULL pointer that gets checked via the DO_TRACE loop?

I bet failing an unregister because of an allocation failure causes
problems elsewhere than just BPF.

Mathieu,

Can't we do something that would still allow to unregister a probe even if
a new probe array fails to allocate? We could kick off a irq work to try to
clean up the probe at a later time, but still, the unregister itself should
not fail due to memory failure.

-- Steve



> Reported-by: syzbot+83aa762ef23b6f0d1...@syzkaller.appspotmail.com
> Reported-by: syzbot+d29e58bb557324e55...@syzkaller.appspotmail.com
> Signed-off-by: Matt Mullins 
> ---
> I previously referenced a "pretty ugly" patch.  This is not that one,
> because I don't think there's any way I can make the caller of
> ->release() actually handle errors like ENOMEM.  
> 
> It also looks like most of the other ways tracepoint_probe_unregister is
> called also don't check the error code (e.g. just a quick grep found
> blk_unregister_tracepoints).  Should this just be upgraded to GFP_NOFAIL
> across the board instead of passing around a gfp_t?
> 
>  include/linux/trace_events.h |  6 --
>  include/linux/tracepoint.h   |  7 +--
>  kernel/bpf/syscall.c |  2 +-
>  kernel/trace/bpf_trace.c |  6 --
>  kernel/trace/trace_events.c  |  6 --
>  kernel/tracepoint.c  | 20 ++--
>  6 files changed, 28 insertions(+), 19 deletions(-)
> 
> diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
> index 5c6943354049..166ad7646a98 100644
> --- a/include/linux/trace_events.h
> +++ b/include/linux/trace_events.h
> @@ -625,7 +625,8 @@ int perf_event_attach_bpf_prog(struct perf_event *event, 
> struct bpf_prog *prog);
>  void perf_event_detach_bpf_prog(struct perf_event *event);
>  int perf_event_query_prog_array(struct perf_event *event, void __user *info);
>  int bpf_probe_register(struct bpf_raw_event_map *btp, struct bpf_prog *prog);
> -int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog 
> *prog);
> +int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct bpf_prog 
> *prog,
> +  gfp_t flags);
>  struct bpf_raw_event_map *bpf_get_raw_tracepoint(const char *name);
>  void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp);
>  int bpf_get_perf_event_info(const struct perf_event *event, u32 *prog_id,
> @@ -654,7 +655,8 @@ static inline int bpf_probe_register(struct 
> bpf_raw_event_map *btp, struct bpf_p
>  {
>   return -EOPNOTSUPP;
>  }
> -static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp, struct 
> bpf_prog *p)
> +static inline int bpf_probe_unregister(struct bpf_raw_event_map *btp,
> +struct bpf_prog *p, gfp_t flags)
>  {
>   return -EOPNOTSUPP;
>  }
> diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
> index 598fec9f9dbf..7b02f92f3b8f 100644
> --- a/include/linux/tracepoint.h
> +++ b/include/linux/tracepoint.h
> @@ -12,6 +12,7 @@
>   * Heavily inspired from the Linux Kernel Markers.
>   */
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -40,7 +41,8 @@ extern int
>  tracepoint_probe_register_prio(struct tracepoint *tp, void *probe, void 
> *data,
>  int prio);
>  extern int
> -tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data);
> +tracepoint_probe_unregister(struct tracepoint *tp, void *probe, void *data,
> + gfp_t flags);
>  extern void
>  for_each_kernel_tracepoint(void (*fct)(struct tracepoint *tp, void *priv),
>   void *priv);
> @@ -260,7 +262,8 @@ static inline struct tracepoint 
> *tracepoint_ptr_deref(tracepoint_ptr_t *p)
>   unregister_trace_##name(void (*probe)(data_proto), void *data)  \
>   {   \
>   return tracepoint_probe_unregister(&__tracepoint_##name,\
> - (void *)probe, data);   \
> + (void *)probe, data,\
> + GFP_KERNEL);\
>   }   \
>   static inline void  \
>   check_trace_callback_type_##name(void (*cb)(data_proto))\
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index b999e7ff2583..f6876681c4ab 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2601,7 +2601,7 @@ static void bpf_raw_tp_link_release(struct bpf_link 
> *link)

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

Re: [PATCH] bpf: don't fail kmalloc while releasing raw_tp

12 matches

Site Navigation

Mail list logo

Footer information