On Mon, 2 Nov 2020 17:41:47 +0100
Petr Mladek <pmla...@suse.com> wrote:

> On Fri 2020-10-30 17:31:53, Steven Rostedt wrote:
> > From: "Steven Rostedt (VMware)" <rost...@goodmis.org>
> > 
> > This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file
> > "recursed_functions" all the functions that caused recursion while a
> > callback to the function tracer was running.
> >   
> 
> > --- /dev/null
> > +++ b/kernel/trace/trace_recursion_record.c
> > @@ -0,0 +1,220 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +
> > +#include <linux/seq_file.h>
> > +#include <linux/kallsyms.h>
> > +#include <linux/module.h>
> > +#include <linux/ftrace.h>
> > +#include <linux/fs.h>
> > +
> > +#include "trace_output.h"
> > +
> > +struct recursed_functions {
> > +   unsigned long           ip;
> > +   unsigned long           parent_ip;
> > +};
> > +
> > +static struct recursed_functions recursed_functions[CONFIG_FTRACE_RECORD_RECURSION_SIZE];
> 
> The code tries to be lockless safe as much as possible. It would make
> sense to align the array.

Hmm, is there an arch where the compiler would lay out an array of
structures holding two unsigned longs misaligned?
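
That said, if some arch does care, forcing it would be a one-liner,
something like (untested):

	static struct recursed_functions
		recursed_functions[CONFIG_FTRACE_RECORD_RECURSION_SIZE] ____cacheline_aligned;

____cacheline_aligned is just the aligned attribute, so it should not
cost anything beyond the padding.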

> 
> 
> > +static atomic_t nr_records;
> > +
> > +/*
> > + * Cache the last found function. Yes, updates to this are racy, but
> > + * so is the memory cache ;-)
> > + */
> > +static unsigned long cached_function;
> > +
> > +void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
> > +{
> > +   int index;
> > +   int i = 0;
> > +   unsigned long old;
> > +
> > + again:
> > +   /* First check the last one recorded */
> > +   if (ip == cached_function)
> > +           return;
> > +
> > +   index = atomic_read(&nr_records);
> > +   /* nr_records is -1 when clearing records */
> > +   smp_mb__after_atomic();
> > +   if (index < 0)
> > +           return;
> > +
> > +   /* See below */
> > +   if (i > index)
> > +           index = i;  
> 
> This looks like a complicated way to do index++ via the "i" variable.
> I guess that it was only needed in some older variant of the code.
> See below.

Because we re-read nr_records above, and the new index could be bigger
than i (more than the old index + 1) if other CPUs added entries in the
meantime.
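
To make it concrete (a hypothetical interleaving):

	CPU0					CPU1
	----					----
	index = nr_records (= 3)
						cmpxchg(slot 3) succeeds
	cmpxchg(slot 3) fails
	i = index + 1 (= 4)
	goto again
	index = nr_records (= 3, CPU1
	has not incremented it yet)
	if (i > index)
		index = i (= 4)
						nr_records = 4

Without taking the max, CPU0 would go back to slot 3 and spin there
until CPU1 got around to bumping nr_records.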

> 
> > +   if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
> > +           return;
> > +
> > +   for (i = index - 1; i >= 0; i--) {
> > +           if (recursed_functions[i].ip == ip) {
> > +                   cached_function = ip;
> > +                   return;
> > +           }
> > +   }
> > +
> > +   cached_function = ip;
> > +
> > +   /*
> > +    * We only want to add a function if it hasn't been added before.
> > +    * Add to the current location before incrementing the count.
> > +    * If it fails to add, then increment the index (save in i)
> > +    * and try again.
> > +    */
> > +   old = cmpxchg(&recursed_functions[index].ip, 0, ip);
> > +   if (old != 0) {
> > +           /* Did something else already add this for us? */
> > +           if (old == ip)
> > +                   return;
> > +           /* Try the next location (use i for the next index) */
> > +           i = index + 1;  
> 
> What about
> 
>               index++;
> 
> We basically want to run the code again with index + 1 limit.

But something else could update nr_records, and we want to use that if
nr_records is greater than i.

Now, we could swap the use case, and have

        int index = 0;

        [..]
        i = atomic_read(&nr_records);
        if (i > index)
                index = i;

        [..]

                index++;
                goto again;

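Fleshed out a bit (untested sketch, the tail of the function would stay
as is):

	void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
	{
		int index = 0;
		int i;
		unsigned long old;

	 again:
		/* First check the last one recorded */
		if (ip == cached_function)
			return;

		i = atomic_read(&nr_records);
		/* nr_records is -1 when clearing records */
		smp_mb__after_atomic();
		if (i < 0)
			return;

		/* The count can lag behind our retry slot; take the max */
		if (i > index)
			index = i;

		if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
			return;

		for (i = index - 1; i >= 0; i--) {
			if (recursed_functions[i].ip == ip) {
				cached_function = ip;
				return;
			}
		}

		cached_function = ip;

		old = cmpxchg(&recursed_functions[index].ip, 0, ip);
		if (old != 0) {
			/* Did something else already add this for us? */
			if (old == ip)
				return;
			/* Slot taken: try the next one */
			index++;
			goto again;
		}

		/* ... rest as before: record parent_ip, handle the clearing race ... */
	}
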

> 
> Maybe, it even does not make sense to check the array again
> and we should just try to store the value into the next slot.

We do this dance to prevent duplicates.

But you are correct that this went through a few iterations. The first
ones didn't have the cmpxchg on the ip itself; with it in place, we may
not need this index = i dance anymore.

> 
> > +           goto again;
> > +   }
> > +
> > +   recursed_functions[index].parent_ip = parent_ip;  
> 
> WRITE_ONCE() ?

Does it really matter?
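
As far as I can see, all WRITE_ONCE() would buy us here is keeping the
compiler from tearing or re-issuing the store:

	/* plain store: the compiler is free to tear it */
	recursed_functions[index].parent_ip = parent_ip;

	/* versus: no tearing, but no extra ordering either */
	WRITE_ONCE(recursed_functions[index].parent_ip, parent_ip);

and a torn parent_ip in a debug file is harmless.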

> 
> > +
> > +   /*
> > +    * It's still possible that we could race with the clearing
> > +    *    CPU0                                    CPU1
> > +    *    ----                                    ----
> > +    *                                       ip = func
> > +    *  nr_records = -1;
> > +    *  recursed_functions[0] = 0;
> > +    *                                       i = -1
> > +    *                                       if (i < 0)
> > +    *  nr_records = 0;
> > +    *  (new recursion detected)
> > +    *      recursed_functions[0] = func
> > +    *                                       cmpxchg(recursed_functions[0],
> > +    *                                                    func, 0)
> > +    *
> > +    * But the worst that could happen is that we get a zero in
> > +    * the recursed_functions array, and it's likely that "func" will
> > +    * be recorded again.
> > +    */
> > +   i = atomic_read(&nr_records);
> > +   smp_mb__after_atomic();
> > +   if (i < 0)
> > +           cmpxchg(&recursed_functions[index].ip, ip, 0);
> > +   else if (i <= index)
> > +           atomic_cmpxchg(&nr_records, i, index + 1);  
> 
> This looks weird. It would shift nr_records past the record added
> in this call. It might skip many slots that were zeroed when clearing.
> Also, we do not know whether our entry was zeroed as well.
> 
> I would suggest to do it some other way (not even compile tested):
> 
> void ftrace_record_recursion(unsigned long ip, unsigned long parent_ip)
> {
>       int index, old_index;
>       int i = 0;
>       unsigned long old_ip;
> 
>  again:
>       /* First check the last one recorded. */
>       if (ip == READ_ONCE(cached_function))
>               return;
> 
>       index = atomic_read(&nr_records);
>       /* nr_records is -1 when clearing records. */
>       smp_mb__after_atomic();
>       if (index < 0)
>               return;
> 
>       /* Already recorded? */
>       for (i = index - 1; i >= 0; i--) {
>               if (recursed_functions[i].ip == ip) {
>                       WRITE_ONCE(cached_function, ip);
>                       return;
>               }
>       }
> 
>       if (index >= CONFIG_FTRACE_RECORD_RECURSION_SIZE)
>               return;
> 
>       /*
>        * Try to reserve the slot. It might be already taken
>        * or the entire cache cleared.
>        */
>       old_index = atomic_cmpxchg(&nr_records, index, index + 1);
>       if (old_index != index)
>               goto again;
> 
>       /*
>        * Be careful. The entire cache might have been cleared and reused in
>        * the meantime. Only replace an empty slot.
>        */
>       old_ip = cmpxchg(&recursed_functions[index].ip, 0, ip);
>       if (old_ip != 0)
>               goto again;
> 
>       old_ip = cmpxchg(&recursed_functions[index].parent_ip, 0, parent_ip);
>       if (old_ip != 0)
>               goto again;
> 
>       /*
>        * No ip is better than an inconsistent one. The race with
>        * clearing should be rare and not worth a perfect solution.
>        */
>       if (READ_ONCE(recursed_functions[index].ip) != ip) {
>               cmpxchg(&recursed_functions[index].ip, ip, 0UL);
>               goto again;
>       }
> }

Let me go and rewrite it, this time considering the cmpxchg in the ip
update code. I may end up with what you have above ;-)


> 
> The last check is probably not needed. Inconsistent entries
> should be prevented by the way this function is called:
> 
>               static atomic_t paranoid_test;                          \
>               if (!atomic_read(&paranoid_test)) {                     \
>                       atomic_inc(&paranoid_test);                     \
>                       ftrace_record_recursion(ip, pip);               \
>                       atomic_dec(&paranoid_test);                     \
>               }                                                       \
> 
> 
> 
> 
> The rest of the patchset looks fine. I do not feel comfortable giving
> it a Reviewed-by because I did not review it in depth.
> 
> I spent more time with the above lockless code. I took it as
> training. I need to improve this skill to feel more comfortable with
> the lockless printk ring buffer ;-)

Yeah, everything becomes exponentially complex when you make it lockless
with multiple concurrent writers.

-- Steve
