Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"

2021-03-08 Thread Steven Rostedt
On Mon, 8 Mar 2021 16:57:29 +0200
"Yordan Karadzhov (VMware)"  wrote:

> > If we had:
> > 
> > <...>-37  [004] ...1  2022.303820: gc_worker <-process_one_work
> > <...>-37  [004] ...1  2022.303820: ___might_sleep <-gc_worker
> > <...>-37  [004] ...1  2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127)
> > <...>-37  [004] ...1  2022.303831: queue_delayed_work_on <-process_one_work
> > 
> > We would know the last time __might_sleep was called.
> > 
> > That is, not only should we save the ip and pip in the trace_func_repeats
> > structure, but we should also store the time stamp of the last function
> > event that repeated. Otherwise the above looks like the last
> > __might_sleep call happened when queue_delayed_work_on happened, which
> > may not be the case.
> 
> If we store the last timestamp, we will need an additional 64 bits on
> the buffer every time we record the "func_repeats" event. That looks
> like overkill to me.
> Can we store only the duration of the repeats (the difference between
> the timestamps)? This way we use less memory at the price of one extra
> arithmetic operation.
> An alternative approach would be to store only the least-significant
> bits of the timestamp.
> 
> What do you think?

I like the way you're thinking ;-)

Let's take a look at the current event sizes.

The function event is defined as:

name: function
ID: 1
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long parent_ip;	offset:16;	size:8;	signed:0;

Showing that its total size is 24 bytes (offset:16 + size:8).
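
For reference, those numbers line up with the C structures behind the
event (a sketch of the layout the format file implies; the kernel's own
declarations live in include/linux/trace_events.h and kernel/trace):

	/* Common header shared by all ring-buffer events (8 bytes). */
	struct trace_entry {
		unsigned short	type;		/* offset 0,  size 2 */
		unsigned char	flags;		/* offset 2,  size 1 */
		unsigned char	preempt_count;	/* offset 3,  size 1 */
		int		pid;		/* offset 4,  size 4 */
	};

	/* The function event: 8-byte header + two 8-byte pointers = 24. */
	struct ftrace_entry {
		struct trace_entry	ent;
		unsigned long		ip;		/* offset 8,  size 8 */
		unsigned long		parent_ip;	/* offset 16, size 8 */
	};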

and your current repeat event has:

ID: 17
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:unsigned long ip;	offset:8;	size:8;	signed:0;
	field:unsigned long pip;	offset:16;	size:8;	signed:0;
	field:unsigned long count;	offset:24;	size:8;	signed:0;

Which is 32 bytes. Adding another 8 would make it 40: bigger than one
function event, but still smaller than two. Even adding just 4 bytes
leaves it noticeably bigger than a single function event.
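
To put numbers on the savings: a burst of 128 consecutive calls
recorded as individual function events costs 128 * 24 = 3072 bytes of
ring buffer, while one function event plus one 40-byte repeats event
costs 24 + 40 = 64 bytes.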

Since this event is new, we could even do another trick (one that the
preemptirq events do):

ID: 434
format:
	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
	field:int common_pid;	offset:4;	size:4;	signed:1;

	field:s32 caller_offs;	offset:8;	size:4;	signed:1;
	field:s32 parent_offs;	offset:12;	size:4;	signed:1;

print fmt: "caller=%pS parent=%pS", (void *)((unsigned long)(_stext) + REC->caller_offs), (void *)((unsigned long)(_stext) + REC->parent_offs)

The (_stext) is defined in /proc/kallsyms:

# grep stext /proc/kallsyms 
8d00 T _stext

And the offsets are relative to that. We could do the same thing here:

	field:s32 ip;	offset:8;	size:4;	signed:1;
	field:s32 pip;	offset:12;	size:4;	signed:1;
	field:s16 count;	offset:16;	size:2;	signed:1;
	field:u16 top_ts;	offset:18;	size:2;	signed:0;
	field:u32 bottom_ts;	offset:20;	size:4;	signed:0;

That puts count down to 2 bytes. Having more than 64K repeats is
probably not going to happen, and if it does, we could inject this more
than once ;-)

And as all events must be 4 byte aligned, we can afford 6 bytes for the
timestamp offset, giving us 2^48 values instead of just 2^32; that is,
the covered range goes from about 4 seconds (2^32 nanoseconds) to about
78 hours (2^48 nanoseconds).
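
To make the packing concrete, here is a sketch of the encode/decode
math (struct and helper names are mine, not from any patch; _stext is
the kernel text base from <asm/sections.h>, and the s32 offsets assume
the addresses sit within +/-2GB of _stext, as core kernel text does):

	/* Compact payload: 4 + 4 + 2 + 2 + 4 = 16 bytes after the
	 * 8-byte common header, i.e. 24 bytes total -- the same size
	 * as a single function event. */
	struct func_repeats_packed {
		s32	ip;		/* ip - _stext, as preemptirq does */
		s32	pip;		/* parent ip - _stext */
		s16	count;
		u16	top_ts;		/* bits 32..47 of the time delta */
		u32	bottom_ts;	/* bits 0..31 of the time delta */
	};

	static void pack(struct func_repeats_packed *e, unsigned long ip,
			 unsigned long pip, s16 count, u64 delta)
	{
		e->ip = (s32)(ip - (unsigned long)_stext);
		e->pip = (s32)(pip - (unsigned long)_stext);
		e->count = count;
		e->top_ts = (u16)(delta >> 32);	/* split the 48-bit delta */
		e->bottom_ts = (u32)delta;
	}

	static u64 unpack_delta(const struct func_repeats_packed *e)
	{
		return ((u64)e->top_ts << 32) | e->bottom_ts;
	}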

I would grab the time stamp after saving the event to make sure that it
doesn't go backwards. It may be a little racy (if an interrupt comes in
between), but it's still a "best effort" approach.
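
If it helps, a minimal sketch of that bookkeeping (names hypothetical,
not from the series; trace_clock_local() is the tracer's lightweight
per-cpu clock from include/linux/trace_clock.h):

	/* Per-cpu state tracking the current run of repeats. */
	struct func_repeats_state {
		unsigned long	ip;
		unsigned long	parent_ip;
		u16		count;
		u64		ts_last_call;
	};

	/* Called after the function event has been written: stamping
	 * last keeps ts_last_call from running ahead of the buffer,
	 * accepting the small window where an interrupt slips in (the
	 * "best effort" part). */
	static void func_repeats_note_call(struct func_repeats_state *st)
	{
		st->count++;
		st->ts_last_call = trace_clock_local();
	}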

-- Steve



Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"

2021-03-08 Thread Yordan Karadzhov (VMware)



On 4.03.21, 18:38, Steven Rostedt wrote:

> On Thu,  4 Mar 2021 11:01:37 +0200
> "Yordan Karadzhov (VMware)"  wrote:
>
> Thanks Yordan for doing this!
>
> I have some comments below.


Hi Steven,

Thank you very much for looking into this!

Your suggestion makes perfect sense. I only have one clarifying question 
below.



> > diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> > index 4547ac59da61..8007f9b6417f 100644
> > --- a/kernel/trace/trace_entries.h
> > +++ b/kernel/trace/trace_entries.h
> > @@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
> >  		 __entry->nmi_total_ts,
> >  		 __entry->nmi_count)
> >  );
> > +
> > +FTRACE_ENTRY(func_repeats, func_repeats_entry,
> > +
> > +	TRACE_FUNC_REPEATS,
> > +
> > +	F_STRUCT(
> > +		__field(unsigned long,	ip	)
> > +		__field(unsigned long,	pip	)
> > +		__field(unsigned long,	count	)
> > +	),
> > +
> > +	F_printk(" %ps <-%ps\t(repeats:%lu)",
> > +		 (void *)__entry->ip,
> > +		 (void *)__entry->pip,
> > +		 __entry->count)
>
> After playing with this a little, I realized that we should store the
> last timestamp as well. I think that would be interesting information.
>
>    <...>-37  [004] ...1  2022.303820: gc_worker <-process_one_work
>    <...>-37  [004] ...1  2022.303820: ___might_sleep <-gc_worker
>    <...>-37  [004] ...1  2022.303831: ___might_sleep <-gc_worker (repeats: 127)
>    <...>-37  [004] ...1  2022.303831: queue_delayed_work_on <-process_one_work
>
> The above shows that __might_sleep() was called 128 times, but what I don't
> get from the above is when that last call was made. You'll see that the
> timestamp for the repeat output is the same as the next function shown
> (queue_delayed_work_on()). But the timestamp for the last call to
> __might_sleep() is lost, and the repeat event ends up being written when
> it is detected that there are no more repeats.
>
> If we had:
>
>    <...>-37  [004] ...1  2022.303820: gc_worker <-process_one_work
>    <...>-37  [004] ...1  2022.303820: ___might_sleep <-gc_worker
>    <...>-37  [004] ...1  2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127)
>    <...>-37  [004] ...1  2022.303831: queue_delayed_work_on <-process_one_work
>
> We would know the last time __might_sleep was called.
>
> That is, not only should we save the ip and pip in the trace_func_repeats
> structure, but we should also store the time stamp of the last function
> event that repeated. Otherwise the above looks like the last __might_sleep
> call happened when queue_delayed_work_on happened, which may not be the
> case.


If we store the last timestamp, we will need an additional 64 bits on
the buffer every time we record the "func_repeats" event. That looks
like overkill to me.
Can we store only the duration of the repeats (the difference between
the timestamps)? This way we use less memory at the price of one extra
arithmetic operation.
An alternative approach would be to store only the least-significant
bits of the timestamp.


What do you think?

Best regards,
Yordan



> -- Steve




Re: [RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"

2021-03-04 Thread Steven Rostedt
On Thu,  4 Mar 2021 11:01:37 +0200
"Yordan Karadzhov (VMware)"  wrote:

Thanks Yordan for doing this!

I have some comments below.

> diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
> index 4547ac59da61..8007f9b6417f 100644
> --- a/kernel/trace/trace_entries.h
> +++ b/kernel/trace/trace_entries.h
> @@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
>  		 __entry->nmi_total_ts,
>  		 __entry->nmi_count)
>  );
> +
> +FTRACE_ENTRY(func_repeats, func_repeats_entry,
> +
> + TRACE_FUNC_REPEATS,
> +
> + F_STRUCT(
> + __field(unsigned long,  ip  )
> + __field(unsigned long,  pip )
> + __field(unsigned long,  count   )
> + ),
> +
> + F_printk(" %ps <-%ps\t(repeats:%lu)",
> +  (void *)__entry->ip,
> +  (void *)__entry->pip,
> +  __entry->count)

After playing with this a little, I realized that we should store the
last timestamp as well. I think that would be interesting information.

   <...>-37  [004] ...1  2022.303820: gc_worker <-process_one_work
   <...>-37  [004] ...1  2022.303820: ___might_sleep <-gc_worker
   <...>-37  [004] ...1  2022.303831: ___might_sleep <-gc_worker (repeats: 127)
   <...>-37  [004] ...1  2022.303831: queue_delayed_work_on <-process_one_work

The above shows that __might_sleep() was called 128 times, but what I don't
get from the above is when that last call was made. You'll see that the
timestamp for the repeat output is the same as the next function shown
(queue_delayed_work_on()). But the timestamp for the last call to
__might_sleep() is lost, and the repeat event ends up being written when
it is detected that there are no more repeats.

If we had:

   <...>-37  [004] ...1  2022.303820: gc_worker <-process_one_work
   <...>-37  [004] ...1  2022.303820: ___might_sleep <-gc_worker
   <...>-37  [004] ...1  2022.303831: ___might_sleep <-gc_worker (last ts: 2022.303828 repeats: 127)
   <...>-37  [004] ...1  2022.303831: queue_delayed_work_on <-process_one_work

We would know the last time __might_sleep was called.

That is, not only should we save the ip and pip in the trace_func_repeats
structure, but we should also store the time stamp of the last function
event that repeated. Otherwise the above looks like the last __might_sleep
call happened when queue_delayed_work_on happened, which may not be the
case.

-- Steve




[RFC PATCH 1/5] tracing: Define new ftrace event "func_repeats"

2021-03-04 Thread Yordan Karadzhov (VMware)
The event aims to consolidate the function tracing record in the cases
when a single function is called a number of times consecutively:

	while (cond)
		do_func();

This may happen in various scenarios (busy waiting, for example).
The new ftrace event can be used to show repeated function events with
a single event and save space on the ring buffer.

Signed-off-by: Yordan Karadzhov (VMware) 
---
 kernel/trace/trace.h |  3 +++
 kernel/trace/trace_entries.h | 16 +
 kernel/trace/trace_output.c  | 44 
 3 files changed, 63 insertions(+)

diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index a6446c03cfbc..2be4a56879de 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -45,6 +45,7 @@ enum trace_type {
TRACE_BPUTS,
TRACE_HWLAT,
TRACE_RAW_DATA,
+   TRACE_FUNC_REPEATS,
 
__TRACE_LAST_TYPE,
 };
@@ -441,6 +442,8 @@ extern void __ftrace_bad_type(void);
  TRACE_GRAPH_ENT); \
IF_ASSIGN(var, ent, struct ftrace_graph_ret_entry,  \
  TRACE_GRAPH_RET); \
+   IF_ASSIGN(var, ent, struct func_repeats_entry,  \
+ TRACE_FUNC_REPEATS);  \
__ftrace_bad_type();\
} while (0)
 
diff --git a/kernel/trace/trace_entries.h b/kernel/trace/trace_entries.h
index 4547ac59da61..8007f9b6417f 100644
--- a/kernel/trace/trace_entries.h
+++ b/kernel/trace/trace_entries.h
@@ -338,3 +338,19 @@ FTRACE_ENTRY(hwlat, hwlat_entry,
 		 __entry->nmi_total_ts,
 		 __entry->nmi_count)
 );
+
+FTRACE_ENTRY(func_repeats, func_repeats_entry,
+
+   TRACE_FUNC_REPEATS,
+
+   F_STRUCT(
+   __field(unsigned long,  ip  )
+   __field(unsigned long,  pip )
+   __field(unsigned long,  count   )
+   ),
+
+   F_printk(" %ps <-%ps\t(repeats:%lu)",
+(void *)__entry->ip,
+(void *)__entry->pip,
+__entry->count)
+);
diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
index 61255bad7e01..af6b066972e9 100644
--- a/kernel/trace/trace_output.c
+++ b/kernel/trace/trace_output.c
@@ -1373,6 +1373,49 @@ static struct trace_event trace_raw_data_event = {
 	.funcs		= &trace_raw_data_funcs,
 };
 
+static enum print_line_t
+trace_func_repeats_raw(struct trace_iterator *iter, int flags,
+		       struct trace_event *event)
+{
+   struct func_repeats_entry *field;
+	struct trace_seq *s = &iter->seq;
+
+   trace_assign_type(field, iter->ent);
+
+	trace_seq_printf(s, "%lu %lu %li\n",
+			 field->pip,
+			 field->ip,
+			 field->count);
+
+   return trace_handle_return(s);
+}
+
+static enum print_line_t
+trace_func_repeats_print(struct trace_iterator *iter, int flags,
+		       struct trace_event *event)
+{
+   struct func_repeats_entry *field;
+	struct trace_seq *s = &iter->seq;
+
+   trace_assign_type(field, iter->ent);
+
+   seq_print_ip_sym(s, field->ip, flags);
+   trace_seq_puts(s, " <-");
+   seq_print_ip_sym(s, field->pip, flags);
+   trace_seq_printf(s, " (repeats: %li)\n", field->count);
+
+   return trace_handle_return(s);
+}
+
+static struct trace_event_functions trace_func_repeats_funcs = {
+   .trace  = trace_func_repeats_print,
+   .raw= trace_func_repeats_raw,
+};
+
+static struct trace_event trace_func_repeats_event = {
+   .type   = TRACE_FUNC_REPEATS,
+	.funcs	= &trace_func_repeats_funcs,
+};
 
 static struct trace_event *events[] __initdata = {
 	&trace_fn_event,
@@ -1385,6 +1428,7 @@ static struct trace_event *events[] __initdata = {
 	&trace_print_event,
 	&trace_hwlat_event,
 	&trace_raw_data_event,
+	&trace_func_repeats_event,
NULL
 };
 
-- 
2.25.1